tokenization:
mi% 20piaci% 20jucari% 20cu% 20la% 20gatta .
subwords:
mi@@ % 20@@ piaci@@ % 20@@ juc@@ ari@@ % 20@@ cu@@ % 20@@ la@@ % 20@@ g@@ atta .
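Reversing the subword step is mechanical. Deleting every "@@ " marker glues each unit back onto the one after it, and applying that to the subword line above recovers the tokenization line exactly. Below is a minimal Python sketch. It assumes the "@@ " continuation convention used by tools such as subword-nmt, and the function name join_subwords is hypothetical.

```python
import re

def join_subwords(units: str) -> str:
    # Hypothetical helper: delete each "@@ " marker (and a stray trailing "@@"),
    # so every unit ending in "@@" is glued onto the unit that follows it
    return re.sub(r"@@( |$)", "", units)

subwords = "mi@@ % 20@@ piaci@@ % 20@@ juc@@ ari@@ % 20@@ cu@@ % 20@@ la@@ % 20@@ g@@ atta ."
print(join_subwords(subwords))
# prints: mi% 20piaci% 20jucari% 20cu% 20la% 20gatta .
```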
last updated: 2023.05.20
Back here, "behind the curtain," you can see how the Sicilian Translator works.
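First, it tokenizes the input sentence into a reduced form. For example, the tokenized line above shows the final period as a separate token. Next, subword splitting breaks the words into smaller units and marks with "@@" each unit that continues into the next one. The translator receives these units as its input.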
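The two preprocessing steps can be sketched in Python. This is a minimal illustration, not the Sicilian Translator's own code. The regex tokenizer is an assumed simplification. The splitter follows the byte-pair-encoding (BPE) style that the "@@" markers suggest, but it uses a tiny hypothetical merge table, picked by hand so that the output resembles the example above. A real merge table is learned from training data, and this sketch also omits details such as the end-of-word marker that some BPE implementations use. All function names here are hypothetical.

```python
import re

def tokenize(sentence: str) -> list[str]:
    # Assumed, simplified tokenizer: words and punctuation marks become separate tokens
    return re.findall(r"\w+|[^\w\s]", sentence)

def bpe_split(word: str, merges: list[tuple[str, str]]) -> list[str]:
    # Rank each merge rule by its position in the (hypothetical) merge table
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    no_merge = len(ranks)  # sentinel rank for pairs not in the table
    symbols = list(word)   # start from single characters
    while len(symbols) > 1:
        # Pick the adjacent pair with the lowest rank (leftmost on ties)
        rank, i = min((ranks.get(pair, no_merge), i)
                      for i, pair in enumerate(zip(symbols, symbols[1:])))
        if rank == no_merge:  # no remaining pair can be merged
            break
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

def subword_split(sentence: str, merges: list[tuple[str, str]]) -> str:
    units = []
    for token in tokenize(sentence):
        pieces = bpe_split(token, merges)
        # Mark every piece except the last with "@@" so it can be rejoined later
        units.extend([p + "@@" for p in pieces[:-1]] + [pieces[-1]])
    return " ".join(units)

# Hypothetical merge table, hand-picked for illustration only
MERGES = [("m", "i"), ("p", "i"), ("pi", "a"), ("pia", "c"), ("piac", "i"),
          ("j", "u"), ("ju", "c"), ("a", "r"), ("ar", "i"), ("c", "u"),
          ("l", "a"), ("a", "t"), ("at", "t"), ("att", "a")]

print(subword_split("mi piaci jucari cu la gatta.", MERGES))
# prints: mi piaci juc@@ ari cu la g@@ atta .
```

Because rules are applied in table order, a word that never appears whole in the table still breaks into known pieces, down to single characters if necessary. This is what lets a subword-based translator handle rare words.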
The translator returns the top five translations of the input sentence. This page displays them in detokenized form, each with its translation score. As in golf, a lower score is better.
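Here is a sketch of how the top five could be picked and displayed. It assumes the translator hands back scored candidates whose subword units have already been rejoined. The Hypothesis type, the punctuation-only detokenize rule, and the candidate sentences and scores are all made up for illustration.

```python
import re
from typing import NamedTuple

class Hypothesis(NamedTuple):
    text: str     # one translation, assumed already rejoined from subword units
    score: float  # lower is better, as on this page

def detokenize(text: str) -> str:
    # Simplified, assumed rule: reattach punctuation that the tokenizer split off
    return re.sub(r" ([.,;:!?])", r"\1", text)

def top_translations(hyps: list[Hypothesis], k: int = 5) -> list[Hypothesis]:
    # Ascending sort puts the best (lowest) score first, as in golf
    best = sorted(hyps, key=lambda h: h.score)[:k]
    return [Hypothesis(detokenize(h.text), h.score) for h in best]

# Made-up candidates and scores, for illustration only
hyps = [
    Hypothesis("I like playing with the cat .", 0.57),
    Hypothesis("I like to toy with the cat .", 2.08),
    Hypothesis("I like to play with the cat .", 0.41),
    Hypothesis("I enjoy playing with the cat .", 1.20),
    Hypothesis("I like to play with a cat .", 1.62),
    Hypothesis("I like to play with the kitty .", 1.35),
]

for h in top_translations(hyps):
    print(f"{h.score:5.2f}  {h.text}")
```

The "lower is better" convention only matters in the ascending sort. Here the 2.08 candidate falls outside the top five and is dropped.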
For more information, please see the documentation and the Sicilian NLP pages.