firefox-translations-training
Support training separate source/target SentencePiece Models
It appears that the pipeline only supports training a joint BPE model, but it is sometimes better to have separate source and target BPE vocabularies.
I would really like to see that too. I work on language pairs with no overlap between the source and target character sets, so a separate tokenization model for each side makes sense.
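For anyone wanting to check whether their own pair falls into this category, here is a rough sketch that measures how much the source and target character sets overlap. It is not part of the pipeline; the function names (`char_set`, `char_overlap`) and the toy sentences are made up for illustration, and Jaccard overlap is just one simple way to quantify this.

```python
def char_set(lines):
    """Collect the distinct non-whitespace characters in an iterable of sentences."""
    return {ch for line in lines for ch in line if not ch.isspace()}


def char_overlap(src_lines, tgt_lines):
    """Jaccard overlap of the two character sets: 0.0 = disjoint, 1.0 = identical."""
    src, tgt = char_set(src_lines), char_set(tgt_lines)
    union = src | tgt
    return len(src & tgt) / len(union) if union else 0.0


# Toy samples (hypothetical data, only to illustrate the idea).
english = ["hello world", "good morning"]
japanese = ["こんにちは", "おはよう"]
german = ["hallo welt", "guten morgen"]

# Different scripts share no characters, so a joint vocabulary gains nothing
# from shared subwords.
print("en-ja overlap:", char_overlap(english, japanese))
# Same script shares many characters, so a joint vocabulary can reuse subwords.
print("en-de overlap:", char_overlap(english, german))
```

An overlap at or near zero on a real corpus sample would suggest that a joint vocabulary has little to share between the two sides, which is the case described above.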