
Support training separate source/target SentencePiece Models

Open • radinplaid opened this issue 2 years ago • 1 comment

It appears that the pipeline only supports training a joint BPE model, but it is sometimes better to have separate source and target BPE vocabularies.

radinplaid, Jul 15 '22 14:07

I would really like to see that too. I work on language pairs with no overlap between the source and target character sets, so a separate tokenization model for each side makes sense.

AmitMY, Jul 27 '22 14:07
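
To make the character-set point concrete, here is a minimal sketch of how one might check how much the source and target alphabets overlap before choosing between a joint and a separate vocabulary. The function names (`alphabet`, `charset_overlap`) and the toy sentences are hypothetical and not part of the pipeline, and the score is only a rough heuristic, not a rule for when separate models are better.

```python
def alphabet(lines):
    """Collect the set of alphabetic characters seen in an iterable of sentences."""
    return {ch for line in lines for ch in line if ch.isalpha()}


def charset_overlap(src_lines, tgt_lines):
    """Jaccard overlap of the two alphabets: 0.0 is disjoint, 1.0 is identical."""
    src, tgt = alphabet(src_lines), alphabet(tgt_lines)
    union = src | tgt
    return len(src & tgt) / len(union) if union else 0.0


# Toy corpora for illustration only (assumed data, not from the pipeline).
english = ["hello world", "good morning"]
russian = ["привет мир", "доброе утро"]
german = ["hallo welt", "guten morgen"]

# Latin vs. Cyrillic: the alphabets share no characters.
print(charset_overlap(english, russian))  # 0.0
# Latin vs. Latin: many shared characters, so a joint vocabulary can reuse subwords.
print(charset_overlap(english, german))
```

An overlap of zero, as in the English/Russian pair, is the situation described above: a joint model cannot share any subword units across the two sides, so separate source and target models lose nothing on that front.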