firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
chrF is now considered more reliable than BLEU and should work better for CJK. Based on advice from #748. Also unify sacrebleu and mtdata versions everywhere. Closes #748
Does it require any adjustment? Do our metrics (chrF, COMET, BLEU) work correctly for these languages?
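One reason a character-level metric is attractive for CJK is that it does not depend on whitespace word boundaries. The snippet below is a minimal sketch of a chrF-style score written from scratch for illustration. It is a simplification and not sacrebleu's implementation; the function names `char_ngrams` and `simple_chrf` and the defaults (`max_n=6`, `beta=2.0`) are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (assumption of this sketch)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in the range 0..100.

    Averages character n-gram precision and recall over the orders
    1..max_n that both strings are long enough to support, then
    combines them with an F-beta score (recall weighted by beta).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # No word segmentation is needed for the Chinese example.
    print(simple_chrf("我喜欢狗", "我喜欢猫"))
```

Because the score is built from character n-grams, a hypothesis that differs from the reference by one character still receives partial credit without any tokenizer. For actual evaluation, the pinned sacrebleu version remains the tool to use.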
Use custom OpusCleaner configs with disabled word-based filters. The filters are copied from https://github.com/hplt-project/HPLT-MT-Models/blob/main/v1.0/data/en-zh_hant/raw/v2/HPLT-v1.1.en-zh_hant.filters.json. I don't think it's feasible to do the src-trg-ratio that requires tokenization now. We would have...
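As a rough illustration of what "disabling word-based filters" could look like, the sketch below drops selected entries from a filter config. Everything about the config shape is an assumption: it supposes a dictionary with a `filters` list whose entries carry a `filter` name, and the names in `WORD_BASED_FILTERS` are hypothetical placeholders. The helper `disable_word_based` is not part of OpusCleaner.

```python
import copy

# Hypothetical filter names that depend on whitespace tokenization.
# Replace with the names actually used in your config.
WORD_BASED_FILTERS = frozenset({"src_trg_ratio", "max_word_length"})


def disable_word_based(config: dict,
                       word_based: frozenset = WORD_BASED_FILTERS) -> dict:
    """Return a copy of the config without the listed word-based filters.

    Assumes config["filters"] is a list of dicts, each with a "filter" key.
    The input config is left unmodified.
    """
    result = copy.deepcopy(config)
    result["filters"] = [
        entry for entry in result.get("filters", [])
        if entry.get("filter") not in word_based
    ]
    return result


if __name__ == "__main__":
    # Illustrative config only; not copied from any real filters.json.
    example = {
        "filters": [
            {"filter": "src_trg_ratio", "parameters": {}},
            {"filter": "max_word_length", "parameters": {}},
            {"filter": "some_char_based_filter", "parameters": {}},
        ]
    }
    print(disable_word_based(example))
```

Filtering on a copy keeps the original config available for language pairs where word-based filters still make sense.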
- character coverage
- size

closes #745
See comments from Jaume: https://github.com/mozilla/firefox-translations-training/issues/45#issuecomment-1036191497 https://github.com/mozilla/firefox-translations-training/issues/45#issuecomment-1036198055
Do decoding, extract-best, and other translation procedures work the same way for CJK?
I don't have a good understanding of why some lines are suddenly empty as a result of running "extract_lex". There are just a few of them and the model trained...
@gregtatum The goal of this patch is to move much of the functionality from the [build-bergamot.py](https://searchfox.org/mozilla-central/rev/dca2603d55b5b39d3b8ab8e93c08b42563f5aad8/toolkit/components/translations/bergamot-translator/build-bergamot.py) script in Mozilla Central upstream into this repository to better streamline how WASM artifacts...