firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
https://github.com/mozilla-l10n/mt-training-data Maybe we could add it to OPUS.
Does it require any adjustment? Should we change any hyperparameters etc.?
I'm talking about the set of tools by @gregtatum: https://gregtatum.github.io/taskcluster-tools/ At least the training dashboard is translations-specific and ideally should live in this repo, following the monorepo idea. It...
We will need to retrain models that don't have the robustness fixes from using OpusTrainer.
- [ ] bg-en
- [ ] de-en
- [ ] en-bg
- [ ]...
Short sentences are frequently removed from parallel datasets, so there aren't enough to train on. In HPLT 2.0 the data is filtered at the document level, rather than sentence level....
It's not guaranteed that parallel "sentences" are actually sentences. We could write a script to detect how many sentences are in each parallel datum, and then attempt to extract them...
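A minimal sketch of what such a detection script could look like, assuming a deliberately naive punctuation-based splitter. The names `count_sentences` and `sentence_counts` are hypothetical and not part of this repo; a real script would likely need a language-aware sentence segmenter, since this one miscounts abbreviations such as "e.g." and ignores scripts that use other sentence-final punctuation.

```python
import re

# Naive assumption: a sentence ends at a run of '.', '!' or '?' that is
# followed by whitespace or the end of the string.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")
_ENDS_WITH_PUNCT = re.compile(r"[.!?]+$")


def count_sentences(text: str) -> int:
    """Roughly count the sentences in one side of a parallel datum."""
    text = text.strip()
    if not text:
        return 0
    count = len(_SENTENCE_END.findall(text))
    # Trailing text without terminal punctuation still counts as a sentence.
    if not _ENDS_WITH_PUNCT.search(text):
        count += 1
    return count


def sentence_counts(src: str, trg: str) -> tuple[int, int]:
    """Return (source count, target count) for one parallel datum."""
    return count_sentences(src), count_sentences(trg)
```

A datum whose two counts are equal and greater than one would be a candidate for splitting into several sentence pairs; a datum with mismatched counts probably needs closer inspection before anything is extracted from it.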
We already statistically generate word alignment information, so it should be possible to go through parallel datasets and generate word pairs from the most common aligned words. Since the...
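As a rough illustration of the idea, the sketch below counts aligned word pairs across a corpus. It assumes whitespace-tokenized text and alignments in the `i-j` (Pharaoh-style) format, where `i` indexes a source token and `j` a target token; the actual alignment format used by the pipeline may differ, and `count_aligned_pairs` is a hypothetical helper, not an existing function in this repo.

```python
from collections import Counter
from typing import Iterable


def count_aligned_pairs(
    corpus: Iterable[tuple[str, str, str]],
) -> Counter:
    """Count (source word, target word) pairs linked by word alignments.

    Each corpus item is (source sentence, target sentence, alignment),
    where the alignment is a space-separated list of "i-j" links.
    """
    counts: Counter = Counter()
    for src, trg, alignment in corpus:
        src_tokens = src.split()
        trg_tokens = trg.split()
        for link in alignment.split():
            i, j = (int(x) for x in link.split("-"))
            # Skip links that point outside the tokenized sentences.
            if i < len(src_tokens) and j < len(trg_tokens):
                pair = (src_tokens[i].lower(), trg_tokens[j].lower())
                counts[pair] += 1
    return counts


# Toy, made-up data purely for illustration.
example_corpus = [
    ("the cat", "die Katze", "0-0 1-1"),
    ("the dog", "der Hund", "0-0 1-1"),
    ("the cat runs", "die Katze rennt", "0-0 1-1 2-2"),
]
pair_counts = count_aligned_pairs(example_corpus)
```

Calling `pair_counts.most_common(n)` then gives the `n` most frequent aligned pairs, which could be filtered by a minimum count before being used as word-level training examples.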
Try translating https://www.omniglot.com/language/idioms/swedish.htm to Swedish. The English idioms are translated literally. While that may be useful *for this particular page* (if one doesn't know English), for content in general when...