firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models

Results 311 firefox-translations-training issues

https://github.com/mozilla-l10n/mt-training-data Maybe we could add it to OPUS.

data sources

Does it require any adjustment? Should we change any hyperparameters etc.?

language-coverage

https://hplt-project.org/datasets/v2.0

data sources

I'm talking about the set of tools by @gregtatum: https://gregtatum.github.io/taskcluster-tools/ At least the training dashboard is translations-specific and ideally should live in this repo, following the monorepo idea. It...

We will need to retrain models that don't have the robustness fixes from using OpusTrainer.
- [ ] bg-en
- [ ] de-en
- [ ] en-bg
- [ ]...

quality

Short sentences are frequently removed from parallel datasets, so there aren't enough to train on. In HPLT 2.0 the data is filtered at the document level, rather than sentence level....

quality
data sources

It's not guaranteed that parallel "sentences" are actually sentences. We could write a script to detect how many sentences are in each parallel datum, and then attempt to extract them...

quality
data sources
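
The sentence-counting script proposed in the issue above could look roughly like the following. This is a minimal sketch, not code from the repository: the regex boundary rule is a naive stand-in for a real sentence segmenter, and the names `count_sentences` and `classify_pair` are hypothetical.

```python
import re

# Naive boundary: terminal punctuation followed by whitespace.
# Assumption for illustration only; a real segmenter handles
# abbreviations, quotes, and scripts without spaces.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def count_sentences(text: str) -> int:
    """Count sentence-like chunks in one side of a parallel datum."""
    text = text.strip()
    if not text:
        return 0
    return len([chunk for chunk in _BOUNDARY.split(text) if chunk])


def classify_pair(src: str, trg: str) -> str:
    """Label a parallel datum by how its sentence counts compare."""
    n_src, n_trg = count_sentences(src), count_sentences(trg)
    if n_src == 0 or n_trg == 0:
        return "empty"
    if n_src == 1 and n_trg == 1:
        return "single"
    if n_src == n_trg:
        # Equal counts on both sides: a candidate for sentence extraction.
        return "splittable"
    return "mismatched"
```

Pairs labelled `splittable` are the ones where extracting individual sentences might be attempted; `mismatched` pairs would need alignment before any split.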

We already statistically generate word alignment information, so it should be possible to go through parallel datasets and generate word pairs of the most common words that are aligned. Since the...

quality
data sources
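
As a rough illustration of the word-pair idea in the issue above, the sketch below counts aligned token pairs across a corpus. It assumes whitespace-tokenized text and alignments in the Pharaoh `i-j` format (source index, target index); both are assumptions, as are the function name `aligned_word_pairs` and the toy data.

```python
from collections import Counter


def aligned_word_pairs(corpus, top_n=10):
    """Return the most frequent (source word, target word) aligned pairs.

    `corpus` is an iterable of (source, target, alignment) triples, where
    `alignment` is a Pharaoh-style string such as "0-0 1-2" (assumed format).
    """
    counts = Counter()
    for src, trg, alignment in corpus:
        src_tokens = src.split()
        trg_tokens = trg.split()
        for link in alignment.split():
            i, j = (int(x) for x in link.split("-"))
            # Skip links that point outside either sentence.
            if i < len(src_tokens) and j < len(trg_tokens):
                counts[(src_tokens[i].lower(), trg_tokens[j].lower())] += 1
    return counts.most_common(top_n)


# Toy example data, invented for illustration.
corpus = [
    ("the cat", "die Katze", "0-0 1-1"),
    ("the dog", "der Hund", "0-0 1-1"),
    ("the cat sleeps", "die Katze schläft", "0-0 1-1 2-2"),
]
pairs = aligned_word_pairs(corpus, top_n=5)
```

Counting by lowercased surface form keeps the sketch short; a real run would probably filter by frequency and alignment confidence before using the pairs as training data.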

Try translating https://www.omniglot.com/language/idioms/swedish.htm to Swedish. The English idioms are translated literally. While that may be useful *for this particular page* (if one doesn't know English), for content in general when...

feedback