firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
Refs #333
We currently support training continuation for back translations and teacher training.
```
Dataset                     Code                               Sentences  Size    URL
──────────────────────────  ─────────────────────────────────  ─────────  ──────  ─────────────────────────────────────────────────────
ELRC-Museus_2007            opus_ELRC-Museus_2007/v1           125        7.2 kB  https://opus.nlpl.eu/ELRC-Museus_2007-v1.php
ELRC-Localidades_2007       opus_ELRC-Localidades_2007/v1      101        8.2 kB  https://opus.nlpl.eu/ELRC-Localidades_2007-v1.php
ELRC-2638-monumentos_2007   opus_ELRC-2638-monumentos_2007/v1  17         8.2 kB  https://opus.nlpl.eu/ELRC-2638-monumentos_2007-v1.php
ELRC-2614-Localidades_2007...
```
Issues with the current implementation:
- We use naive tokenization because it's what OpusTrainer requires. This might produce alignments of lower quality because we don't take into account punctuation and...
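
To illustrate the concern, here is a minimal sketch that assumes "naive tokenization" means plain whitespace splitting (the snippet above does not say so explicitly). Both function names are hypothetical and are not part of OpusTrainer or this repository. The regex-based variant is only one possible way to take punctuation into account.

```python
import re


def naive_tokenize(text: str) -> list[str]:
    """Assumed naive scheme: split on whitespace, punctuation stays glued to words."""
    return text.split()


def punctuation_aware_tokenize(text: str) -> list[str]:
    """Hypothetical alternative: emit each punctuation mark as its own token."""
    # \w+ matches runs of word characters; [^\w\s] matches one non-space symbol.
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "Hello, world! How are you?"
naive_tokens = naive_tokenize(sentence)
aware_tokens = punctuation_aware_tokenize(sentence)

print(naive_tokens)  # ['Hello,', 'world!', 'How', 'are', 'you?']
print(aware_tokens)  # ['Hello', ',', 'world', '!', 'How', 'are', 'you', '?']
```

With whitespace splitting, `Hello,` and `Hello` are different tokens, so a word aligner sees more distinct types and has no way to align the punctuation marks themselves.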
We have tests that require external binaries such as Marian. These are all available in our local Docker images. However, this image is quite slow on non-x86 systems. In...
The first task was terminated after 3 days: https://firefox-ci-tc.services.mozilla.com/tasks/Drz5ugggQAi56SUt6HHLPw/runs/6/logs/public/logs/live.log This one is still in progress: https://firefox-ci-tc.services.mozilla.com/tasks/Drz5ugggQAi56SUt6HHLPw/runs/7/logs/live/public/logs/live.log This has something to do with the recent refactoring of the mono downloader.
There are specific meta bugs for issues around [pipeline usability](https://github.com/mozilla/firefox-translations-training/issues/311), [general translation quality](https://github.com/mozilla/firefox-translations-training/issues/216), and [robustness](https://github.com/mozilla/firefox-translations-training/issues/238), but this meta bug is specifically about blockers around scaling to training and...
Example: https://github.com/mozilla/firefox-translations/issues/716. "QUELS COOKIES ET QUELS TRACEURS ?" is translated as "WHAT COOKIES AND THAT THAT THAT THAT COOKING AND THAT COOKIES AND TH ". "Quels cookies et quels traceurs...