firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
We're using sacrebleu version 2.0 and are missing the newer test sets (wmt21, wmt22, etc.). The requirements for the eval step pin 2.4. https://github.com/mjpost/sacrebleu/releases
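As a quick sanity check, a sketch like the following reports the installed sacrebleu version and whether the newer WMT test sets ship with it (`get_available_testsets` is part of sacrebleu's public API):

```python
# Sketch: report the installed sacrebleu version and check whether the
# newer WMT test sets are available in it.
import sacrebleu

print(sacrebleu.__version__)  # 2.0.x here vs. the 2.4 pinned for the eval step

available = sacrebleu.get_available_testsets()
for testset in ("wmt21", "wmt22"):
    print(testset, "present" if testset in available else "missing")
```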
dataset: `sacrebleu_aug-mix_mtedx/valid`. It worked fine for `flores_aug-mix_dev` and `mtdata_aug-mix_Neulab-tedtalks_dev-1-eng-ell`. https://firefox-ci-tc.services.mozilla.com/tasks/KmZrLAdcQtGnQd8ZdgxDhQ/runs/0

```
[task 2024-09-03T23:26:56.604Z] tokenizer.json: 0%| | 0.00/1.96M [00:00
```
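For context, the dataset identifiers above appear to follow an `<importer>_<augmentation>_<name>` pattern. The sketch below is only a hypothetical decomposition of that naming scheme, not the pipeline's actual parser:

```python
# Hypothetical sketch of how an identifier like "sacrebleu_aug-mix_mtedx/valid"
# might decompose; the field names and split logic are assumptions.
def parse_dataset_id(dataset_id: str) -> dict:
    importer, rest = dataset_id.split("_", 1)
    if rest.startswith("aug-"):
        augmentation, name = rest.split("_", 1)
    else:
        augmentation, name = None, rest
    return {"importer": importer, "augmentation": augmentation, "name": name}

print(parse_dataset_id("sacrebleu_aug-mix_mtedx/valid"))
# {'importer': 'sacrebleu', 'augmentation': 'aug-mix', 'name': 'mtedx/valid'}
print(parse_dataset_id("flores_aug-mix_dev"))
# {'importer': 'flores', 'augmentation': 'aug-mix', 'name': 'dev'}
```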
It takes 15 minutes to run one test, I think because of installing the virtual environment that contains the CUDA and PyTorch dependencies.

```
tests/test_data_importer.py::test_basic_corpus_import[mtdata-Neulab-tedtalks_test-1-eng-rus] PASSED [ 29%]
```

https://share.firefox.dev/3ZclPBD
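To confirm where the 15 minutes go, a rough sketch could time the environment install separately from the test run itself (the requirements path and pytest selector below are assumptions):

```python
# Rough sketch: time dependency installation and the test run separately to
# confirm that the virtual-environment setup dominates the runtime.
# The requirements file and test selector below are assumptions.
import subprocess
import time

def timed(cmd: list[str]) -> float:
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

install_s = timed(["pip", "install", "-r", "requirements/tests.txt"])
test_s = timed(["pytest", "tests/test_data_importer.py", "-k", "mtdata"])
print(f"install: {install_s:.0f}s, test: {test_s:.0f}s")
```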
Is it possible the taskcluster fetches got corrupted? https://firefox-ci-tc.services.mozilla.com/tasks/FC2YNEIiS0mPBWnLTpDQEw/runs/0/logs/public/logs/live.log

```
[task 2024-09-04T10:32:25.194Z] Traceback (most recent call last):
[task 2024-09-04T10:32:25.194Z]   File "/home/ubuntu/.local/bin/opustrainer-train", line 8, in <module>
[task 2024-09-04T10:32:25.194Z]     sys.exit(main())
[task 2024-09-04T10:32:25.194Z]   File ...
```
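One way to test the corruption hypothesis would be to hash the fetched artifacts and compare against known-good digests. A minimal sketch (the artifact path is a placeholder):

```python
# Minimal sketch: hash a fetched artifact so it can be compared against a
# known-good digest. The path below is a placeholder, not an actual fetch.
import hashlib

def sha256sum(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(sha256sum("fetches/corpus.en.zst"))  # placeholder artifact path
```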
* [Profile of the CI tasks](https://share.firefox.dev/47gHESv)
* [evaluate-backward-flores-devtest-ru-en](https://share.firefox.dev/4gfdgvW)

| task | runtime |
| ---- | ------- |
| evaluate-backward-flores-devtest-ru-en | 16m |
| evaluate-teacher-ensemble-flores-devtest-ru-en | 18m |

nvidia cudnn...
For example:
```
gsutil ls gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student
```
shows
```
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.metrics
```
For example `aug-mix_wmt19.metrics` should...
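The same listing can be reproduced programmatically with the google-cloud-storage client, e.g. for scripted checks of which `.metrics` files exist (a sketch; assumes credentials are already configured):

```python
# Sketch: list the same evaluation artifacts with the google-cloud-storage
# client instead of gsutil (assumes application-default credentials).
from google.cloud import storage

BUCKET = "moz-fx-translations-data--303e-prod-translations-data"
PREFIX = ("models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/"
          "evaluation/student/")

client = storage.Client()
for blob in client.list_blobs(BUCKET, prefix=PREFIX):
    if blob.name.endswith(".metrics"):
        print(f"gs://{BUCKET}/{blob.name}")
```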
I've been playing around with en-tr translations and I'd like to share some feedback. I chose [this story](https://learnenglish.britishcouncil.org/general-english/story-zone/a2-b1-stories/devils-details-a2/b1) for a detailed comparison with Google Translate. In the Google docs linked...
https://github.com/mozilla/firefox-translations-training/actions/runs/10688710629/job/29629195406

```
Error: Ensure GITHUB_TOKEN has permission "id-token: write".
```

I'm not sure what happened to the token.
It would be interesting to compare evaluation capabilities of LLMs to COMET and human evaluation. See the paper: [Large Language Models Are State-of-the-Art Evaluators of Translation Quality](https://arxiv.org/pdf/2302.14520)
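For reference, scoring the same outputs with COMET via the unbabel-comet package would look roughly like this (a sketch; `Unbabel/wmt22-comet-da` is one of the public checkpoints):

```python
# Sketch: score a translation with a reference-based COMET checkpoint,
# as a baseline to compare against LLM-based evaluation.
from comet import download_model, load_from_checkpoint

model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{
    "src": "Şeytan ayrıntıda gizlidir.",
    "mt": "The devil is in the detail.",
    "ref": "The devil is in the details.",
}]
print(model.predict(data, batch_size=8, gpus=0).system_score)
```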
For example https://huggingface.co/datasets/ontocord/CulturaY.
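A quick way to inspect such a dataset is streaming a few records with the Hugging Face datasets library (a sketch; the `"tr"` config name is an assumption, check the dataset card for the actual per-language configs):

```python
# Sketch: stream a few records from CulturaY without downloading the full
# dataset. The "tr" config name is an assumption; see the dataset card.
from datasets import load_dataset

ds = load_dataset("ontocord/CulturaY", "tr", split="train", streaming=True)
for _, record in zip(range(3), ds):
    print(record)
```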