firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models

Results 311 firefox-translations-training issues

We're using sacrebleu 2.0, which is missing the newer datasets (wmt21, wmt22, etc.). The requirements for the eval step pin 2.4. https://github.com/mjpost/sacrebleu/releases
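A quick way to catch this kind of mismatch is to check the installed sacrebleu version against the minimum the eval step expects. The sketch below is a minimal, hypothetical helper (the names `MIN_SACREBLEU`, `parse_version`, and `meets_minimum` are illustrative, not part of the pipeline); only `importlib.metadata.version` is a real standard-library call.

```python
import re
from typing import Tuple

# Assumption: the eval step's requirements pin sacrebleu 2.4 (per this issue).
MIN_SACREBLEU = "2.4"


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. "2.4.3" -> (2, 4, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unparseable version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_SACREBLEU) -> bool:
    """True if `installed` is at least `minimum`; pre-release tags are ignored."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so (2, 4) and (2, 4, 0) compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("sacrebleu")
    except PackageNotFoundError:
        print("sacrebleu is not installed")
    else:
        status = "ok" if meets_minimum(installed) else "too old"
        print(f"sacrebleu {installed}: {status}")
```

Versions are compared as integer tuples rather than strings, because as strings "10.0" sorts before "2.4".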

Dataset: `sacrebleu_aug-mix_mtedx/valid`. It worked fine for `flores_aug-mix_dev` and `mtdata_aug-mix_Neulab-tedtalks_dev-1-eng-ell`.

https://firefox-ci-tc.services.mozilla.com/tasks/KmZrLAdcQtGnQd8ZdgxDhQ/runs/0
```
[task 2024-09-03T23:26:56.604Z] tokenizer.json: 0%| | 0.00/1.96M [00:00
```

bug

It takes 15 minutes to run a single test, which I think is due to installing the virtual environment that contains the CUDA and PyTorch dependencies.
```
tests/test_data_importer.py::test_basic_corpus_import[mtdata-Neulab-tedtalks_test-1-eng-rus] PASSED [ 29%]
```
https://share.firefox.dev/3ZclPBD

cost & perf

Is it possible the Taskcluster fetches got corrupted?

https://firefox-ci-tc.services.mozilla.com/tasks/FC2YNEIiS0mPBWnLTpDQEw/runs/0/logs/public/logs/live.log
```
[task 2024-09-04T10:32:25.194Z] Traceback (most recent call last):
[task 2024-09-04T10:32:25.194Z]   File "/home/ubuntu/.local/bin/opustrainer-train", line 8, in <module>
[task 2024-09-04T10:32:25.194Z]     sys.exit(main())
[task 2024-09-04T10:32:25.194Z]   File...
```
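One way to rule corruption in or out is to hash the fetched file and compare it with the digest it is expected to have. This is a minimal sketch using the standard `hashlib` module; the helper names are hypothetical, and where the expected SHA-256 comes from (a fetch definition, an upstream checksum file, a known-good copy) is an assumption left to the caller.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large corpora need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fetch(path, expected_sha256: str) -> bool:
    """Compare a fetched file against an expected digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # "abc" is the standard SHA-256 test vector.
    abc_digest = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"abc")
        print(verify_fetch(sample, abc_digest))
```

A matching digest points away from a corrupted download and toward the code or data itself; a mismatch suggests re-running the fetch.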

bug

* [Profile of the CI tasks](https://share.firefox.dev/47gHESv)
* [evaluate-backward-flores-devtest-ru-en](https://share.firefox.dev/4gfdgvW)

| task | runtime |
| ---- | ------- |
| evaluate-backward-flores-devtest-ru-en | 16m |
| evaluate-teacher-ensemble-flores-devtest-ru-en | 18m |

nvidia cudnn...

cost & perf

For example:
```
gsutil ls gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student
```
shows
```
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.metrics
```
For example `aug-mix_wmt19.metrics` should...
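To see at a glance which evaluation variants were written for each dataset, a `gsutil ls` listing like the one above can be grouped by object name. This is a hypothetical helper, not pipeline code: the `aug-mix_` prefix convention and the four expected suffixes are assumptions inferred from the file names in the listing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Assumptions inferred from the listing, not from the pipeline code:
# augmented evaluations carry an "aug-mix_" prefix, and each evaluation
# of this lt-en model writes these four files.
AUG_PREFIX = "aug-mix_"
EXPECTED_SUFFIXES = (".en", ".en.ref", ".lt", ".metrics")


def split_name(name: str) -> Optional[Tuple[str, str, str]]:
    """Split e.g. "aug-mix_wmt19.en.ref" into ("aug-mix", "wmt19", ".en.ref")."""
    # Try longer suffixes first so ".en.ref" is matched as a whole.
    for suffix in sorted(EXPECTED_SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            if stem.startswith(AUG_PREFIX):
                return "aug-mix", stem[len(AUG_PREFIX):], suffix
            return "plain", stem, suffix
    return None


def summarize(uris: Iterable[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Map dataset -> variant -> suffixes present; accepts gs:// URIs or bare names."""
    found: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for uri in uris:
        parts = split_name(uri.rsplit("/", 1)[-1])
        if parts is not None:
            variant, dataset, suffix = parts
            found[dataset][variant].add(suffix)
    return found


def report(found: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """One line per dataset and variant, listing any expected files that are absent."""
    lines = []
    for dataset in sorted(found):
        for variant in ("plain", "aug-mix"):
            present = found[dataset].get(variant, set())
            missing = [s for s in EXPECTED_SUFFIXES if s not in present]
            status = "ok" if not missing else "missing " + " ".join(missing)
            lines.append(f"{dataset} [{variant}]: {status}")
    return lines


if __name__ == "__main__":
    # Rebuild the 16 object names from the listing above.
    names = [
        f"{prefix}{dataset}{suffix}"
        for dataset, prefixes in [
            ("Neulab-tedtalks_test-1-eng-lit", ("", AUG_PREFIX)),
            ("devtest", (AUG_PREFIX,)),
            ("wmt19", (AUG_PREFIX,)),
        ]
        for prefix in prefixes
        for suffix in EXPECTED_SUFFIXES
    ]
    for line in report(summarize(names)):
        print(line)
```

On the listing above, this reports complete plain and aug-mix outputs for `Neulab-tedtalks_test-1-eng-lit`, and only aug-mix outputs for `devtest` and `wmt19`.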

bug

I've been playing around with en-tr translations and I'd like to share some feedback. I chose [this story](https://learnenglish.britishcouncil.org/general-english/story-zone/a2-b1-stories/devils-details-a2/b1) for a detailed comparison with Google Translate. In the Google docs linked...

feedback

https://github.com/mozilla/firefox-translations-training/actions/runs/10688710629/job/29629195406
```
Error: Ensure GITHUB_TOKEN has permission "id-token: write".
```
I'm not sure what happened to the token.

bug
documentation

It would be interesting to compare evaluation capabilities of LLMs to COMET and human evaluation. See the paper: [Large Language Models Are State-of-the-Art Evaluators of Translation Quality](https://arxiv.org/pdf/2302.14520)
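As a rough illustration of what such a comparison would involve, the paper has an LLM rate each translation on a 0 to 100 scale. The sketch below builds a prompt in that direct-assessment spirit and parses the score out of the reply. The template wording and the function names are hypothetical and are not the paper's exact prompt; no LLM client is included, so sending the prompt is left to whatever API is used.

```python
import re
from typing import Optional


def build_prompt(source_lang: str, target_lang: str, source: str,
                 translation: str, reference: Optional[str] = None) -> str:
    """Hypothetical direct-assessment style prompt asking for a 0-100 score."""
    lines = [
        f"Score the following translation from {source_lang} to {target_lang} "
        "on a continuous scale from 0 to 100, where 0 means no meaning is "
        "preserved and 100 means perfect meaning and grammar.",
        f"{source_lang} source: {source}",
    ]
    if reference is not None:
        # Reference-based variant; leave it out to score without a reference.
        lines.append(f"{target_lang} human reference: {reference}")
    lines.append(f"{target_lang} translation: {translation}")
    lines.append("Score:")
    return "\n".join(lines)


def parse_score(reply: str) -> Optional[float]:
    """Take the first number in the model's reply; reject out-of-range values."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    score = float(match.group(0))
    return score if 0.0 <= score <= 100.0 else None
```

Segment scores collected this way could then be correlated with COMET scores and human judgments on the same test set.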

LLM

For example, https://huggingface.co/datasets/ontocord/CulturaY.

data sources