firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models

Results 311 firefox-translations-training issues

https://firefox-ci-tc.services.mozilla.com/tasks/AsVG4ziaTMKjYq6Z9fhgwg/runs/0/logs/public/logs/live.log
https://firefox-ci-tc.services.mozilla.com/tasks/eScwZPfjS_yHCm6Gf4ufng/runs/0/logs/public/logs/live.log
```
[task 2024-06-17T06:16:05.464Z] [2024-06-17 06:16:05] [config] workspace: 12000
[task 2024-06-17T06:16:05.464Z] [2024-06-17 06:16:05] [config] Loaded model has been created with Marian v1.12.14 2d067af 2024-02-16 11:44:13 -0500
[task 2024-06-17T06:16:05.466Z] [2024-06-17...
```

Here is a basic distribution of the newscrawl data (2019), which mixes Latin and Cyrillic script. Serbian is digraphic, which means either script is fine to use. Generally when I looked at...

language-coverage
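A script split like the one mentioned for the Serbian newscrawl data can be estimated by labelling each line with its dominant script. The following is a minimal sketch and not the method used in the issue: the helper names (`script_of`, `dominant_script`, `script_distribution`) and the sample sentences are hypothetical, and the classification relies only on the Unicode character names from Python's standard `unicodedata` module.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Return a coarse script label for one character, or None for non-letters."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"


def dominant_script(line):
    """Label a line by the script that most of its letters belong to."""
    counts = Counter(s for s in map(script_of, line) if s)
    if not counts:
        return "None"  # the line contains no letters at all
    return counts.most_common(1)[0][0]


def script_distribution(lines):
    """Count how many lines fall into each dominant script."""
    return Counter(dominant_script(line) for line in lines)


# Hypothetical sample: the same sentence in both scripts, plus a digits-only line.
sample = ["Ово је реченица.", "Ovo je rečenica.", "12345"]
dist = script_distribution(sample)
```

For a real corpus, the lines would be read from the dataset file instead of a hard-coded list, and lines that mix both scripts may need a threshold rather than a simple majority.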

I ran into an issue with Turkish: https://firefox-ci-tc.services.mozilla.com/tasks/Ip5AUlOmRU2hu2yP8RfS0w/runs/0/logs/public/logs/live.log Specifically, the `alpha_ratio` filter: https://github.com/hplt-project/OpusCleaner/blob/main/opuscleaner/filters/clean_common.py

language-coverage

It seems we're having [issues](https://github.com/mozilla/firefox-translations-training/issues/669#issuecomment-2163731262) with the merging code for the second time: https://github.com/mozilla/firefox-translations-training/blob/fd2f7da7a47eaeb9dde92a47f250afb000edb465/pipeline/translate/collect.sh#L41 This performs differently on different machines. I attached to the collect task and see this behaviour:...

enhancement

https://firefox-ci-tc.services.mozilla.com/tasks/AS3nhpOxTaqAGonVw77qMQ/runs/0/logs/public/logs/live.log Likely after #664

I haven't looked into this too deeply, but we are failing with OOM when computing alignments with eflomal. https://firefox-ci-tc.services.mozilla.com/tasks/WoiZo-oDQAuRuN_yTu2EKw Perhaps there is a more efficient way to do this, or...

cost & perf
high resource

I was debugging alignment failures and noticed that file handles are closed only when the script finishes running. It would be better to close things after each step...
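One common way to close handles after each step is to scope them with a `with` block per step. The sketch below only illustrates that pattern and is not the alignment script from the issue: `run_step`, the file names, and the two toy transforms are hypothetical.

```python
import tempfile
from pathlib import Path


def run_step(src, dst, transform):
    """Run one step; both handles are closed when the with block exits, even on error."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(transform(line))
    # Report whether both handles were released before the next step starts.
    return fin.closed and fout.closed


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.txt"
    upper = Path(tmp) / "upper.txt"
    final = Path(tmp) / "final.txt"
    raw.write_text("hello\nworld\n", encoding="utf-8")

    # Each step opens only the files it needs and releases them on return.
    closed_1 = run_step(raw, upper, str.upper)
    closed_2 = run_step(upper, final, lambda line: line.replace("O", "0"))
    result = final.read_text(encoding="utf-8")
```

Scoping handles this way keeps the number of open files bounded by a single step and ensures buffered output is flushed to disk before the next step reads it.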

We should start thinking about how we're going to monitor many training runs at the same time. We can try enabling Slack notifications: https://github.com/mozilla-mobile/firefox-android/blob/db1d3d3477ec8c3e485cfc7f92b60084b49f0622/taskcluster/ci/release-notify-testrail/kind.yml#L25-L31 I don't know if we want...

taskcluster

In PR #620 I introduced automatic config generation, but the generated configs still need some work before they are production-ready. This issue tracks making them ready. ```[tasklist] ###...

epic