firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
https://firefox-ci-tc.services.mozilla.com/tasks/AsVG4ziaTMKjYq6Z9fhgwg/runs/0/logs/public/logs/live.log https://firefox-ci-tc.services.mozilla.com/tasks/eScwZPfjS_yHCm6Gf4ufng/runs/0/logs/public/logs/live.log ``` [task 2024-06-17T06:16:05.464Z] [2024-06-17 06:16:05] [config] workspace: 12000 [task 2024-06-17T06:16:05.464Z] [2024-06-17 06:16:05] [config] Loaded model has been created with Marian v1.12.14 2d067af 2024-02-16 11:44:13 -0500 [task 2024-06-17T06:16:05.466Z] [2024-06-17...
Here is a basic distribution of the newscrawl data (2019), which mixes Latin and Cyrillic script. Serbian is digraphic, which means either script is fine to use. Generally when I looked at...
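A script distribution like the one mentioned above could be computed roughly as follows. This is a minimal sketch, not the code used for the issue: the function names `classify_script` and `script_distribution` are hypothetical, and classifying by Unicode character name is an assumption about how one might do it.

```python
import unicodedata
from collections import Counter


def classify_script(sentence: str) -> str:
    """Label a sentence as 'latin', 'cyrillic', 'mixed', or 'other'."""
    counts = Counter()
    for ch in sentence:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["cyrillic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other"] += 1
    if counts["latin"] and counts["cyrillic"]:
        return "mixed"
    if counts["cyrillic"]:
        return "cyrillic"
    if counts["latin"]:
        return "latin"
    return "other"


def script_distribution(sentences) -> Counter:
    """Count how many sentences fall into each script label."""
    return Counter(classify_script(s) for s in sentences)
```

Calling `script_distribution` on an iterable of lines from a monolingual file gives per-script sentence counts, which is enough to see how the Latin and Cyrillic portions compare.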
I ran into an issue with Turkish: https://firefox-ci-tc.services.mozilla.com/tasks/Ip5AUlOmRU2hu2yP8RfS0w/runs/0/logs/public/logs/live.log Specifically, the `alpha_ratio` filter: https://github.com/hplt-project/OpusCleaner/blob/main/opuscleaner/filters/clean_common.py
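The snippet above does not say what went wrong with the filter. As general background only, the sketch below shows how the definition of "alphabetic" changes an alphabetic-ratio score for Turkish text. It is not OpusCleaner's implementation; both function names are hypothetical and the ASCII-versus-Unicode comparison is an illustrative assumption.

```python
import re

# Assumption for illustration: a letter class limited to unaccented ASCII.
ASCII_LETTER = re.compile(r"[A-Za-z]")


def alpha_ratio_ascii(text: str) -> float:
    """Share of non-space characters that are ASCII letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if ASCII_LETTER.match(c)) / len(chars)


def alpha_ratio_unicode(text: str) -> float:
    """Share of non-space characters that are letters in any script."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if c.isalpha()) / len(chars)
```

With an ASCII-only letter class, Turkish letters such as `ç`, `ğ`, `ı`, `ö`, `ş`, and `ü` count as non-alphabetic, so a sentence can fall under a ratio threshold that the Unicode-aware version would pass.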
It seems we're having [issues](https://github.com/mozilla/firefox-translations-training/issues/669#issuecomment-2163731262) with the merging code for the second time: https://github.com/mozilla/firefox-translations-training/blob/fd2f7da7a47eaeb9dde92a47f250afb000edb465/pipeline/translate/collect.sh#L41 This performs differently on different machines. I attached to the collect task and see this behaviour:...
https://firefox-ci-tc.services.mozilla.com/tasks/AS3nhpOxTaqAGonVw77qMQ/runs/0/logs/public/logs/live.log Likely after #664
I haven't looked into this too deeply, but we are failing with OOM when computing alignments with eflomal. https://firefox-ci-tc.services.mozilla.com/tasks/WoiZo-oDQAuRuN_yTu2EKw Perhaps there is a more efficient way to do this, or...
I was debugging alignment failures, and noticed that file handles are only closed when the script finishes running. It would be better to close things after each step...
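One common way to close handles after each step is to scope them with `with` blocks. The sketch below is a generic pattern, not the alignment script itself: `run_step` and its parameters are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Callable


def run_step(src: Path, dst: Path, transform: Callable[[str], str]) -> int:
    """Apply `transform` to each line of `src`, writing the result to `dst`.

    Both files are opened in a `with` block, so their handles are closed
    as soon as this step returns, even if `transform` raises.
    """
    written = 0
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(transform(line))
            written += 1
    # At this point both handles are closed and the output is flushed to disk.
    return written
```

Chaining several such steps means each one reads a fully written, closed file from the previous step, and no handles pile up until the end of the script.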
We should start thinking of how we're going to monitor many training runs at the same time. We can try enabling slack notifications: https://github.com/mozilla-mobile/firefox-android/blob/db1d3d3477ec8c3e485cfc7f92b60084b49f0622/taskcluster/ci/release-notify-testrail/kind.yml#L25-L31 I don't know if we want...
In PR #620 I introduced automatic config generation, but the generated configs still need some work before they are production ready. This issue tracks making them ready. ```[tasklist] ###...