firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
Sometimes URLs are written in text rather than hidden behind the HTML element. The URL should be copied as is in this case. There are two ways to fix this:...
I'm working on some performance optimization to bring down our CI times, and I found some tasks are taking 12 minutes to resolve due to issues downloading fetches and artifacts....
We had an incident where the uk-en model broke in Nightly. We should automate tests before release so that they catch issues like this. We should run each release channel of...
Do our alignment procedures work correctly for CJK? Check both: `align.py` and `dataset_importer.py` where we use `sim_align` lib.
It costs money to store models in the cloud. We could save a bit, and make the output of the train tasks a bit less confusing if we just stored...
We recently got a [report](https://github.com/mozilla/firefox-translations-training/issues/816) about made-up words in Turkish. I also tested the new ru-en model and noticed a lot of non-existent words there as well. For example:...
See paper: [Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach](https://arxiv.org/abs/2405.15613). This can be helpful, for example, for monolingual data, where we have a lot of it (all en-xx...
I'm looking a bit into CI speeds, and I wanted to try to slim down the model even more.