firefox-translations-training issues

Speed up CI by changing model params

1

Improve translation of URLs

4

Sometimes URLs are written out in the text rather than hidden behind an HTML element. In this case the URL should be copied as-is. There are two ways to fix this:...

eu9ene

quality
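
The requirement in the URL issue above (a URL written in the text should be copied as-is) can be checked with a small helper. This is only a minimal sketch, not one of the fixes the issue proposes: the regular expression and the function names `find_urls` and `urls_preserved` are hypothetical and not part of the repository.

```python
import re

# Hypothetical pattern: an http(s) URL runs until the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def find_urls(text: str) -> list[str]:
    """Return the URLs found in text, without trailing sentence punctuation."""
    # Note: this also strips a trailing "?" or ")" that may belong to the URL.
    return [match.rstrip(".,;:!?)") for match in URL_RE.findall(text)]


def urls_preserved(source: str, translation: str) -> bool:
    """Return True if every URL in source appears verbatim in translation."""
    return all(url in translation for url in find_urls(source))
```

A check like `urls_preserved(source, translation)` could be run over evaluation sentences that contain URLs to count how often the model alters them.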

CI run is slowed down by failing to download fetches

1

I'm working on some performance optimization to bring down our CI times, and I found some tasks are taking 12 minutes to resolve due to issues downloading fetches and artifacts....

gregtatum

bug

taskcluster

cost & perf

Add a pre-release test for models

We had an incident where the uk-en model broke in Nightly. We should automate tests before release to catch issues like this. We should run each release channel of...

gregtatum

model release

Check alignments for CJK

1

Do our alignment procedures work correctly for CJK? Check both: `align.py` and `dataset_importer.py` where we use `sim_align` lib.

eu9ene

language-coverage

Only retain the best metric model, and delete the others

2

It costs money to store models in the cloud. We could save a bit, and make the output of the train tasks a bit less confusing if we just stored...

gregtatum

cost & perf
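
The idea in the issue above, keeping only the best model and deleting the rest, can be sketched as a selection step over per-checkpoint scores. This is an illustration under assumptions: the checkpoint names, the scores, and the function `split_checkpoints` are hypothetical, and the sketch does not touch cloud storage or reflect how the train tasks actually store their artifacts.

```python
def split_checkpoints(
    scores: dict[str, float], higher_is_better: bool = True
) -> tuple[str, list[str]]:
    """Pick the best-scoring checkpoint and list the others for deletion.

    scores maps a (hypothetical) checkpoint name to its validation metric.
    """
    if not scores:
        raise ValueError("no checkpoints to choose from")
    pick = max if higher_is_better else min
    best = pick(scores, key=scores.get)
    # Everything except the best checkpoint is a deletion candidate.
    to_delete = sorted(name for name in scores if name != best)
    return best, to_delete
```

For a metric where lower is better, such as a loss, pass `higher_is_better=False`.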

Made-up words in translations

We recently got a [report](https://github.com/mozilla/firefox-translations-training/issues/816) about made-up words in Turkish. I also tested the new ru-en model and noticed a lot of non-existent words there as well. For example:...

eu9ene

quality

New CI tests with opustrainer and dataset truncation changes

gregtatum

Consider rebalancing datasets with clustering

See paper: [Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach](https://arxiv.org/abs/2405.15613). This can be helpful, for example, for monolingual data, where we have a lot of it (all en-xx...

eu9ene

quality

Make the CI model training even slimmer

I'm looking a bit into CI speeds, and I wanted to try to slim down the model even more.

gregtatum
