firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models

Results 311 firefox-translations-training issues

I have no idea how to fix this; any help or at least guidance is appreciated. And here is my current log for a new job. It seems to be...

snakemake

Now that it runs properly on GPUs, CPU utilization is ~10%. We have 40 vCPUs now. We can experiment with it and maybe reduce to 8-16.

cost & perf

If practical, the LLMs might be useful for a variety of tasks:
- Quality evaluation
- Data augmentation (including back translation for low-resource languages)
- Using as a teacher model...

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.15 to 1.26.18. Release notes Sourced from urllib3's releases. 1.26.18 Made body stripped from HTTP requests changing the request method to GET after HTTP 303 "See Other"...

dependencies

Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.2.3 to 2.3.8. Release notes Sourced from werkzeug's releases. 2.3.8 This is a security release for the 2.3.x feature branch. Changes: https://werkzeug.palletsprojects.com/en/2.3.x/changes/#version-2-3-8 2.3.7 This is a fix...

dependencies

It would be easier to maintain the Docker images for all tasks in this repo compared to updating the generic worker image elsewhere every time we need to add something....

enhancement
taskcluster

I haven't fully audited the code, but I suspect that the monolingual data is not being deduplicated from the parallel data. For instance, in the `ca-en` model, OpenSubtitles was used...

quality
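
To make the suspected problem concrete, here is a minimal sketch of removing monolingual lines that already occur in one side of the parallel data. It is not taken from the pipeline: the function names `normalize` and `dedupe_mono`, the whitespace and case normalization, and the sample sentences are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator


def normalize(line: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies match.

    Hypothetical normalization; a real pipeline may use a different rule.
    """
    return " ".join(line.split()).lower()


def dedupe_mono(mono: Iterable[str], parallel_side: Iterable[str]) -> Iterator[str]:
    """Yield monolingual lines that are absent from the parallel data.

    Repeated monolingual lines are also emitted only once.
    """
    seen = {normalize(line) for line in parallel_side}
    for line in mono:
        key = normalize(line)
        if key and key not in seen:
            seen.add(key)
            yield line.rstrip("\n")


# Made-up sample data standing in for the English side of a parallel corpus
# and for a monolingual English corpus.
parallel_en = ["Hello there.\n", "How are you?\n"]
mono_en = ["hello  there.\n", "A new sentence.\n", "A new sentence.\n", "How are you?\n"]

result = list(dedupe_mono(mono_en, parallel_en))
print(result)
```

Holding the whole parallel side in a set is only practical for corpora that fit in memory; for larger data, storing hashes of the normalized lines instead of the lines themselves would be one way to reduce the footprint.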

People keep asking how to help add another language. 1. The first good step would be helping to research datasets. To estimate feasibility of training we need statistics on how...

documentation
community

I replaced it with the small one after this bug https://github.com/hplt-project/OpusCleaner/issues/122. We should revert it and see whether it's fixed.

quality

We are still using 2.0 (https://github.com/mozilla/firefox-translations-training/blob/71013bcea0e4647d04d508daf45fe2a96c27ef0d/pipeline/bicleaner/requirements/bicleaner-ai.in), but the latest version is 2.3.2. Version 2.2.0 adds support for tokenizing by characters (for Chinese).

language-coverage