firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
Similar to #402. In multiple places (e.g. https://github.com/mozilla/firefox-translations-training/blob/main/Snakefile#L549) there is an explicit file path with a `.gz` extension. While I use `.gz`, the `split` command supports other extensions as well....
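One way to avoid hard-coding `.gz` is to resolve the compressed file from a base path plus a list of accepted suffixes. This is a minimal sketch under that assumption. `resolve_compressed`, `DEFAULT_SUFFIXES`, and the suffix list itself are hypothetical and do not come from the Snakefile:

```python
from pathlib import Path
from typing import Sequence

# Hypothetical: compression suffixes a step might accept instead of a hard-coded ".gz".
DEFAULT_SUFFIXES = (".gz", ".zst", ".xz")


def resolve_compressed(base: str, suffixes: Sequence[str] = DEFAULT_SUFFIXES) -> Path:
    """Return the first existing file named `base` + one of `suffixes`, in order."""
    for suffix in suffixes:
        candidate = Path(base + suffix)
        if candidate.exists():
            return candidate
    tried = ", ".join(base + suffix for suffix in suffixes)
    raise FileNotFoundError(f"none of these files exist: {tried}")
```

For example, `resolve_compressed("data/corpus.en")` would return `data/corpus.en.zst` if only that file exists. The order of `suffixes` decides which file wins when several are present.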
It takes some time to spin up multi-GPU machines, download the artifacts, and install the dependencies. Then it just exits if the threshold is 0. We...
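A minimal sketch of moving the zero-threshold check earlier. It assumes the threshold is already known when tasks are generated, so the check can run before any GPU machine is requested. `Step`, `threshold`, and `schedule_steps` are hypothetical names, not identifiers from the repository:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """Hypothetical description of a GPU task that might be scheduled."""
    name: str
    threshold: float


def schedule_steps(steps: List[Step]) -> List[str]:
    # Filter at task-generation time: a step whose threshold is 0 would only
    # spin up a GPU machine, fetch artifacts, install dependencies, and exit.
    return [step.name for step in steps if step.threshold != 0]
```

With these definitions, `schedule_steps([Step("a", 0.5), Step("b", 0)])` returns `["a"]`, so no machine is requested for `"b"`.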
We should run the whole pipeline and analyze the GCP dashboards. The first candidate for optimization is the GPU machines used for training. We don't shuffle the training dataset in memory anymore, so...
For bicleaner, you use https://github.com/mozilla/firefox-translations-training/blob/main/pipeline/bicleaner/download_pack.py#L93

```py
def main(args: Optional[list[str]] = None) -> None:
```

This is not supported in this environment, which is defined [here](https://github.com/mozilla/firefox-translations-training/blob/main/envs/bicleaner.yml) to use Python 3.7:

> Traceback (most...
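A sketch of a Python 3.7-compatible version of that signature. Before Python 3.9, subscripting the builtin `list` in an annotation raises `TypeError: 'type' object is not subscriptable` when the function is defined. There are two standard workarounds on 3.7. One is to use `typing.List[str]`. The other is to add `from __future__ import annotations` (PEP 563, available since 3.7) so that annotations are not evaluated. The sketch below uses `typing.List`. Its body is a placeholder parser, not the actual logic of `download_pack.py`:

```python
import argparse
from typing import List, Optional


def main(args: Optional[List[str]] = None) -> None:
    # typing.List[str] is accepted by Python 3.7; the builtin list[str] in an
    # annotation fails at definition time on Python versions before 3.9.
    parser = argparse.ArgumentParser(description="placeholder parser")
    parser.parse_args(args)  # args=None falls back to sys.argv[1:]


if __name__ == "__main__":
    main()
```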
When inspecting the running pipeline, I often have to download artifacts to my local machine. I essentially do `wget http://artifact.zst` followed by `zstd -dc artifact.zst | head -n 100`, or similar....
We recently upgraded [worker-runner](https://github.com/taskcluster/taskcluster/tree/main/tools/worker-runner) on the GPU workers to a version that is supposed to gracefully handle spot preemptions. Most notably, it should be uploading artifacts before an instance terminates....
fix #338
We have a great bar chart for comparing models within one experiment (runs group). We should figure out how to tune this dashboard or rename the steps so...
Apparently we already use some monolingual data from there as a custom corpus, based on @gregtatum's investigation. We also have a tool to list the available data: https://github.com/mozilla/firefox-translations-training/pull/397