firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models

Results 311 firefox-translations-training issues

Similar to #402. In multiple places (e.g. https://github.com/mozilla/firefox-translations-training/blob/main/Snakefile#L549) a file path is hard-coded with a `.gz` extension. While I use `.gz`, the `split` command supports other extensions as well....

snakemake

It takes some time to spin up a multi-GPU machine and then to download the artifacts and install the dependencies. It just exits if the threshold is 0. We...

cost & perf

We should run the whole pipeline and analyze GCP dashboards. The first candidate for optimization is GPU machines for training. We don't shuffle the training dataset in memory anymore, so...

cost & perf

For bicleaner, you use https://github.com/mozilla/firefox-translations-training/blob/main/pipeline/bicleaner/download_pack.py#L93

```py
def main(args: Optional[list[str]] = None) -> None:
```

which is not allowed in this environment, defined [here](https://github.com/mozilla/firefox-translations-training/blob/main/envs/bicleaner.yml) to use Python 3.7

> Traceback (most...
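
For context, subscripting built-in types such as `list[str]` only became valid in Python 3.9 (PEP 585). On Python 3.7 the annotation is evaluated when the function is defined and raises `TypeError: 'type' object is not subscriptable`. A minimal sketch of a 3.7-compatible signature follows, assuming the fix is applied in the script rather than by upgrading the environment. The function body is a hypothetical placeholder and is not taken from `download_pack.py`.

```python
# Sketch only: a signature that Python 3.7 can evaluate.
# typing.List is the pre-3.9 spelling of list[...] in annotations.
from typing import List, Optional


def main(args: Optional[List[str]] = None) -> None:
    # Hypothetical placeholder body; the real script's logic is not reproduced here.
    received = [] if args is None else list(args)
    print(f"received {len(received)} argument(s)")
```

Another option on 3.7 is to put `from __future__ import annotations` at the top of the module, which defers evaluation of annotations so that `list[str]` is never executed at definition time. Which approach suits the project is a judgment call for the maintainers.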

When inspecting the running pipeline I have to download artifacts on the local machine quite often. I essentially do: `wget http://artifact.zst` `zstd -dc artifact.zst | head -n 100` or similar....

enhancement
taskcluster

We recently upgraded [worker-runner](https://github.com/taskcluster/taskcluster/tree/main/tools/worker-runner) on the GPU workers to a version that is supposed to gracefully handle spot preemptions. Most notably, it should be uploading artifacts before an instance terminates....

taskcluster

We have a great bar chart to compare across the models for one experiment (runs group). We should figure out how to tune this dashboard or rename the steps so...

platform

Apparently we already use some monolingual data from there as a custom corpus, based on @gregtatum's investigation. Also, we have a tool to list the available data: https://github.com/mozilla/firefox-translations-training/pull/397

data