firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
I noticed that some extra metrics generated by Marian's TensorBoard logging are missing from the W&B dashboards. They are all useful. The missing ones: - valid/bleu-detok_stalled - valid/ce-mean-words_stalled -...
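Finding which metrics are missing comes down to a set difference between the tag names Marian writes to TensorBoard and the keys that reach W&B. Below is a minimal sketch, assuming both name lists have already been collected. The helper `missing_metrics` is hypothetical, and `valid/bleu-detok` is an assumed example key; only the two `_stalled` names come from the report above.

```python
def missing_metrics(tensorboard_tags, wandb_keys):
    """Return TensorBoard tags that have no matching W&B key, sorted by name."""
    return sorted(set(tensorboard_tags) - set(wandb_keys))


# Hypothetical inputs for illustration; real lists would come from the
# TensorBoard event files and the W&B run history.
tb = ["valid/bleu-detok", "valid/bleu-detok_stalled", "valid/ce-mean-words_stalled"]
wb = ["valid/bleu-detok"]

print(missing_metrics(tb, wb))
# ['valid/bleu-detok_stalled', 'valid/ce-mean-words_stalled']
```

Sorting the result keeps the report stable across runs, since set iteration order is not guaranteed.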
We should display in W&B the things we look at often. Final merged corpus size after deduplication is something I look at periodically to understand how aggressive the cleaning is overall....
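As a rough illustration of the number being requested, the sketch below counts a parallel corpus before and after exact deduplication of (source, target) pairs and reports the fraction removed. This is an assumption for illustration only: the pipeline's actual deduplication step may use different keys or normalization, and `dedup_pairs` is a hypothetical helper, not part of the project.

```python
def dedup_pairs(src_lines, trg_lines):
    """Keep the first occurrence of each exact (source, target) pair."""
    if len(src_lines) != len(trg_lines):
        raise ValueError("source and target must have the same number of lines")
    seen = set()
    kept = []
    for pair in zip(src_lines, trg_lines):
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


src = ["Hello", "Hello", "Bye"]
trg = ["Hallo", "Hallo", "Tschüss"]
kept = dedup_pairs(src, trg)
print(f"{len(src)} -> {len(kept)} pairs ({1 - len(kept) / len(src):.1%} removed)")
# 3 -> 2 pairs (33.3% removed)
```

Logging the before and after counts together, rather than only the final size, would make it easier to see how aggressive deduplication is relative to the other cleaning steps.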
RTL languages shouldn't affect training, but supporting them will require some work on the Firefox side. This meta bug tracks any work that is needed. We should complete a subset...
In the short term we are focusing on building up our language list by training easy-to-segment LTR languages, as they don't require changes to the training pipeline, and...
It can help reduce the teacher-student quality gap where we have little monolingual data in the source language. See: [From Research to Production and Back: Ludicrously Fast Neural Machine Translation](https://aclanthology.org/D19-5632.pdf)...
This fixes the edge case where alignments-original -> alignments-backtranslated are both marked as `stage: alignments-teacher` and we want to restart both of them. We should probably split them later to...
Latest config updates. Replace the en-uk config with the one used to train the models (it lacks extra mono data).
https://wandb.ai/moz-translations/tr-en/workspace?nw=nwuserepavlov https://firefox-ci-tc.services.mozilla.com/tasks/groups/SDD81N6sRu61LOL4xZJc-Q
I ran this one to do the export task, but since "evaluate" tasks are not sequential, they get rerun each time I use start_stage, which wastes GPU resources....