Evgeny Pavlov

Results 112 comments of Evgeny Pavlov

Add support for the OpusTrainer Tags modifier. It might require training alignments based on space tokenization instead of SentencePiece tokenization. See [this issue](https://github.com/hplt-project/OpusTrainer/issues/38).

I'm still training students with inline noise. We'll need to test all these cases with the final model, ideally in Nightly, before we can say they are fixed.

Noise/inline noise augmentations are supposed to take care of most of those cases, but we'll need to verify it all in the wild when the quantized models arrive.

It's basically this: https://github.com/mozilla/firefox-translations-training/issues/455

> you mentioned something about quantization failures in a meeting when we discussed this, can you elaborate on that?

I'm not sure, we should investigate.

Something happens on `[taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials:`. I see two 20-minute gaps when this line appears: https://firefox-ci-tc.services.mozilla.com/tasks/TnmPTMeqSPWB727etTAAaw/runs/1/logs/live/public/logs/live.log
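
One way to locate gaps like these in a downloaded log is to compare the timestamps of consecutive lines. The sketch below is only an illustration: it assumes each relevant line contains an ISO-8601 timestamp such as `2024-01-01T10:00:00`, which may not match the actual log format, and the `find_gaps` helper and its 15-minute threshold are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: log lines carry a timestamp like 2024-01-01T10:00:00 somewhere in the text.
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def find_gaps(lines, threshold=timedelta(minutes=15)):
    """Return (previous_line, next_line, gap) for each pair of consecutive
    timestamped lines that are more than `threshold` apart."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # skip lines without a recognizable timestamp
        current = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        if prev_time is not None and current - prev_time > threshold:
            gaps.append((prev_line, line, current - prev_time))
        prev_time, prev_line = current, line
    return gaps
```

Running `find_gaps` over the lines of the saved log would list each long pause together with the line that follows it, which makes it easy to check whether the credentials refresh message is always the first line after the gap.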

This might be required to enable pre-emption for student models. cc @bhearsum

If we support continuation on preemption, we can close this, as we don't plan to use manual training continuation for the students now.

It's important to make sure we'll be able to track experiments properly. It's probably better not to split training until we have real-time publication. Even then we'll need to support...

Related to https://github.com/mozilla/firefox-translations-training/issues/191

We should upload the artifacts and logs for:
- all the `train-` and `finetune-` steps
- `vocab`
- `export`
- all the `evaluate-` steps
- train action...
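
The selection described in this comment could be sketched as a simple label filter. This is a hypothetical illustration, not the pipeline's actual code: it assumes task labels begin with the step name, and it covers only the steps fully named in the comment, since the rest of the list is cut off.

```python
# Step name prefixes taken from the comment; the truncated remainder is intentionally left out.
UPLOAD_PREFIXES = ("train-", "finetune-", "evaluate-", "vocab", "export")


def should_upload(task_label: str) -> bool:
    """Return True if the task's artifacts and logs should be uploaded.

    Assumption: a task label starts with its step name, e.g. a hypothetical
    "train-teacher" label belongs to a `train-` step.
    """
    return task_label.startswith(UPLOAD_PREFIXES)
```

Using prefixes keeps the list short, but whether real task labels follow this naming would need to be checked against the task graph.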

Also, do we want to move the data to a more production-grade bucket from `gs://releng-translations-dev`?