firefox-translations-training
Training pipelines for Firefox Translations neural machine translation models
This is a _very_ rough prototype for what's been discussed in https://github.com/mozilla/firefox-translations-training/issues/417 and https://github.com/taskcluster/taskgraph/issues/424. In this example, the decision about which tasks `merge-corpus` should depend on is being deferred and...
Bumps [grpcio](https://github.com/grpc/grpc) from 1.54.0 to 1.54.3. Release notes Sourced from grpcio's releases. Release v1.54.3 This is release 1.54.3 (gracious) of gRPC Core. For gRPC documentation, see grpc.io. For previous releases,...
For example, by using tools such as [compare-mt](https://github.com/neulab/compare-mt), or by comparing translations with a known-good one and sorting by BLEU/COMET.
We'll want to experiment more with the modifiers and training data composition. Editing code adds friction to experimentation. The experiment config section might look like:
```yaml
opus-trainer:
  teacher:
    stages:
      -...
```
Harder-to-segment languages include Chinese, Japanese, and Korean. We'll need to implement better tokenization and segmentation support for these languages in order to train them. This...
Right now it splits on word boundaries and limits the size of the monolingual data to fewer than 100 "words". This needs to be changed to support another segmentation...
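To illustrate why a word-based limit is a problem for languages written without spaces, here is a minimal sketch. It assumes "words" means whitespace-delimited tokens and that the limit is applied per line; the function names `word_count` and `keep_line` are hypothetical and are not taken from the pipeline.

```python
def word_count(line: str) -> int:
    # Assumption: a "word" is a whitespace-delimited token.
    return len(line.split())


def keep_line(line: str, max_words: int = 100) -> bool:
    # Keep only lines with fewer than `max_words` "words".
    return word_count(line) < max_words


# 150 space-separated tokens: correctly rejected by the limit.
english = " ".join(["word"] * 150)

# A long run of Chinese text with no spaces: counted as a single "word",
# so it passes the limit regardless of its real length.
chinese = "这是一个没有空格的句子" * 50

print(word_count(english), keep_line(english))
print(word_count(chinese), keep_line(chinese))
```

Under these assumptions the Chinese line is kept even though it is several hundred characters long, which is why a different segmentation approach is needed before the limit means anything for such languages.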
We have a number of use cases where the DAG should be modified, and we will have more in the future:
1. The number of chunks for translations
2. Skipping...
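As a rough illustration of the first use case, the sketch below builds a small dependency graph in plain Python where the number of translation chunks is a parameter. It does not use the taskgraph API; the task names (`split`, `translate-N`, `collect`) and the function `build_translate_dag` are hypothetical.

```python
def build_translate_dag(num_chunks: int) -> dict[str, list[str]]:
    """Return a mapping of task name -> list of tasks it depends on."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")

    # Hypothetical upstream task that splits the data into chunks.
    dag: dict[str, list[str]] = {"split": []}

    chunk_tasks = []
    for i in range(1, num_chunks + 1):
        name = f"translate-{i}"
        dag[name] = ["split"]  # each chunk depends only on the split task
        chunk_tasks.append(name)

    # Hypothetical downstream task that waits for every chunk.
    dag["collect"] = chunk_tasks
    return dag


dag = build_translate_dag(3)
for task, deps in dag.items():
    print(task, "<-", deps)
```

The point of the sketch is that the shape of the graph depends on a configuration value, so the graph cannot be fully fixed ahead of time.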
We had a meeting with our security team to talk about the language training pipeline. We agreed that, because we generate artifacts that eventually ship to users, we would...