firefox-translations-training
Some words should pass through untranslated (e.g. IPA characters, emojis).
See also:
https://github.com/mozilla/firefox-translations/issues/365
https://github.com/browsermt/bergamot-translator/issues/185
https://github.com/browsermt/bergamot-translator/issues/419
https://github.com/mozilla/firefox-translations/issues/514
https://github.com/mozilla/firefox-translations/issues/511
https://github.com/mozilla/firefox-translations/issues/442
https://github.com/mozilla/firefox-translations/issues/375
https://bugzilla.mozilla.org/show_bug.cgi?id=1862017
Links (URLs) should also pass through untranslated, see https://bugzilla.mozilla.org/show_bug.cgi?id=1862017
Add support for the OpusTrainer Tags modifier.
It might require training alignments based on whitespace tokenization instead of SentencePiece tokenization. See this issue.
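A minimal sketch of the underlying idea, assuming word-level (whitespace) alignments in the common `i-j` format: pick one aligned word pair and insert the same untranslatable token (e.g. an emoji) right after it on both sides, so the model learns to copy it through. This is an illustration only, not OpusTrainer's implementation; `parse_alignments` and `insert_inline_noise` are hypothetical names.

```python
import random


def parse_alignments(align: str) -> list[tuple[int, int]]:
    """Parse 'i-j' alignment pairs into (source_index, target_index) tuples."""
    pairs = []
    for pair in align.split():
        s, t = pair.split("-")
        pairs.append((int(s), int(t)))
    return pairs


def insert_inline_noise(
    src: str, tgt: str, align: str, noise: str, rng: random.Random
) -> tuple[str, str]:
    """Insert the same noise token after one aligned word in source and target.

    Assumes whitespace tokenization: alignment indices refer to space-separated
    words, not SentencePiece pieces (hypothetical helper for illustration).
    """
    src_toks = src.split()
    tgt_toks = tgt.split()
    pairs = parse_alignments(align)
    if not pairs:
        # Without alignments we cannot place the token consistently; leave as-is.
        return src, tgt
    s, t = rng.choice(pairs)
    src_toks.insert(s + 1, noise)
    tgt_toks.insert(t + 1, noise)
    return " ".join(src_toks), " ".join(tgt_toks)
```

For example, `insert_inline_noise("hello world", "bonjour monde", "1-1", "😀", random.Random(0))` yields `("hello world 😀", "bonjour monde 😀")`. With SentencePiece-level alignments, the indices would point into subword pieces, which is why whitespace-based alignments are easier to work with here.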
@eu9ene, is this fixed by the inline noise augmentation? Does it cover all the cases mentioned in the first comment?
I'm still training students with inline noise. We'll need to test all these cases with the final model, ideally in Nightly, before we can say they are fixed.
The noise and inline noise augmentations are supposed to handle most of these cases, but we'll need to verify them in the wild once the quantized models arrive.
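When testing the final model, a simple check is whether each span that should pass through appears verbatim in the output. The sketch below is a hypothetical helper, not part of the project: URLs are found with a deliberately simplistic regex (it would also capture trailing punctuation), and other spans such as IPA strings or emojis are supplied by the tester.

```python
import re

# Simplistic URL matcher; an assumption for illustration, not a full URL grammar.
URL_RE = re.compile(r"https?://\S+")


def expected_passthrough(source: str, extra: list[str]) -> list[str]:
    """Spans that should be copied verbatim: URLs in the source plus extra spans present in it."""
    return URL_RE.findall(source) + [span for span in extra if span in source]


def missing_spans(source: str, translation: str, extra: list[str]) -> list[str]:
    """Return the pass-through spans that do not appear verbatim in the translation."""
    return [span for span in expected_passthrough(source, extra) if span not in translation]
```

An empty result means every expected span survived; for instance, if a model drops the emoji from `"See https://example.com for the /ˈtʃiːz/ sound 🧀"`, `missing_spans` reports `["🧀"]`.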