
Some words should pass through untranslated (e.g. IPA characters, emojis)

Open · marco-c opened this issue 1 year ago · 4 comments

See also:
https://github.com/mozilla/firefox-translations/issues/365
https://github.com/browsermt/bergamot-translator/issues/185
https://github.com/browsermt/bergamot-translator/issues/419
https://github.com/mozilla/firefox-translations/issues/514
https://github.com/mozilla/firefox-translations/issues/511
https://github.com/mozilla/firefox-translations/issues/442
https://github.com/mozilla/firefox-translations/issues/375
https://bugzilla.mozilla.org/show_bug.cgi?id=1862017

marco-c, Aug 30 '23

And also links, see https://bugzilla.mozilla.org/show_bug.cgi?id=1862017


eu9ene, Oct 30 '23

Add support for the OpusTrainer Tags modifier.

It might require training alignments based on whitespace tokenization instead of SentencePiece tokenization. See this issue.

eu9ene, Oct 31 '23
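The idea behind the Tags modifier mentioned above can be illustrated with a small sketch: a pass-through token (here an emoji) is inserted next to a word in the source sentence and next to its aligned word in the target sentence, so the model sees examples where that token is copied rather than translated. This is a hypothetical illustration, not OpusTrainer's implementation; the function name `insert_passthrough`, plain whitespace splitting, and the alignment format (a list of `(source word index, target word index)` pairs) are all assumptions. It also hints at why word-level alignments from whitespace tokenization are convenient: the indices map directly onto the words being split, with no SentencePiece pieces to reassemble.

```python
import random
from typing import List, Tuple

# Hypothetical illustration only; NOT OpusTrainer's actual Tags modifier.
# Alignment pairs are (source word index, target word index), where "word"
# means a whitespace-separated token, not a SentencePiece piece.
Alignment = List[Tuple[int, int]]


def insert_passthrough(
    src: str, trg: str, alignment: Alignment, token: str, rng: random.Random
) -> Tuple[str, str]:
    """Insert `token` right after one randomly chosen aligned word pair.

    The same token is added to both sides, teaching the model to copy it.
    If there is no alignment, the sentence pair is returned unchanged.
    """
    if not alignment:
        return src, trg
    src_words = src.split()
    trg_words = trg.split()
    s_idx, t_idx = rng.choice(alignment)
    # list.insert at len(list) appends, so aligning the last word is fine.
    src_words.insert(s_idx + 1, token)
    trg_words.insert(t_idx + 1, token)
    return " ".join(src_words), " ".join(trg_words)


if __name__ == "__main__":
    rng = random.Random(0)
    alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(insert_passthrough("Das Haus ist groß", "The house is big", alignment, "🙂", rng))
```

With a one-to-one alignment like the one in the usage example, the emoji lands at the same word position on both sides; with real, non-monotonic alignments it follows whichever target word the chosen source word is aligned to.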

@eu9ene is this fixed by the inline noise augmentation? Does it cover all cases mentioned in the first comment?

marco-c, Mar 29 '24

I'm still training students with inline noise. We'll need to test all of these cases with the final model, ideally in Nightly, before we can say they are fixed.

eu9ene, Apr 01 '24

The noise and inline noise augmentations should take care of most of these cases, but we'll need to verify them in the wild once the quantized models arrive.

eu9ene, May 09 '24