firefox-translations
Translations with all caps text sometimes fail
Describe the bug
Firefox Translations sometimes fails to translate text that is written entirely in uppercase, which is not unusual for headlines on some websites.
To Reproduce
1. Set your content language to German
2. Open https://edition.cnn.com/2022/08/05/uk/royal-news-newsletter-08-05-22-scli-gbr-cmd-intl/index.html
3. Look at the all caps headlines "DID YOU KNOW?", "WHAT ELSE IS HAPPENING?" and "PHOTO OF THE WEEK"
4. Press the translate button
5. Reload the website
6. Use the developer tools to change the above headlines to "Did you know?", "What else is happening?" and "Photo of the week"
7. Repeat step 4
Expected behavior
The headlines are translated and the result is the same after step 4 and after step 7.
Actual behavior
Translations after step 4:
DID SIE WISSEN? WAS ELSE IST HAPPENING? FOTO DER WOCHE
(The first two are a mix of English and German words; the last translation is correct.)
Translations after step 7:
Wussten Sie? Was ist noch los? Foto der Woche
(These are all correct translations)
Desktop (please complete the following information as possible):
- Firefox Browser version: 105.0a1 (2022-08-07)
- Processor information: Apple M1 Pro
- OS version: macOS 12.5
- System type (32-bit or 64-bit): 64 bit
- Firefox Translation Extension version: 1.1.4
- Webpage on which issue was observed: https://edition.cnn.com/2022/08/05/uk/royal-news-newsletter-08-05-22-scli-gbr-cmd-intl/index.html
- Language of the webpage: English
- Translating to (language): German
- Console logs from Web Developer Tools:
Using fallback gemm implementation
Wasm Runtime initialized Successfully (preRun -> onRuntimeInitialized) in 0.004 secs
Creating Translation Service with config: {"cacheSize":0}
Translation Service created successfully
Constructing translation model ende
Translation Model config: beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: true
quiet-translation: true
gemm-precision: int8shiftAlphaAll
alignment: soft
Aligned memory sizes: Model:17140835, Shortlist:3943644, Vocab: 784269
Model 'ende' successfully constructed. Time taken: 0.108 secs
loadLanguageModel function complete
System architecture and extension information can be found as follows: Go to about:telemetry#environment-data in browser and share "architecture" field under "build" category and "cpu.extensions" field under "system" category
architecture: aarch64
cpu.extensions: [hasNEON]
We need to add random capitalization to the training pipeline (https://github.com/browsermt/students).
This is tracked in https://github.com/mozilla/firefox-translations-training/issues/73.
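For illustration only, the random capitalization idea mentioned above could look roughly like the following sketch. It is not code from the browsermt/students or firefox-translations-training pipelines; the function name `augment_case`, the `upper_prob` and `seed` parameters, and the sample corpus are assumptions made up for this example. The idea is to upper-case both sides of a random subset of parallel sentence pairs so that the model also sees all caps input during training.

```python
import random


def augment_case(pairs, upper_prob=0.1, seed=0):
    """Yield (source, target) pairs, upper-casing both sides of a random subset.

    upper_prob and seed are illustrative values, not settings taken from
    any real training configuration.
    """
    rng = random.Random(seed)
    for source, target in pairs:
        if rng.random() < upper_prob:
            # Upper-case both sides so all caps input is paired with all caps output.
            yield source.upper(), target.upper()
        else:
            # Leave the pair untouched.
            yield source, target


# Hypothetical sample corpus built from the headlines in this report.
corpus = [
    ("Did you know?", "Wussten Sie?"),
    ("What else is happening?", "Was ist noch los?"),
    ("Photo of the week", "Foto der Woche"),
]

# upper_prob=1.0 upper-cases every pair, which makes the effect easy to see.
augmented = list(augment_case(corpus, upper_prob=1.0))
```

In an actual pipeline the probability would presumably be small, so that most training pairs keep their original casing.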