Jaume Zaragoza

Results 124 comments of Jaume Zaragoza

Trying to spot the models affected by the issue using alignments, I translated the 200 shortest sentence pairs from the [Tatoeba Translation Challenge test](https://github.com/Helsinki-NLP/Tatoeba-Challenge) and computed alignments between the source...
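A minimal sketch of how the "200 shortest pairs" selection could look, assuming the test set is held as a list of `(source, target)` tuples and ranking by combined character count (token count would work too); all names here are illustrative:

```python
# Sketch: pick the N shortest sentence pairs from a parallel test set.
# `pairs` is a list of (source, target) tuples; "shortest" here means the
# smallest combined character count of both sides (an assumption).

def shortest_pairs(pairs, n=200):
    """Return the n pairs with the smallest combined length."""
    return sorted(pairs, key=lambda p: len(p[0]) + len(p[1]))[:n]

sample = [
    ("Hello.", "Hola."),
    ("How are you today?", "¿Cómo estás hoy?"),
    ("Hi!", "¡Hola!"),
]
print(shortest_pairs(sample, n=2))  # shortest two pairs first
```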

An update of the previous table, but this time sourcing short sentences from https://github.com/mozilla-l10n/mt-training-data . The tested models are from the current main branch of https://github.com/mozilla/firefox-translations-models/ using the model type in this...

I already [ran the new evals](https://gregtatum.github.io/taskcluster-tools/src/training/?dashboardName=Evaluate+short+sentences&taskGroupIds=CGxM54KqRjGSOwqnEMTrKg&taskGroupNames=%5B%5D) for that model and the results are [here](https://firefoxci.taskcluster-artifacts.net/aLxUFDXfS0yaNnISAeaffg/0/public/build/devtest.metrics.json). The score shown in the JSON is `-0.00087` (0 to 1 scale), which scaled to a...
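A sketch of reading such a metrics file and rescaling the 0-to-1 score to a 0-to-100 range; the `"score"` field name is an assumption about the JSON layout, not the actual schema of `devtest.metrics.json`:

```python
import json

# Sketch: load a devtest.metrics.json-style payload and rescale a
# 0-to-1 metric to 0-to-100. The "score" key is an assumed field name.
raw = '{"score": -0.00087}'
metrics = json.loads(raw)
scaled = metrics["score"] * 100  # 0-1 scale -> 0-100 scale
print(f"{scaled:.3f}")  # -0.087
```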

I was going to say this can be closed. However, the it-en (base-mem) model has been trained in the last batch and should be fixed, but has it been trained from...

Doing this for translating into English seems fine, but translating into Chinese gives me more doubts, since Traditional Chinese corpora might be in Cantonese or other variants that are not...

> I wonder if just filtering Cantonese and converting Mandarin between Traditional and Simplified scripts will work for training separate models.

I see fastText supports the `yue` language code (Cantonese) and...

It may be that the effect of adding synthetically translated monolingual data is more noticeable when the language pair is low/mid-resource. Backtranslation usually has a big impact on low-resource pairs.
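A sketch of one common way to combine the two sources: upsample the authentic parallel data so the (usually much larger) backtranslated synthetic corpus does not drown it out. The roughly 1:1 target ratio is an assumption for illustration, not a fixed recipe:

```python
import math

# Sketch: mix authentic parallel data with backtranslated synthetic data,
# repeating the authentic corpus until it roughly matches the synthetic
# corpus in size (the ~1:1 ratio is an assumption, not a rule).

def mix_corpora(authentic, synthetic):
    if not authentic:
        return list(synthetic)
    repeats = max(1, math.ceil(len(synthetic) / len(authentic)))
    return authentic * repeats + list(synthetic)

authentic = [("src1", "tgt1"), ("src2", "tgt2")]
synthetic = [(f"bt-src{i}", f"mono-tgt{i}") for i in range(5)]
mixed = mix_corpora(authentic, synthetic)
print(len(mixed))  # 2 authentic * 3 repeats + 5 synthetic = 11
```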

I guess that if you need to set the TPU scope during model loading (not exactly as in the example, since that one is for training), you will need to do...
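A minimal sketch of what that could look like with TensorFlow, assuming a reachable Cloud TPU; the point is that variable creation, which also happens when loading a saved model, must occur inside `strategy.scope()`. The model path is a placeholder:

```python
import tensorflow as tf

# Sketch: load (rather than build/train) a model under the TPU strategy
# scope. Requires an actual TPU to be reachable, so this is illustrative.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # "saved_model_dir" is a placeholder path, not a real artifact.
    model = tf.keras.models.load_model("saved_model_dir")
```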

Closing this for now, feel free to re-open it if there are more doubts.