Some translations are not possible

Open leandroalbero opened this issue 1 year ago • 1 comment

Issue description

Running the latest image, easynmt/api:2.0-cpu, with the model set to m2m_100_418M and English as the target language fails for some translations. Here are some examples:

  • 'imagina a mi'
  • 'imagina un sol'
  • 'imagina a un vikingo'

In this case, for example, setting source_lang to 'es' fixed the issue, so the problem may be in the language detection step, or there may be no translation direction from the detected language to English.
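The workaround can be sketched as a request that passes source_lang explicitly, so the automatic language detection is skipped. This is a minimal sketch, not taken from the issue: the host and port (http://localhost:24080) and the helper name build_translate_url are assumptions, and the /translate query parameters are written as I understand the EasyNMT Docker API, so check them against your deployment.

```python
from urllib.parse import urlencode


def build_translate_url(base_url, text, target_lang, source_lang=None):
    """Build a GET URL for the translation endpoint (hypothetical helper).

    When source_lang is given, it is sent along with the request so the
    server does not have to guess the input language.
    """
    params = {"target_lang": target_lang, "text": text}
    if source_lang is not None:
        params["source_lang"] = source_lang
    return f"{base_url}/translate?{urlencode(params)}"


# Assumed local deployment; adjust host and port to your container mapping.
url = build_translate_url(
    "http://localhost:24080", "imagina un sol", "en", source_lang="es"
)
print(url)
```

The resulting URL can be fetched with any HTTP client. Leaving source_lang out reproduces the original behaviour, where the server detects the language itself.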

Docker logs output:

[2023-09-28 08:38:08 +0000] [60] [INFO] Waiting for application startup.
[2023-09-28 08:38:08 +0000] [60] [INFO] Application startup complete.
Exception: 'jbo'

The text of the exception varies with every prompt. I guess it is the code of the detected language.

leandroalbero, Sep 28 '23 09:09

Updating the fastText model used for language identification helps solve the issue, at least for the translations that failed in my tests (see https://fasttext.cc/docs/en/language-identification.html). This repo uses lid.176.ftz; switching to lid.176.bin helps because it is slightly more accurate. The lines to change are here: https://github.com/UKPLab/EasyNMT/blob/7c11ae80f59e680653efa23c45e0704928aa4bf2/easynmt/EasyNMT.py#L415-L430. There are still some translations that fail, though. Enabling a fallback to a slower model in those cases might help.
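One way to picture the fallback idea is the sketch below. It is not EasyNMT's actual code: detect_with_fallback and the stub detectors are hypothetical names, and in practice each detector would wrap a real language-identification model (for example lid.176.ftz first, then lid.176.bin). The function tries the detectors in order and returns the first detected language that the translation model supports.

```python
def detect_with_fallback(text, detectors, supported_langs):
    """Return the first detected language code that is supported.

    detectors: callables ordered from fastest to slowest, each mapping
               a text to a language code such as 'es'.
    supported_langs: set of language codes the translation model accepts.
    Returns None when no detector yields a supported language.
    """
    for detect in detectors:
        try:
            lang = detect(text)
        except Exception:
            # A failing detector should not abort the whole chain.
            continue
        if lang in supported_langs:
            return lang
    return None


# Stub detectors standing in for real models (assumptions for illustration).
def fast_detector(text):
    return "jbo"  # mimics the misdetection seen in the Docker logs


def slow_detector(text):
    return "es"


supported = {"en", "es", "de", "fr"}  # illustrative subset, not the full list
lang = detect_with_fallback("imagina un sol", [fast_detector, slow_detector], supported)
print(lang)
```

If every detector returns an unsupported code, the function returns None, and the caller can then decide whether to raise a clearer error or ask the user for source_lang.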

leandroalbero, Sep 28 '23 10:09