EasyNMT
Some translations are not possible
Issue description
Running the latest image, easynmt/api:2.0-cpu, with the model set to m2m_100_418M and English as the target language fails for some translations. Here are some examples:
- 'imagina a mi'
- 'imagina un sol'
- 'imagina a un vikingo'
In this case, for example, setting source_lang to 'es' fixed the issue, so the problem may be in the language detection step, or there may be no translation direction from the detected language to English.
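For anyone who wants to apply that workaround, here is a minimal sketch of building a request that passes source_lang explicitly, so language detection is skipped. The host, the port 24080, and the helper name build_translate_url are assumptions for illustration; the /translate endpoint and its target_lang, text, and source_lang query parameters are as I understand the EasyNMT Docker API, so check them against your deployment.

```python
from urllib.parse import urlencode


def build_translate_url(text, target_lang="en", source_lang=None,
                        base_url="http://localhost:24080"):
    """Build a GET URL for the translation API (hypothetical helper).

    base_url is an assumption; adjust it to wherever the container is exposed.
    """
    params = {"target_lang": target_lang, "text": text}
    # Only send source_lang when it is known; otherwise the server
    # falls back to automatic language detection.
    if source_lang is not None:
        params["source_lang"] = source_lang
    return f"{base_url}/translate?{urlencode(params)}"


url = build_translate_url("imagina un sol", source_lang="es")
print(url)
```

The resulting URL can then be fetched with any HTTP client, for example urllib.request.urlopen(url).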
Docker logs output:
[2023-09-28 08:38:08 +0000] [60] [INFO] Waiting for application startup.
[2023-09-28 08:38:08 +0000] [60] [INFO] Application startup complete.
Exception: 'jbo'
The text of the exception varies with every prompt; I guess it is the code of the detected language.
Updating the model used by fasttext for language identification helps solve the issue, at least for the translations that failed in my tests.
https://fasttext.cc/docs/en/language-identification.html
This repo is using lid.176.ftz; switching to lid.176.bin helps because it is slightly more accurate.
Lines to change are here:
https://github.com/UKPLab/EasyNMT/blob/7c11ae80f59e680653efa23c45e0704928aa4bf2/easynmt/EasyNMT.py#L415-L430
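As a rough illustration of what the swap looks like outside of EasyNMT, the sketch below loads lid.176.bin with the fasttext Python package and returns the top predicted language code. This is not the code at the linked lines; the function names and the local path lid.176.bin are assumptions, and the model file has to be downloaded from the fastText page linked above first.

```python
def label_to_lang(label):
    """Turn a fastText label such as '__label__es' into 'es'."""
    prefix = "__label__"
    return label[len(prefix):] if label.startswith(prefix) else label


def detect_language(text, model_path="lid.176.bin"):
    """Return the most likely language code for text.

    model_path is an assumed local path to the downloaded model file.
    """
    import fasttext  # imported lazily so the helper above works without it

    model = fasttext.load_model(model_path)
    # fastText predicts one line at a time, so newlines are replaced.
    labels, _probabilities = model.predict(text.replace("\n", " "), k=1)
    return label_to_lang(labels[0])
```

In real use the model should be loaded once and reused rather than reloaded on every call.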
Yet some translations still fail; enabling a fallback to a slower model in those cases might help.
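One way such a fallback could be structured is sketched below: try detectors in order and accept the first result the translation model actually supports. Everything here is hypothetical (the function name, the detector callables, and the supported-language set); it only illustrates the idea and is not how EasyNMT currently behaves.

```python
def detect_with_fallback(text, detectors, supported_langs, default=None):
    """Return the first detected language that the model can translate from.

    detectors: callables taking a string and returning a language code,
               ordered from fastest to slowest (hypothetical interface).
    supported_langs: language codes the translation model accepts.
    """
    for detect in detectors:
        try:
            lang = detect(text)
        except Exception:
            # A failing detector should not abort the whole request.
            continue
        if lang in supported_langs:
            return lang
    return default


# Stub detectors standing in for a fast and a slower identification model.
fast_detector = lambda text: "jbo"  # mimics the misdetection from the logs
slow_detector = lambda text: "es"

result = detect_with_fallback(
    "imagina un sol", [fast_detector, slow_detector], {"es", "en", "de"}
)
print(result)
```

With this shape, an unsupported detection such as 'jbo' leads to a second attempt instead of an exception.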