EasyNMT
Enable manually specifying the desired OPUS model?
I really like the library, great work! Is there a way to manually specify a particular OPUS model? For example, EasyNMT with OPUS currently does not support English as the source and Portuguese as the target language, because it tries to download 'opus-mt-en-pt' by default, which does not exist. There is, however, an en2pt model on the hub now (https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-pt) with a slightly different name. I don't know how to tell EasyNMT to use this specific model instead of throwing the following error:
OSError: Helsinki-NLP/opus-mt-en-pt is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
For some specific sentences it also seems that model.translate constructs a non-existent model identifier. For example, for some sentences in Dutch, instead of the correct Helsinki-NLP/opus-mt-nl-en, it looks for the non-existent Helsinki-NLP/opus-mt-nds-en, which then throws the same error @MoritzLaurer mentioned.
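To make the failure mode above concrete, here is a minimal sketch of a lookup that maps a language pair to a model identifier and lets you override pairs whose model does not follow the usual naming. The default pattern is inferred from the error messages in this thread, not taken from EasyNMT's source, and resolve_opus_model, MODEL_OVERRIDES and load_translator are hypothetical names, not part of EasyNMT.

```python
# Hypothetical helper, NOT part of EasyNMT: resolve a language pair to a
# Hugging Face model id, with manual overrides for pairs whose model name
# does not follow the default pattern.

# Assumption: default naming inferred from the error messages in this thread.
DEFAULT_PATTERN = "Helsinki-NLP/opus-mt-{src}-{tgt}"

# Override taken from this thread; extend with other pairs as needed.
MODEL_OVERRIDES = {
    ("en", "pt"): "Helsinki-NLP/opus-mt-tc-big-en-pt",
}


def resolve_opus_model(src, tgt, overrides=MODEL_OVERRIDES):
    """Return the override for (src, tgt) if one exists, else the default name."""
    return overrides.get((src, tgt), DEFAULT_PATTERN.format(src=src, tgt=tgt))


def load_translator(src, tgt):
    """Build a transformers translation pipeline for the resolved model id."""
    # Imported lazily so the lookup above works without transformers installed.
    from transformers import pipeline

    return pipeline("translation", model=resolve_opus_model(src, tgt))
```

The lookup alone does not fix EasyNMT; it only shows how a manual override table could sit in front of whatever loads the model.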
You can bypass easynmt entirely and do it, for example, with transformers (pip install transformers):
from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-pt")
print(pipe("Nobody Expects the Spanish Inquisition")[0]["translation_text"])
In this case, though, you'll need to handle sentence tokenization yourself, so it's not as easy as easynmt. Alternatively, you can use EasyNMT.sentence_splitting(): https://github.com/UKPLab/EasyNMT/blob/main/easynmt/EasyNMT.py#L444
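As a rough illustration of the manual sentence handling mentioned above, the sketch below splits a text into sentences, translates them as a batch, and joins the results. The regex splitter is a deliberately crude stand-in and is not how EasyNMT splits sentences; naive_sentence_split, translate_long_text and make_pipeline_translator are hypothetical names. A better splitter, possibly EasyNMT's own if its signature fits, could be passed in as split.

```python
import re


def naive_sentence_split(text):
    """Split after '.', '!' or '?' followed by whitespace (crude stand-in splitter)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_long_text(text, translate_batch, split=naive_sentence_split):
    """Split text into sentences, translate them as one batch, and rejoin.

    translate_batch is any callable taking a list of sentences and
    returning a list of translations in the same order.
    """
    sentences = split(text)
    if not sentences:
        return ""
    return " ".join(translate_batch(sentences))


def make_pipeline_translator(model_name):
    """Wrap a transformers translation pipeline as a translate_batch callable."""
    # Imported lazily; requires `pip install transformers`.
    from transformers import pipeline

    pipe = pipeline("translation", model=model_name)

    def translate_batch(sentences):
        # The pipeline returns one dict per input sentence.
        return [output["translation_text"] for output in pipe(sentences)]

    return translate_batch
```

With the model from this thread, usage would look like translate_long_text(document, make_pipeline_translator("Helsinki-NLP/opus-mt-tc-big-en-pt")).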
I'm seeing this same problem:
Helsinki-NLP/opus-mt-pt-en is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
This is a bit of a blocker for me, as I'm trying to translate multiple languages and it would be nice if easynmt just handled them all correctly. Does anyone know how to go about fixing this?
Adding source_lang to the translate call resolved this problem for me.
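A minimal sketch of that workaround for mixed-language input, assuming you already know each text's language: group texts by source language and pass source_lang explicitly, so nothing is left to automatic detection. One plausible reading of the opus-mt-nds-en error is that detection guessed Low German (nds) for a Dutch sentence, but the thread does not confirm this. translate_by_source_lang is a hypothetical helper, not part of EasyNMT.

```python
from collections import defaultdict


def translate_by_source_lang(model, items, target_lang):
    """Translate (text, source_lang) pairs, returning results in input order.

    model is anything with a translate(documents, target_lang, source_lang)
    method, such as an EasyNMT instance.
    """
    # Group texts by source language, remembering their original positions.
    groups = defaultdict(list)
    for index, (text, source_lang) in enumerate(items):
        groups[source_lang].append((index, text))

    results = [None] * len(items)
    for source_lang, members in groups.items():
        texts = [text for _, text in members]
        # Passing source_lang explicitly skips language detection.
        translated = model.translate(
            texts, source_lang=source_lang, target_lang=target_lang
        )
        for (index, _), output in zip(members, translated):
            results[index] = output
    return results


# Usage sketch (downloads models, so it is left commented out):
# from easynmt import EasyNMT
# model = EasyNMT("opus-mt")
# translate_by_source_lang(model, [("Goedemorgen", "nl"), ("Bom dia", "pt")], "en")
```

This still fails for pairs that have no model under the default name, such as en to pt in the original report.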