
Enable manually specifying the desired OPUS model?

Open MoritzLaurer opened this issue 2 years ago • 4 comments

I really like the library, great work! Is there a way to manually specify which OPUS model to use? For example, EasyNMT with OPUS currently does not support English as the source and Portuguese as the target language, because it tries to download 'opus-mt-en-pt' by default, which does not exist. There is, however, an en-to-pt model on the hub now (https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-pt) with a slightly different name. I don't know how to tell EasyNMT to use this specific model instead of throwing the following error:

OSError: Helsinki-NLP/opus-mt-en-pt is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

MoritzLaurer, Jul 31 '22 10:07

For some specific sentences, model.translate also seems to construct a non-existent model identifier. For example, for some sentences in Dutch, instead of the correct Helsinki-NLP/opus-mt-nl-en, it looks for the non-existent Helsinki-NLP/opus-mt-nds-en, which then throws the same error @MoritzLaurer mentioned.

lucasfariaslf, Nov 10 '22 23:11

You can bypass EasyNMT entirely and use transformers directly (pip install transformers), for example:

from transformers import pipeline

pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-pt")
print(pipe("Nobody Expects the Spanish Inquisition")[0]["translation_text"])

but in this case you'll need to handle sentence splitting yourself, so it's not as easy as EasyNMT. Alternatively, you can use EasyNMT.sentence_splitting(): https://github.com/UKPLab/EasyNMT/blob/main/easynmt/EasyNMT.py#L444
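A rough sketch of what "dealing with sentence splitting manually" could look like. This is only an illustration: split_sentences, translate_document and make_opus_translator are made-up helper names, and the regex splitter is a naive stand-in, not what EasyNMT does internally. The transformers import is kept inside the factory so the splitting logic can be tried without downloading a model.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_document(text: str, translate_batch: Callable[[List[str]], List[str]]) -> str:
    """Split text into sentences, translate them as one batch, and re-join with spaces."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    return " ".join(translate_batch(sentences))


def make_opus_translator(model_name: str = "Helsinki-NLP/opus-mt-tc-big-en-pt"):
    """Hypothetical helper: wrap a transformers translation pipeline as a batch function."""
    # Imported lazily; requires `pip install transformers` and downloads the model on first use.
    from transformers import pipeline

    pipe = pipeline("translation", model=model_name)

    def translate_batch(sentences: List[str]) -> List[str]:
        # The pipeline returns one dict per input sentence.
        return [out["translation_text"] for out in pipe(sentences)]

    return translate_batch


# Usage (downloads the model):
#   translator = make_opus_translator()
#   print(translate_document("Nobody expects the Spanish Inquisition. Our chief weapon is surprise.", translator))
```

The regex will mis-split on abbreviations like "e.g." or "Dr.", so for real text a proper sentence tokenizer is probably the better choice.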

glowinthedark, Dec 12 '22 18:12

I'm seeing this same problem:

Helsinki-NLP/opus-mt-pt-en is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

This is a bit of a blocker for me, as I'm trying to translate several different languages and it would be nice if EasyNMT just handled them all correctly. Does anyone know how to go about fixing this?

tansaku, Feb 20 '23 17:02

Passing source_lang explicitly to translate() resolved this problem for me.
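For anyone landing here, a minimal sketch of what that might look like. It assumes the failure comes from automatic language detection picking the wrong source code (e.g. nds instead of nl, as reported above); it would presumably not help when the language pair has no opus-mt-xx-yy model at all. opus_model_id and translate_fixed_source are hypothetical helpers, and the identifier pattern is simply the one visible in the error messages in this thread, not taken from EasyNMT's code.

```python
def opus_model_id(source_lang: str, target_lang: str) -> str:
    """Illustrative only: the identifier pattern seen in the errors in this thread."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def translate_fixed_source(model, text, source_lang: str, target_lang: str):
    """Call translate() with an explicit source language so no auto-detection is needed."""
    return model.translate(text, source_lang=source_lang, target_lang=target_lang)


# Usage with EasyNMT (downloads the model on first use):
#   from easynmt import EasyNMT
#   model = EasyNMT("opus-mt")
#   print(translate_fixed_source(model, "Dit is een zin.", "nl", "en"))
```

With source_lang="nl" and target_lang="en", opus_model_id gives Helsinki-NLP/opus-mt-nl-en, which is the identifier mentioned above as the correct one for Dutch.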

wasifferoze, Jun 03 '24 18:06