
What could cause widely varying inference time when using pre-trained opus-mt-en-fr model with python transformers library?

Open · shandou opened this issue 2 years ago · 2 comments

I have been testing pre-trained Opus-MT models ported to the transformers library for a Python implementation. Specifically, I am using opus-mt-en-fr for English-to-French translation, with the tokenizer and translation model loaded via MarianTokenizer and MarianMTModel, similar to the code examples shown here on Hugging Face. Strangely, for the same pre-trained model translating the same English input on an identical machine, I have observed anywhere between 80+ ms and a whopping 4 s per translation (example input = "kiwi strawberry").
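Roughly, the setup looks like this (a simplified sketch, not my exact script; the warm-up pass and timing loop are illustrative additions):

```python
import time

from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "kiwi strawberry"
inputs = tokenizer([text], return_tensors="pt")

# Warm-up pass: the first generate() call often pays one-time costs
# (lazy initialization, caching), so it is excluded from the timings.
model.generate(**inputs)

# Time repeated translations of the same input to expose run-to-run variation.
for i in range(10):
    start = time.perf_counter()
    output = model.generate(**inputs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    translation = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"run {i}: {elapsed_ms:.1f} ms -> {translation}")
```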

I wonder if anyone has observed similar behaviour, and what could cause such a wide variation? Thank you very much!

shandou avatar Sep 08 '22 16:09 shandou

Maybe asking people at huggingface and the transformers git repo would help?

jorgtied avatar Jan 12 '23 08:01 jorgtied

Good afternoon. Hypothetically, could CPU or GPU load have affected the performance of the model? Have you tried monitoring the load on the hardware components while performing the measurements?
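For example, something like this (a minimal sketch; psutil is my assumption here, any system monitor would do):

```python
import time

import psutil  # third-party package: pip install psutil


def timed_with_cpu_load(fn, *args, **kwargs):
    # Hypothetical helper: time a call and report the average CPU
    # utilization over the same interval.
    psutil.cpu_percent(interval=None)  # reset the internal counter
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    load = psutil.cpu_percent(interval=None)  # average load since the reset
    print(f"{elapsed_ms:.1f} ms at {load:.0f}% CPU")
    return result
```

Wrapping the generate call, e.g. `timed_with_cpu_load(model.generate, **inputs)`, would show whether the slow runs coincide with high CPU load.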

artyomboyko avatar Jan 12 '23 08:01 artyomboyko