Opus-MT
Helsinki-NLP/opus-mt-tc-big-he-en: nonsensical translations, and translation speed in general.
The Hebrew-to-English model's output is essentially nonsensical.
- This is the input:
"רוב האמריקאים רואים בישראל את בעלת ברית העליונה של ארה""ב ומדגישים את הערכים המשותפים של המדינות. כך על פי סקר חדש בארה""ב. יותר רפובליקנים ועצמאיים כינו את ישראל כבעלת הברית הבכירה של ארה""ב מאשר דמוקרטים"
(in English, roughly: "Most Americans see Israel as the United States' foremost ally and emphasize the two countries' shared values, according to a new survey in the US. More Republicans and independents called Israel the United States' senior ally than Democrats did.")
- And the translated text is:
Gen Gen Terrorism Terrorism Terrorism Terrorism Terrorism Terrorism Cookie Cookie discussions Cookie discussions Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie acknowledg acknowledg acknowledg acknowledg acknowledg acknowledg discussions discussions discussions discussions discussions assembly assembly assembly assembly assembly Cookie Cookie Cookie Cookie Cookie acknowledg acknowledg acknowledg discussions discussions discussions [...]
(the tokens "Cookie", "acknowledg", "assembly", and above all "discussions" keep repeating like this for several hundred more tokens)
- I am using the models directly. Briefly, the script reads text from a .txt file, translates it, and writes the result to an output .txt file.
- The texts I am translating are meaningful articles, real data from production, so the original text is legitimate.
- This is the code snippet I am using for the translation:
```python
import torch
from transformers import MarianMTModel, MarianTokenizer


def translate_text_file(input_filename, output_filename):
    # Load tokenizer and model
    model_name = fetch_model_name(source_language, target_language)
    if "tc-big" in model_name:
        tokenizer = MarianTokenizer.from_pretrained(model_name)
        model = MarianMTModel.from_pretrained(model_name)
    with open(input_filename, "r", encoding="utf-8") as file:
        input_text = file.read()
    # Translate the entire file contents as a single input
    inputs = tokenizer(
        input_text, return_tensors="pt", padding=True, truncation=True
    )
    with torch.no_grad():
        outputs = model.generate(**inputs)
    translated_text = [
        tokenizer.decode(t, skip_special_tokens=True) for t in outputs
    ]
    # Save translated text to output file
    with open(output_filename, "w", encoding="utf-8") as file:
        for translation in translated_text:
            file.write(translation + "\n")
```
- This code is part of a larger script. As far as I know, I have written it as documented, but it is possible that the way I am feeding text to the model is wrong.
- Either way, with this script the generated translation is not usable.
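One plausible cause, stated as an assumption rather than a confirmed diagnosis: the snippet reads the whole file and passes it to the tokenizer as a single input, so `truncation=True` clips everything past the model's maximum sequence length (around 512 tokens for Marian models), and over-long inputs are a common trigger for exactly this kind of degenerate repeated-token output. A minimal sketch that splits the file into short segments first (the segment size, batch size, and the explicit `model_name` argument replacing `fetch_model_name` are my choices, not from the original script):

```python
def split_into_segments(text, max_chars=400):
    """Split raw text into short segments so no single input approaches
    the model's token limit."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Crude split, preferring sentence ends, then word boundaries.
        while len(line) > max_chars:
            cut = line.rfind(". ", 0, max_chars)
            if cut == -1:
                cut = line.rfind(" ", 0, max_chars)
            if cut == -1:
                cut = max_chars
            segments.append(line[:cut + 1].strip())
            line = line[cut + 1:].strip()
        if line:
            segments.append(line)
    return segments


def translate_text_file(input_filename, output_filename, model_name):
    # Imported here so the splitting helper above stays dependency-free.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    with open(input_filename, encoding="utf-8") as f:
        segments = split_into_segments(f.read())
    with open(output_filename, "w", encoding="utf-8") as f:
        for start in range(0, len(segments), 8):  # translate in small batches
            batch = segments[start:start + 8]
            inputs = tokenizer(batch, return_tensors="pt",
                               padding=True, truncation=True)
            with torch.no_grad():
                outputs = model.generate(**inputs)
            for t in outputs:
                f.write(tokenizer.decode(t, skip_special_tokens=True) + "\n")
```

Translating sentence-sized pieces also tends to be faster, since Marian decodes each input independently and short inputs finish generation sooner.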
Slow translation speed of the models
- Is there any hardware factor that might affect the speed of the process?
- What is the typical speed at which an `opus-mt-<src_lang>-<tgt_lang>` model translates? These are the timings I recorded from the executions:
- Timed when translating Hebrew to English:
  Execution time: 172 seconds
- Timed for the Russian-to-English translation:
  Execution time: 22 seconds
- These models are built on Marian NMT, which is heavily dependent on hardware. If translation speed depends on the hardware, what speed should I expect on a typical machine?
- Looking at the results, the time it took to translate Hebrew to English is far too long, and even after waiting patiently, the result was not usable.
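On the hardware question: Transformers models run on CPU by default, and moving the model and inputs to a GPU (when one is available) usually speeds up generation considerably. A hedged sketch for measuring this (the function name and structure are mine; `model`/`tokenizer` are assumed to be an already-loaded MarianMTModel/MarianTokenizer pair as in the snippet above):

```python
import time


def timed_generate(model, tokenizer, texts):
    """Translate `texts` on a GPU when one is available and report
    wall-clock generation time."""
    import torch  # imported lazily so the sketch is easy to test in isolation

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    inputs = tokenizer(texts, return_tensors="pt",
                       padding=True, truncation=True).to(device)
    start = time.perf_counter()
    with torch.no_grad():
        outputs = model.generate(**inputs)
    elapsed = time.perf_counter() - start
    print(f"Execution time on {device}: {elapsed:.1f} seconds")
    return [tokenizer.decode(t, skip_special_tokens=True) for t in outputs]
```

One further hedge worth noting: the degenerate repeated output in the Hebrew run means the model likely kept generating until it hit its maximum output length, which by itself may explain much of the 172-second vs. 22-second gap.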