turjuman
Token ids generated instead of translation
Hey there, I hope you're doing fine.
When I run turj.translate, it returns the token ids instead of the actual translation (see the output below).
2022-07-07 10:41:43 | INFO | turjuman.translate | Using beam search
tensor([[ 0, 6538, 2, 76, 6380, 1]])
Hi Ahmed, could you please provide more details, such as your input sentence and a screenshot? Thanks
As you can see, turj.translate returns the output ids instead of the translation. I solved this by using the tokenizer to decode the ids back into text:
tokenizer.decode(target, skip_special_tokens=True, clean_up_tokenization_spaces=True)
To integrate Turjuman with your Python code, take a look at this notebook:
https://colab.research.google.com/github/UBC-NLP/turjuman/blob/main/examples/Integrate_turjuman_with_your_code.ipynb
Thanks
When you run that notebook, you get only the target ids, as shown in the screenshot.
Thanks Ahmed, we will check this soon
Quick fix:
result = torj.tokenizer.batch_decode(target, skip_special_tokens=True)
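For anyone who wants to wrap the workaround above, here is a minimal sketch of a helper that turns a batch of target ids into strings. It assumes the tokenizer exposes a Hugging Face style batch_decode(..., skip_special_tokens=True), as used in the quick fix. The helper name ids_to_text is made up for illustration, and ToyTokenizer with its four-entry vocabulary is a hypothetical stand-in (not Turjuman's tokenizer) so the snippet runs without downloading a model.

```python
from typing import Any, List


def ids_to_text(tokenizer: Any, target: Any) -> List[str]:
    """Decode a batch of token ids into strings (hypothetical helper)."""
    # Tensors expose .tolist(); plain nested lists pass through unchanged.
    rows = target.tolist() if hasattr(target, "tolist") else list(target)
    return tokenizer.batch_decode(rows, skip_special_tokens=True)


class ToyTokenizer:
    """Stand-in tokenizer with an invented vocabulary, for illustration only."""

    vocab = {0: "<pad>", 1: "</s>", 5: "hello", 6: "world"}
    special_ids = {0, 1}

    def batch_decode(self, sequences, skip_special_tokens=False):
        decoded = []
        for seq in sequences:
            # Drop special ids when requested, then join the remaining tokens.
            tokens = [
                self.vocab[i]
                for i in seq
                if not (skip_special_tokens and i in self.special_ids)
            ]
            decoded.append(" ".join(tokens))
        return decoded


if __name__ == "__main__":
    # Toy ids, not the ids from the log above.
    print(ids_to_text(ToyTokenizer(), [[0, 5, 6, 1]]))
```

With the real library you would pass torj.tokenizer and the value returned by translate in place of the toy objects.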