Opus-MT

How to skip translation of some words?

Open Pablohn26 opened this issue 1 year ago • 2 comments

Hi, I would like to use this model to translate XML content (to translate Android apps). The problem is that it also translates some code tokens in the content that I do not want translated, and it adds spaces that would break the content. How can I skip translation of the XML code strings?

[image]

For example, ChatGPT respects that:

[image]

Another option would be to train this model only on Android XML language files. Could you point me to a guide for doing so?

Thanks for sharing this amazing software.

Pablohn26, Mar 05 '23 18:03

There is no immediate fix for this, as the models are trained on plain-text input and have not seen tagged data. One could do some kind of pre- and post-processing to keep tags in place, or some clever fine-tuning as you point out.
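
One way the pre- and post-processing idea could look is placeholder masking: swap tags and format specifiers for neutral tokens before translation and restore them afterwards. The following is only a minimal sketch. The `PROTECTED` pattern, the `[[n]]` placeholder format, and the `mask`/`unmask` helpers are assumptions made for illustration, not part of Opus-MT, and there is no guarantee that a given model leaves such placeholders intact.

```python
import re

# Assumption: spans to protect are XML/HTML tags and printf-style
# format specifiers such as %s, %d or %1$s.
PROTECTED = re.compile(r"<[^>]+>|%(?:\d+\$)?[sdf]")

# Tolerates whitespace that a model might insert inside a placeholder.
PLACEHOLDER = re.compile(r"\[\s*\[\s*(\d+)\s*\]\s*\]")


def mask(text):
    """Replace protected spans with numbered placeholders like [[0]]."""
    saved = []

    def repl(match):
        saved.append(match.group(0))
        return f"[[{len(saved) - 1}]]"

    return PROTECTED.sub(repl, text), saved


def unmask(text, saved):
    """Put the protected spans back in place of their placeholders."""

    def repl(match):
        index = int(match.group(1))
        # Leave unknown placeholder numbers untouched.
        return saved[index] if index < len(saved) else match.group(0)

    return PLACEHOLDER.sub(repl, text)


masked, saved = mask("Hello <b>%1$s</b>!")
# masked is what would be sent to the model; here it is restored directly.
restored = unmask(masked, saved)
```

The masked string would be sent to the model and `unmask` applied to its output. Placeholders that the model drops or reorders would still need manual checking.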

Do I understand correctly that you basically only want to translate the text between the XML tags (the raw text, but not the XML markup)? You could send only those segments to the model and insert the translations into the XML template. Would that work for you?
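
The approach described above (translate only the text between tags, then write it back into the XML) could be sketched roughly as follows, using Python's standard `xml.etree.ElementTree`. The `translate` function is a hypothetical stand-in for a real model call, and the sample resource file is invented for illustration. The sketch only handles the direct text of `<string>` elements, not text after nested inline tags.

```python
import xml.etree.ElementTree as ET


def translate(text):
    # Hypothetical stand-in for a call to a translation model;
    # upper-casing just makes the effect visible.
    return text.upper()


def translate_strings_xml(xml_source, translate_fn=translate):
    """Translate the text of <string> elements, leaving markup untouched."""
    root = ET.fromstring(xml_source)
    for elem in root.iter("string"):
        # Android marks non-translatable resources with translatable="false".
        if elem.get("translatable") == "false":
            continue
        if elem.text and elem.text.strip():
            elem.text = translate_fn(elem.text)
    return ET.tostring(root, encoding="unicode")


# Invented sample input in the shape of an Android strings.xml file.
sample = (
    "<resources>"
    '<string name="app_name" translatable="false">MyApp</string>'
    '<string name="hello">Hello world</string>'
    "</resources>"
)
result = translate_strings_xml(sample)
```

Because the tags and attributes never reach the model, it cannot alter them or add spaces inside them.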

jorgtied, Mar 06 '23 15:03

Hi @Pablohn26,

You could use a library such as https://lxml.de to get the text, then send it to the model.

bukosabino, Mar 10 '23 17:03