DocumentTranslator-Legacy

Need a tag to exclude words from translation in DOCX files

Open · DarrellJonsson opened this issue 6 years ago · 2 comments

The Microsoft Translator engine offers two ways to exclude words from translation: the span tag with class="notranslate" and the code tag. Neither of them works with DOCX files.

<span class="notranslate">  FEBRUARY 22, 2018  </span>
<code>  FEBRUARY 22, 2018  </code>

With DocumentTranslator, these tags work, but only if the input file is an HTML file:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>

<span class="notranslate"> FEBRUARY 22, 2018  </span>
<code> FEBRUARY 22, 2018  </code>
</body>
</html>

None of the above works in a DOCX file, though.

Is there a tag or other method for excluding words, lines, or paragraphs from translation when using DocumentTranslator with a DOCX file?

DarrellJonsson, Mar 14 '18 12:03

You could define a specific style in Word, apply it to the segments that should not be translated, and then change the Document Translator code to skip any text that carries that style. Nothing like this is implemented right now. It is probably easier to create a dictionary on the Hub (http://hub.microsofttranslator.com) and then use the resulting custom system from Document Translator.
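A minimal sketch of the style-based idea, written in Python rather than in Document Translator's own code: it opens the DOCX package, reads word/document.xml, and marks each paragraph as translatable or not based on its paragraph style. The style ID NoTranslate and the function names are hypothetical. Word records a paragraph's style ID (which can differ from its display name) in the w:pStyle element, and this sketch checks only that element, not character styles (w:rStyle). Writing translations back into the document is not shown.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS

# Hypothetical style ID you would create in Word for untranslatable text.
NO_TRANSLATE_STYLE = "NoTranslate"


def paragraph_style(p):
    """Return the style ID of a <w:p> element, or None if it has none."""
    p_style = p.find(f"{W}pPr/{W}pStyle")
    return p_style.get(f"{W}val") if p_style is not None else None


def paragraph_text(p):
    """Concatenate the text of all <w:t> elements in the paragraph."""
    return "".join(t.text or "" for t in p.iter(f"{W}t"))


def split_paragraphs(docx_file, skip_style=NO_TRANSLATE_STYLE):
    """Return a list of (text, translatable) pairs, one per paragraph.

    docx_file may be a path or a binary file-like object.
    Paragraphs whose style ID equals skip_style are marked not translatable.
    """
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    return [
        (paragraph_text(p), paragraph_style(p) != skip_style)
        for p in root.iter(f"{W}p")
    ]
```

Paragraphs flagged False would be passed through unchanged while the rest go to the translator; hooking this into Document Translator itself would still require the code change described above.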

chriswendt1, Mar 14 '18 14:03

Thanks for your reply, Chris.

The dictionary will work fine for phrases and words, but not for longer texts.

For example, technical documentation often contains blocks of code in the DOCX that should not be translated, while the surrounding instructions should be translated into the target language. Using a dictionary in such cases is not practical.

From what I'm seeing, though, HTML is the only format in which a simple tag such as <code> Some text here </code> is recognized.

Since a DOCX file is XML under the hood (a ZIP package of XML parts), it seems that such tags could, at least in theory, be inserted somehow.
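One way to act on this without changing the DOCX format would be to convert the document text to HTML before translation, since the HTML path already honors class="notranslate". The Python sketch below is an assumption about how such a conversion step might look, not something Document Translator does today. It takes (text, translatable) pairs, however they were extracted from the DOCX, and wraps the untranslatable ones in a notranslate span. The function name to_translator_html is hypothetical, and mapping the translated HTML back into Word runs (with their formatting) is not shown.

```python
import html


def to_translator_html(paragraphs):
    """Turn (text, translatable) pairs into an HTML fragment.

    Each pair becomes a <p> element. Untranslatable text is wrapped in
    <span class="notranslate"> so that, when sent as HTML, the translator
    should leave it as-is. Text is escaped so code like "<repo>" stays literal.
    """
    parts = []
    for text, translatable in paragraphs:
        body = html.escape(text)
        if not translatable:
            body = f'<span class="notranslate">{body}</span>'
        parts.append(f"<p>{body}</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_translator_html([
        ("Run the command below.", True),
        ("git clone <repo>", False),
    ]))
    # <p>Run the command below.</p>
    # <p><span class="notranslate">git clone &lt;repo&gt;</span></p>
```

Escaping matters here: code blocks often contain angle brackets, which would otherwise be read as HTML tags and could be mangled or dropped.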

Any comments or ideas would be appreciated.

DarrellJonsson, Mar 14 '18 20:03