deepl-python
translate_text corrupts HTML
text=
<html>
<body>
<div>
<a href="01.html">Chapter I. Margaret Makes Herself at Home</a>
</div>
<div>
<a href="02.html">Chapter II. Stephen's Life Goes On</a>
</div>
</body>
</html>
Calling translate_text(text, source_lang='EN', target_lang='DE', tag_handling='html') on the above text returns:
<html>
<body>
<div>
<a href="01.html">Kapitel I. Margaret macht es sich gemüt</a>lich </div>
<div>
<a href="02.html">Kapitel II. Stephens Leben geht</a>weiter </div>
</body>
</html>
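As you can see, the end of each translated link text (lich, weiter) has been moved outside the <a> element, and the closing </div> has been pulled up onto the same line.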
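To check a larger document for this problem without reading it by eye, something like the following could help. It is only a minimal sketch using Python's standard html.parser; the names TailTextFinder and find_leaked_text are made up for illustration and are not part of deepl-python. It assumes the source HTML never has text directly after a closing </a> tag, as in the example above.

```python
from html.parser import HTMLParser


class TailTextFinder(HTMLParser):
    """Collect non-whitespace text that directly follows a closing </a> tag."""

    def __init__(self):
        super().__init__()
        self.after_anchor = False  # True between </a> and the next tag
        self.leaked = []

    def handle_starttag(self, tag, attrs):
        # Any opening tag ends the "directly after </a>" region.
        self.after_anchor = False

    def handle_endtag(self, tag):
        # Only a closing </a> starts the region; other closing tags end it.
        self.after_anchor = (tag == "a")

    def handle_data(self, data):
        if self.after_anchor and data.strip():
            self.leaked.append(data.strip())


def find_leaked_text(html):
    """Return the text fragments found immediately after </a> tags."""
    finder = TailTextFinder()
    finder.feed(html)
    finder.close()
    return finder.leaked
```

Applied to the tag_handling='html' result above (for example via the .text attribute of the object returned by translate_text), find_leaked_text returns ['lich', 'weiter']. For a result where the link text stays inside <a>, it returns an empty list.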
With tag_handling='xml', everything works as expected:
<html>
<body>
<div>
<a href="01.html">Kapitel I. Margaret macht es sich gemütlich</a>
</div>
<div>
<a href="02.html">Kapitel II. Stephens Leben geht weiter</a>
</div>
</body>
</html>
Replacing <div> with <p> also avoids the issue.
Another example: text=
<p>1-<i>London, Paris</i></p>
translate_text returns:
<p>1-London<i>, Paris</i></p>
The result is the same with tag_handling='html' and tag_handling='xml': the opening <i> tag has moved, so "London" now sits outside it.
@pbtsrc Are you, by chance, using both the tag_handling and preserve_formatting parameters?
No, I did not use preserve_formatting. I tried to add this parameter, but it did not change anything.
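For anyone who wants to reproduce this, here is a rough sketch that runs the second example through every combination of tag_handling and preserve_formatting. The helper run_matrix and the environment variable name DEEPL_AUTH_KEY are assumptions made for this example. Only deepl.Translator, translate_text and the .text attribute of its result come from the library. The actual output depends on the DeepL service, so none is shown here.

```python
import itertools
import os


def run_matrix(translate, text):
    """Call translate(text, ...) once per parameter combination.

    `translate` is any callable accepting the keyword arguments used below,
    such as deepl.Translator.translate_text. Returns a dict mapping
    (tag_handling, preserve_formatting) to the translated string.
    """
    results = {}
    for tag_handling, preserve in itertools.product(("html", "xml"), (False, True)):
        result = translate(
            text,
            source_lang="EN",
            target_lang="DE",
            tag_handling=tag_handling,
            preserve_formatting=preserve,
        )
        # translate_text returns an object with a .text attribute;
        # a plain string (e.g. from a stub) is accepted as well.
        results[(tag_handling, preserve)] = getattr(result, "text", result)
    return results


def main():
    import deepl  # imported here so run_matrix can be used without the package

    # DEEPL_AUTH_KEY is an assumed variable name, not something the library requires.
    translator = deepl.Translator(os.environ["DEEPL_AUTH_KEY"])
    text = "<p>1-<i>London, Paris</i></p>"
    for options, translated in run_matrix(translator.translate_text, text).items():
        print(options, translated)


# Only contact the API when a key has been provided.
if __name__ == "__main__" and "DEEPL_AUTH_KEY" in os.environ:
    main()
```

Passing the translate function in as an argument keeps the loop independent of the network, so it can be exercised with a stub before spending API quota.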