
translate-html should treat whitespace the way HTML does

Open • codecivil opened this issue 1 year ago • 1 comment

When an HTML string contains whitespace such as tabs or newlines inside a tag's text, translate-html translates each line (or each tab-separated segment) separately. This is inconsistent with how browsers treat the same markup, where such whitespace collapses to a single space and the text renders as one run. For examples, see https://github.com/LibreTranslate/LibreTranslate/issues/288

All components involved in the translation work correctly on their own. BeautifulSoup preserves whitespace when returning the tag tree (as it should), and the function that translates individual text strings correctly assumes that new lines mean new content, since by that point it no longer knows the string came from an HTML tag.
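To illustrate the mismatch, here is a small sketch. It uses Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup (an assumption: both report the newline inside the text node), and `split_like_line_translator` is a hypothetical placeholder for a translator that treats each line as separate content, not the real LibreTranslate function.

```python
from html.parser import HTMLParser


class TextNodes(HTMLParser):
    """Collects raw text nodes exactly as the parser reports them."""

    def __init__(self) -> None:
        super().__init__()
        self.texts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.texts.append(data)


def split_like_line_translator(text: str) -> list[str]:
    """Hypothetical stand-in: treats every non-empty line as its own segment."""
    return [line.strip() for line in text.splitlines() if line.strip()]


parser = TextNodes()
parser.feed("<p>The quick brown fox\n    jumps over the lazy dog.</p>")
parser.close()

node = parser.texts[0]
print(repr(node))                        # 'The quick brown fox\n    jumps over the lazy dog.'
print(split_like_line_translator(node))  # ['The quick brown fox', 'jumps over the lazy dog.']
```

A browser would display that paragraph as a single sentence, yet the line-based step sees two independent fragments.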

So, somewhere along the way, the HTML should be "minified" (its whitespace collapsed) to stay consistent with how a browser interprets the code, at the cost of losing the visual formatting of the HTML source in the translated output. I am not sure where this is best done, but I suggest doing it in the translate_html function. I will create a corresponding PR.
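A minimal sketch of the proposed normalization, assuming it is applied to each text node before that node is sent for translation. It again uses the standard-library `html.parser` instead of BeautifulSoup so it runs without dependencies; `collapse_whitespace` and `TextNodeCollector` are hypothetical names, not part of translate-html. Whitespace inside `<pre>` and `<textarea>` is left untouched, since browsers preserve it there by default.

```python
import re
from html.parser import HTMLParser

# HTML "ASCII whitespace": space, tab, LF, FF, CR. \s is avoided on purpose,
# because in Python it also matches U+00A0 (&nbsp;), which browsers do not collapse.
_HTML_WS = re.compile(r"[ \t\n\f\r]+")

# Simplification (assumption): elements whose whitespace is preserved by default.
_PRESERVE = {"pre", "textarea"}


def collapse_whitespace(text: str) -> str:
    """Replace each run of HTML whitespace with a single space."""
    return _HTML_WS.sub(" ", text)


class TextNodeCollector(HTMLParser):
    """Collects text nodes, collapsing whitespace outside <pre>/<textarea>."""

    def __init__(self) -> None:
        super().__init__()
        self.preserve_depth = 0
        self.texts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in _PRESERVE:
            self.preserve_depth += 1

    def handle_endtag(self, tag):
        if tag in _PRESERVE and self.preserve_depth:
            self.preserve_depth -= 1

    def handle_data(self, data):
        if not self.preserve_depth:
            data = collapse_whitespace(data)
        # Skip nodes that are only formatting whitespace between tags.
        if data.strip(" \t\n\f\r"):
            self.texts.append(data)


parser = TextNodeCollector()
parser.feed("<p>Hello\n\tworld,\n  how are you?</p><pre>keep\n  this</pre>")
parser.close()
print(parser.texts)  # ['Hello world, how are you?', 'keep\n  this']
```

Collapsing per text node, rather than minifying the whole HTML string, leaves attribute values and preformatted blocks intact while giving the translator one sentence per paragraph.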

codecivil, Apr 04 '23 09:04