translate-html should parse whitespace in HTML style
When an HTML string contains whitespace such as tabs or newlines inside a tag, translate-html translates each line (or each tab-separated chunk) separately. This is inconsistent with how HTML is normally parsed and rendered, where such whitespace collapses to a single space. For examples, please see https://github.com/LibreTranslate/LibreTranslate/issues/288
Each component involved in the translation works correctly on its own. BeautifulSoup respects whitespace (as it should) when returning the tag tree, and the function that translates the individual text strings correctly assumes that newlines mark new content; by that point it no longer knows that the string came from an HTML tag.
So, somewhere along the way, the HTML should be "minified" so that it is consistent with how a browser parses it, at the cost of losing the visual formatting of the HTML source in the translated output. I am not sure where this is best done, but I would suggest doing it in the translate_html function. I will create a corresponding PR.
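A minimal sketch of the browser-style whitespace handling described above, assuming it is applied to each text node before that node is sent for translation. The helper names (`collapse_whitespace`, `text_segments`) are hypothetical and not part of translate-html, and the sketch uses the standard library's `html.parser` instead of BeautifulSoup only so that it runs without extra dependencies. Runs of HTML whitespace are collapsed to one space, except inside `<pre>` and `<textarea>`, where browsers preserve whitespace by default.

```python
import re
from html.parser import HTMLParser

# HTML treats space, tab, LF, FF and CR as whitespace; in ordinary text a
# browser renders any run of them as a single space.
_WS_RUN = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text: str) -> str:
    """Hypothetical helper: collapse each whitespace run to one space."""
    return _WS_RUN.sub(" ", text)


class _TextCollector(HTMLParser):
    """Collects text nodes, collapsing whitespace outside <pre>/<textarea>."""

    # Elements whose whitespace is significant and must be kept as-is.
    PRESERVE = {"pre", "textarea"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.preserve_depth = 0
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.PRESERVE:
            self.preserve_depth += 1

    def handle_endtag(self, tag):
        if tag in self.PRESERVE and self.preserve_depth:
            self.preserve_depth -= 1

    def handle_data(self, data):
        if not self.preserve_depth:
            data = collapse_whitespace(data)
        # Skip whitespace-only nodes (e.g. indentation between tags).
        if data.strip():
            self.texts.append(data)


def text_segments(html: str) -> list[str]:
    """Hypothetical helper: return the text strings that would be translated."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts
```

For example, `text_segments("<p>Hello\n\tworld</p>")` yields the single segment `"Hello world"`, so the sentence reaches the translator as one piece of content rather than two lines.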