Escaped `<` characters (`&lt;`) are processed incorrectly

Open chrispy-snps opened this issue 1 year ago • 2 comments

This is a more specific follow-up to #182.

When the `&lt;` escape sequence is processed, it is incorrectly converted to `&LT` instead of being kept as-is:

>>> import minify_html_onepass
>>> print(minify_html_onepass.minify("&lt;"))
<

>>> print(minify_html_onepass.minify("&lt;faketag"))
&LTfaketag

>>> print(minify_html_onepass.minify("&lt;faketag&gt;"))
&LTfaketag>

Strangely, a bare &lt; by itself is processed correctly. It is only when followed by content that it breaks.

The issue occurs in both minify_html and minify_html_onepass.
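To run the same three inputs against either module, a small check along these lines may help. It is only a sketch: `find_uppercase_lt` and `SAMPLES` are hypothetical names, and the minifier is passed in as a plain callable so the check does not depend on a particular package or version.

```python
from typing import Callable, Iterable, List, Tuple

# The three inputs from the report above.
SAMPLES = ("&lt;", "&lt;faketag", "&lt;faketag&gt;")


def find_uppercase_lt(
    minify: Callable[[str], str],
    samples: Iterable[str] = SAMPLES,
) -> List[Tuple[str, str]]:
    """Return (input, output) pairs where the output contains `&LT`
    although the input did not."""
    failures = []
    for sample in samples:
        output = minify(sample)
        if "&LT" in output and "&LT" not in sample:
            failures.append((sample, output))
    return failures
```

With the packages installed, it could be called as `find_uppercase_lt(minify_html.minify)` or `find_uppercase_lt(minify_html_onepass.minify)`; an empty list means none of the samples were rewritten to `&LT`.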

We are able to work around it as follows:

html = html.replace("&lt;", "AMP_LT_WORKAROUND")
html_minified = minify_html.minify(html)
html_minified = html_minified.replace("AMP_LT_WORKAROUND", "&lt;")

but a proper fix would be better (and more efficient, as we process tens of thousands of HTML files at a time).
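The workaround above can be wrapped in a helper so that the placeholder never collides with text already in the document. This is a minimal sketch, not part of minify-html: `minify_preserving_lt` is a hypothetical name, the minifier is passed in as a callable, and it assumes the minifier leaves a plain alphanumeric token unchanged.

```python
import uuid
from typing import Callable


def minify_preserving_lt(html: str, minify: Callable[[str], str]) -> str:
    """Minify `html` while keeping every literal `&lt;` untouched."""
    # Pick a random alphanumeric placeholder that does not occur in the input.
    placeholder = f"LT{uuid.uuid4().hex}"
    while placeholder in html:
        placeholder = f"LT{uuid.uuid4().hex}"

    # Hide the escape sequence, minify, then restore it in the minified text.
    protected = html.replace("&lt;", placeholder)
    minified = minify(protected)
    return minified.replace(placeholder, "&lt;")
```

For example, `minify_preserving_lt(html, minify_html.minify)` applies the same three steps as the snippet above, with the final replacement done on the minified string.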

chrispy-snps, Jun 07 '24 12:06

Hi @chrispy-snps, thank you for the workaround.

codingjerk, Jul 04 '24 13:07

See also #109 and #139.

Rongronggg9, Aug 09 '24 14:08

This was fixed in minify-html==0.16.0.

chrispy-snps, Nov 23 '25 02:11