Floki/mochiweb doesn't support HTML named entities without semicolon

Open wmnnd opened this issue 10 months ago • 0 comments

According to the HTML spec, some named character references may be written without the trailing semicolon. For example, instead of `&nbsp;` it is technically allowed to write just `&nbsp`. Floki/mochiweb, however, doesn't recognize this form and instead escapes the ampersand, turning `&nbsp` into `&amp;nbsp`.

Lexbor/fast_html handles this correctly.

raw_html = """
<!doctype html>
<body>
Before-&nbsp-After
</body>
"""

parsed_html = raw_html |> Floki.parse_document!() |> Floki.raw_html()

File.write!("raw-html.html", raw_html)
File.write!("parsed-html.html", parsed_html)
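
For comparison, here is a minimal sketch of the decoding the spec calls for in text content. It is not how Floki, mochiweb, or Lexbor implement it: the `LegacyEntities` module and its `decode_text/1` function are hypothetical, the entity table is a small hand-picked subset of the names the spec allows without a semicolon, and the extra rule for references inside attribute values is not handled.

```elixir
defmodule LegacyEntities do
  # Hypothetical, deliberately tiny subset of the named references that the
  # HTML spec also accepts without a trailing semicolon.
  @legacy %{
    "nbsp" => "\u00A0",
    "amp" => "&",
    "lt" => "<",
    "gt" => ">",
    "quot" => "\"",
    "copy" => "\u00A9"
  }

  # Longest names first, so a longer name would win over a shorter prefix.
  @names @legacy |> Map.keys() |> Enum.sort_by(&String.length/1, :desc)

  # Decodes the references above in text content, with or without the
  # trailing semicolon. Attribute values need an additional check that
  # this sketch leaves out.
  def decode_text(text) do
    pattern = ~r/&(#{Enum.join(@names, "|")});?/
    Regex.replace(pattern, text, fn _match, name -> Map.fetch!(@legacy, name) end)
  end
end

# Both spellings decode to the same no-break space (U+00A0).
LegacyEntities.decode_text("Before-&nbsp-After") == "Before-\u00A0-After"
LegacyEntities.decode_text("Before-&nbsp;-After") == "Before-\u00A0-After"
```

With decoding like this, the `&nbsp` in the snippet above would come out as a no-break space rather than the literal text `&nbsp`.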

wmnnd, Feb 27 '25 09:02