floki
Floki/mochiweb doesn't support HTML named entities without semicolon
According to the HTML spec, some named character references may be written without the trailing semicolon. For example, instead of &nbsp; it's technically allowed to just write &nbsp. Floki/mochiweb, however, doesn't recognize this and instead escapes the ampersand, transforming &nbsp into &amp;nbsp, which renders as the literal text "&nbsp".
Lexbor/fast_html handles this correctly.
raw_html = """
<!doctype html>
<body>
Before-&nbsp-After
</body>
"""
parsed_html = raw_html |> Floki.parse_document!() |> Floki.raw_html()
File.write!("raw-html.html", raw_html)
File.write!("parsed-html.html", parsed_html)
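
To compare the two parsers side by side, the same input can be run through both in one script. This is a minimal sketch, not a confirmed fix: the version requirements are assumptions, and it assumes a Floki release in which parse_document!/2 accepts the :html_parser option and the Floki.HTMLParser.FastHtml adapter is available. No expected output is shown; the claim that lexbor handles the reference correctly comes from the report above.

```elixir
# Sketch only: the version requirements below are assumptions, adjust them to your project.
# fast_html builds the lexbor C library, so a C toolchain must be available.
Mix.install([
  {:floki, "~> 0.36"},
  {:fast_html, "~> 2.0"}
])

raw_html = """
<!doctype html>
<body>
Before-&nbsp-After
</body>
"""

# Default parser (mochiweb), as in the reproduction above.
mochi_out = raw_html |> Floki.parse_document!() |> Floki.raw_html()

# Same document, parsed with the fast_html (lexbor) adapter for this call only.
fast_out =
  raw_html
  |> Floki.parse_document!(html_parser: Floki.HTMLParser.FastHtml)
  |> Floki.raw_html()

# Print both results so the handling of &nbsp can be compared by eye.
IO.puts(mochi_out)
IO.puts(fast_out)
```

To switch the parser for a whole application instead of a single call, Floki also reads it from config: config :floki, :html_parser, Floki.HTMLParser.FastHtml.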