
Floki is less lenient with nested comments than browsers

Open · wmnnd opened this issue 10 months ago · 1 comment

HTML doesn’t allow nested comments. However, both Firefox and Chromium are somewhat lenient about this, which can lead to surprising results when you parse a document with Floki (I tried this with 0.37.0):

raw_html = """
<!doctype html>
<body>
Before the comment<br>

<!--[if mso | IE]>
  <div>
    <!-- this is a nested comment -->
  </div>
<![endif]-->

After the comment.
</body>
"""
parsed_html = raw_html |> Floki.parse_document!() |> Floki.raw_html()

File.write!("raw-html.html", raw_html)
File.write!("parsed-html.html", parsed_html)

raw-html.html looks exactly like the original string:

<!doctype html>
<body>
Before the comment<br>

<!--[if mso | IE]>
  <div>
    <!-- this is a nested comment -->
  </div>
<![endif]-->

After the comment.
</body>

But parsed-html.html looks like this:

<body>
Before the comment<br/><!--[if mso | IE]>
  <div>
    <!-- this is a nested comment -->
&lt;![endif]--&gt;

After the comment.
</body>

Floki escapes the closing <![endif]--> of the outer comment to &lt;![endif]--&gt; (both the < and the >), and the </div> before it is gone. Because browsers are lenient when handling nested comments, this changes the way the file is displayed:

Image

Image

I’m not sure if this could be considered a bug but I did find it somewhat unexpected.
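
One possible reading of the output above, offered as a sketch rather than a statement about Floki's internals: the HTML tokenizer rules close a comment at the first `-->` and do not track nesting, so the "outer" comment would already end after the nested one. The snippet below only uses the Elixir standard library to show where that split falls; it does not call Floki, and `comment` and `rest` are illustrative names.

```elixir
raw_html = """
<!--[if mso | IE]>
  <div>
    <!-- this is a nested comment -->
  </div>
<![endif]-->
"""

# Split at the first "-->" only. This mimics where a tokenizer that
# does not track nesting would close the comment (an assumption about
# the parser, not something taken from Floki's source).
[comment, rest] = String.split(raw_html, "-->", parts: 2)

IO.puts("comment: " <> inspect(comment))
IO.puts("rest:    " <> inspect(rest))
```

Under that assumption, `rest` still contains `</div>` and `<![endif]-->`, which matches the part of parsed-html.html that ends up outside the comment.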

wmnnd, Feb 16 '25 09:02

It looks like browsers behave like this specifically for IE conditional comments.

A regular non-conditional comment is rendered like this:

<!doctype html>
<body>
Before the comment<br>

<!-- this is just a random comment
  <div>
    <!-- this is a nested comment -->
  </div>
this is the end of the random comment -->

After the comment.
</body>

Image
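
A hedged sketch of why the two cases might differ: per the HTML tokenizer rules, a `<!` that does not start `<!--` or a doctype opens a "bogus comment" that runs to the next `>`. If both comments end at the first `-->`, the conditional example leaves `<![endif]-->` behind, which a browser would swallow as a bogus comment, while the regular example leaves plain text. The code uses only the Elixir standard library; `tail` and `bogus` are illustrative names, and the regex is a simplification of the real tokenizer.

```elixir
conditional = """
<!--[if mso | IE]>
  <div>
    <!-- this is a nested comment -->
  </div>
<![endif]-->
"""

regular = """
<!-- this is just a random comment
  <div>
    <!-- this is a nested comment -->
  </div>
this is the end of the random comment -->
"""

# Everything after the first "-->" (assumed to be where the comment ends).
tail = fn html -> html |> String.split("-->", parts: 2) |> List.last() end

# Simplified pattern: any "<!" span up to the next ">". Neither tail
# contains "<!--", so a match here stands in for a bogus comment.
bogus = ~r/<![^>]*>/

conditional_leftover = Regex.scan(bogus, tail.(conditional))
regular_leftover = Regex.scan(bogus, tail.(regular))

IO.inspect(conditional_leftover, label: "conditional")
IO.inspect(regular_leftover, label: "regular")
```

Only the conditional example yields a match (`<![endif]-->`). The regular example yields none, so its trailing line would be rendered as visible text.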

wmnnd, Feb 16 '25 11:02