
JSoup differs from browsers around commented HTML attributes

Open panthony opened this issue 2 years ago • 2 comments

Hi,

I encountered a case where jsoup behaves differently from browsers (Chrome, Firefox, Safari).

Using this piece of HTML on try jsoup:

<html>
<head>
<title>Try jsoup</title>
</head>
<body>
  <h1>before</h1>
  <div <!--="" id="hidden" --="">
      <h1>within</h1>
  </div>
   <h1>after</h1>
</body>
</html>

Jsoup will produce:

<html>
 <head>
  <title>Try jsoup</title>
 </head>
 <body>
  <h1>before</h1>
  <div>
   <!--="" id="hidden" --="">
      <h1>within</h1>
  </div>
   <h1>after</h1>
</body>
</html>
-->
  </div>
 </body>
</html>

jsoup comments out the rest of the body, whereas all major browsers do not treat the comment characters as the start of a comment and show all three headings.

panthony, Apr 18 '23 07:04

Probably a similar issue to https://github.com/jhy/jsoup/issues/1483, except that here it comments out pretty much all of the HTML.

panthony, Apr 18 '23 15:04

Yes, I believe @panthony is right: the browsers aren't treating this as a comment but as attributes on the div tag, like:

<div
  Attr: <!--
  Attr: id = hidden
  Attr: --
>
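The attribute breakdown above can be reproduced with a small sketch of how a spec-following tokenizer reads the inside of a start tag: an attribute name runs until whitespace, `/`, `>`, or `=`, so a `<` inside the tag simply becomes part of a name. This is a simplified illustration only, not jsoup's code and not the full HTML tokenizer; the names `StartTagAttrs` and `parse` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a simplified attribute scanner for a single start tag.
// It is NOT jsoup's implementation and skips many details of the HTML spec
// (character references, NUL handling, self-closing flags, error reporting).
final class StartTagAttrs {

    static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
    }

    static Map<String, String> parse(String startTag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = startTag.length();
        int i = 1; // skip the leading '<'

        // Skip the tag name.
        while (i < n && !isSpace(startTag.charAt(i))
                && startTag.charAt(i) != '/' && startTag.charAt(i) != '>') {
            i++;
        }

        while (i < n) {
            char c = startTag.charAt(i);
            if (isSpace(c) || c == '/') { i++; continue; }
            if (c == '>') break; // end of the start tag

            // Attribute name: the first character always belongs to the name;
            // after that it runs until whitespace, '/', '>' or '='.
            // Note that '<', '!' and '-' are ordinary name characters here.
            int nameStart = i++;
            while (i < n && !isSpace(startTag.charAt(i)) && startTag.charAt(i) != '/'
                    && startTag.charAt(i) != '>' && startTag.charAt(i) != '=') {
                i++;
            }
            String name = startTag.substring(nameStart, i).toLowerCase(Locale.ROOT);

            while (i < n && isSpace(startTag.charAt(i))) i++;

            String value = "";
            if (i < n && startTag.charAt(i) == '=') {
                i++;
                while (i < n && isSpace(startTag.charAt(i))) i++;
                if (i < n && (startTag.charAt(i) == '"' || startTag.charAt(i) == '\'')) {
                    char quote = startTag.charAt(i++);
                    int valueStart = i;
                    while (i < n && startTag.charAt(i) != quote) i++;
                    value = startTag.substring(valueStart, i);
                    if (i < n) i++; // consume the closing quote
                } else {
                    int valueStart = i;
                    while (i < n && !isSpace(startTag.charAt(i)) && startTag.charAt(i) != '>') i++;
                    value = startTag.substring(valueStart, i);
                }
            }
            attrs.putIfAbsent(name, value); // the first occurrence of a name wins
        }
        return attrs;
    }

    public static void main(String[] args) {
        // The div start tag from the report above.
        Map<String, String> attrs = parse("<div <!--=\"\" id=\"hidden\" --=\"\">");
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            System.out.println("Attr: " + e.getKey() + " = \"" + e.getValue() + "\"");
        }
    }
}
```

Run on the div tag from the report, this sketch yields three attributes named `<!--`, `id`, and `--`, so no comment is ever opened and the following headings stay in the body.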

Will need to revisit #1483 and either implement my idea or scrap the attempt to handle a missing > and just strictly follow the spec.

jhy, Apr 25 '23 10:04