JSoup differs from browsers around commented HTML attributes
Hi,
I encountered a case where jsoup differs from what browsers (Chrome, Firefox, Safari) do.
Using this piece of HTML on Try jsoup:
<html>
<head>
<title>Try jsoup</title>
</head>
<body>
<h1>before</h1>
<div <!--="" id="hidden" --="">
<h1>within</h1>
</div>
<h1>after</h1>
</body>
</html>
jsoup produces:
<html>
<head>
<title>Try jsoup</title>
</head>
<body>
<h1>before</h1>
<div>
<!--="" id="hidden" --="">
<h1>within</h1>
</div>
<h1>after</h1>
</body>
</html>
-->
</div>
</body>
</html>
This comments out the rest of the body, whereas all major browsers escape the comment characters and show all 3 headings.
This is probably similar to https://github.com/jhy/jsoup/issues/1483, except that here it comments out pretty much all of the HTML.
Yes, I believe @panthony is right -- the browsers aren't treating this as a comment but as attributes on the div tag, like:
<div
Attr: <!--
Attr: id = hidden
Attr: --
>
Will need to revisit #1483: either implement my idea, or scrap the attempt to handle a missing > and just strictly follow the spec.
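To make the attribute reading above concrete, here is a minimal, self-contained sketch of how a spec-style tokenizer could walk the div start tag from the report. It is not jsoup's tokenizer and not a complete HTML tokenizer: the class and method names (StartTagAttributeSketch, readAttributes) are hypothetical, and it assumes the input is a single start tag. The point it illustrates is that once the tokenizer is inside a tag, a `<` is just another attribute-name character, so `<!--` never opens a comment.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of jsoup.
class StartTagAttributeSketch {

    private static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
    }

    // Reads the attributes of one start tag, e.g. <div a="b" c>.
    // Inside a tag, '<', '!' and '-' are ordinary name characters.
    static Map<String, String> readAttributes(String startTag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int n = startTag.length();
        int i = 1; // skip the leading '<'

        // Skip the tag name.
        while (i < n && !isSpace(startTag.charAt(i))
                && startTag.charAt(i) != '>' && startTag.charAt(i) != '/') {
            i++;
        }

        while (true) {
            // Skip whitespace and stray '/' between attributes.
            while (i < n && (isSpace(startTag.charAt(i)) || startTag.charAt(i) == '/')) {
                i++;
            }
            if (i >= n || startTag.charAt(i) == '>') {
                break; // end of the start tag
            }

            // Attribute name: the first character is always consumed,
            // then the name runs until whitespace, '/', '>' or '='.
            int nameStart = i++;
            while (i < n && !isSpace(startTag.charAt(i)) && startTag.charAt(i) != '/'
                    && startTag.charAt(i) != '>' && startTag.charAt(i) != '=') {
                i++;
            }
            String name = startTag.substring(nameStart, i).toLowerCase(Locale.ROOT);

            while (i < n && isSpace(startTag.charAt(i))) {
                i++;
            }

            String value = "";
            if (i < n && startTag.charAt(i) == '=') {
                i++;
                while (i < n && isSpace(startTag.charAt(i))) {
                    i++;
                }
                if (i < n && (startTag.charAt(i) == '"' || startTag.charAt(i) == '\'')) {
                    // Quoted value: read up to the matching quote.
                    char quote = startTag.charAt(i++);
                    int end = startTag.indexOf(quote, i);
                    if (end < 0) {
                        end = n;
                    }
                    value = startTag.substring(i, end);
                    i = Math.min(end + 1, n);
                } else {
                    // Unquoted value: read up to whitespace or '>'.
                    int valueStart = i;
                    while (i < n && !isSpace(startTag.charAt(i)) && startTag.charAt(i) != '>') {
                        i++;
                    }
                    value = startTag.substring(valueStart, i);
                }
            }
            // The first occurrence of a name wins; later duplicates are dropped.
            attrs.putIfAbsent(name, value);
        }
        return attrs;
    }

    public static void main(String[] args) {
        String tag = "<div <!--=\"\" id=\"hidden\" --=\"\">";
        for (Map.Entry<String, String> e : readAttributes(tag).entrySet()) {
            System.out.println("Attr: " + e.getKey() + " = \"" + e.getValue() + "\"");
        }
    }
}
```

Run against the div tag from the report, this yields three attributes named `<!--`, `id` and `--`, which matches the breakdown above and explains why browsers still render all three headings.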