html-agility-pack
Closing span and p in wrong order *generates* invalid html
Try this program:
using HtmlAgilityPack;
using System;

class Program
{
    // Malformed input: </span> appears before the </p> of the paragraph it contains.
    const string test = @"
<html>
<body>
<span>
<p>Foo</span></p>
<p>Bar</p>
</body></html>";

    static void Main(string[] args)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(test);
        // Serialize the parsed tree back to HTML.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
        Console.ReadLine();
    }
}
It outputs the following html:
<html>
<body>
<span>
<p>Foo</span><p>
<p>Bar</p>
</body></html>
Note that the first p end tag has suddenly become a start tag, i.e. HtmlAgilityPack is somehow generating malformed HTML in this case, and it's worse than the input because there are now unmatched open tags.
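To make the "unmatched open tags" point concrete, here is a rough sketch that counts start tags minus end tags per tag name. It is only a crude regex scan, not an HTML parser, and the names `TagBalanceDemo` and `Count` are made up for this illustration; they are not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagBalanceDemo
{
    // Returns (number of start tags - number of end tags) for each tag name.
    // Hypothetical helper: a naive scan that ignores comments, attributes
    // containing '>', void elements, and so on.
    public static Dictionary<string, int> Count(string html)
    {
        var balance = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Regex.Matches(html, @"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>"))
        {
            string name = m.Groups[2].Value;
            int delta = m.Groups[1].Value == "/" ? -1 : 1;
            balance.TryGetValue(name, out int current); // current is 0 if the name is new
            balance[name] = current + delta;
        }
        return balance;
    }

    static void Main()
    {
        // The input from the report and the output HtmlAgilityPack produced for it.
        string input = "<html>\n<body>\n<span>\n<p>Foo</span></p>\n<p>Bar</p>\n</body></html>";
        string output = "<html>\n<body>\n<span>\n<p>Foo</span><p>\n<p>Bar</p>\n</body></html>";

        Console.WriteLine(Count(input)["p"]);  // 0: every <p> has a </p>
        Console.WriteLine(Count(output)["p"]); // 2: three <p> start tags, one </p>
    }
}
```

The input is misnested but balanced, whereas the output has two more p start tags than end tags.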
Note that the generated DOM in Chrome for this is:
<html><head></head><body>
<span>
<p>Foo</p>
<p>Bar</p>
</span></body></html>
I think this matches what the spec says. The </span> is ignored by the "Any other end tag" rule at https://html.spec.whatwg.org/#parsing-main-inbody: node starts out as the current node, which is the p element. It does not match span, and because p is special, "3." is hit, which causes the whole token to be ignored. Then everything continues without error until </body> is hit. This and the following </html> are both ignored as parse errors because there is still a span element on the stack of open elements. Finally it hits EOF (another parse error) and stops parsing, at which point it pops all elements off the stack of open elements (presumably while inserting closing tags for them, though I can't find where it says that in the spec).
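The "Any other end tag" walk described above can be sketched roughly as follows. This is a simplified illustration and not HtmlAgilityPack's or any browser's actual code: elements are reduced to tag-name strings, the `Special` set is only a small subset of the spec's special category, and the "generate implied end tags" step is left out. The names `EndTagSketch` and `ProcessAnyOtherEndTag` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class EndTagSketch
{
    // Assumption: a small subset of the spec's "special" category,
    // just enough for the example in this report.
    static readonly HashSet<string> Special =
        new HashSet<string> { "html", "body", "p", "div", "li", "table" };

    // openElements lists the stack of open elements, bottom first, current node last.
    // Returns true if a matching element was closed, false if the token was ignored.
    public static bool ProcessAnyOtherEndTag(List<string> openElements, string tagName)
    {
        // Start at the current node and walk towards the bottom of the stack.
        for (int i = openElements.Count - 1; i >= 0; i--)
        {
            string node = openElements[i];
            if (node == tagName)
            {
                // Match: pop everything up to and including this node.
                openElements.RemoveRange(i, openElements.Count - i);
                return true;
            }
            if (Special.Contains(node))
            {
                // A special element is in the way: parse error, ignore the token.
                return false;
            }
            // Otherwise continue with the previous entry in the stack.
        }
        return false;
    }

    static void Main()
    {
        // Stack at the point where </span> is seen in the test input.
        var stack = new List<string> { "html", "body", "span", "p" };
        bool closed = ProcessAnyOtherEndTag(stack, "span");

        Console.WriteLine(closed);                  // False: the token is ignored
        Console.WriteLine(string.Join(" ", stack)); // html body span p
    }
}
```

With the stack from the test input, the walk stops at p immediately, so the span stays open and ends up enclosing both paragraphs, as in the Chrome DOM shown earlier.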
Hello @poizan42 ,
Thank you for reporting.
Currently, HAP is a mix between following the HTML spec and fixing some issues. It doesn't work 100% exactly as the browser does.
We currently have too many requests but we hope to be able to check it at the end of the week.
Best Regards,
Jonathan