
Closing span and p in wrong order *generates* invalid html

Open poizan42 opened this issue 6 years ago • 1 comments

Try this program:

using HtmlAgilityPack;
using System;

class Program
{
  const string test = @"
<html>
<body>
<span>
<p>Foo</span></p>
<p>Bar</p>
</body></html>";
  static void Main(string[] args)
  {
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(test);
    Console.WriteLine(doc.DocumentNode.OuterHtml);
    Console.ReadLine();
  }
}

It outputs the following html:


<html>
<body>
<span>
<p>Foo</span><p>
<p>Bar</p>
</body></html>

Note that the first </p> end tag has become a <p> start tag, i.e. HtmlAgilityPack generates malformed HTML in this case, and it's worse than the input because there are now unmatched open tags.

For comparison, the DOM that Chrome generates for this input is:

<html><head></head><body>
<span>
<p>Foo</p>
<p>Bar</p>

</span></body></html>

I think this matches what the spec says. The </span> is ignored by the "Any other end tag" rule at https://html.spec.whatwg.org/#parsing-main-inbody: node starts out as the current node, which is the p element. It does not match span, and because p is special, "3." is hit, which causes the whole token to be ignored. Everything then continues without error until </body> is hit. Neither this nor the following </html> closes the span; </body> is flagged as a parse error because there is still a span element on the stack of open elements. Finally the parser hits EOF and stops parsing, at which point it pops all elements off the stack of open elements (presumably while inserting closing tags for them, though I can't find where it says that in the spec).
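The "Any other end tag" steps referred to above can be sketched roughly as follows. This is a simplified illustration and not HtmlAgilityPack code: the class name `AnyOtherEndTag`, the method `Process`, and the shortened `Special` set are all made up for the example, and the spec's "generate implied end tags" step is left out.

```csharp
using System;
using System.Collections.Generic;

static class AnyOtherEndTag
{
    // Assumption: a small subset of the spec's "special" category,
    // just enough for this example. span is deliberately not in it.
    static readonly HashSet<string> Special =
        new HashSet<string> { "html", "body", "p", "div", "li", "table" };

    // Returns true if the end tag closed an element,
    // false if the token was ignored.
    public static bool Process(List<string> openElements, string endTag)
    {
        // Walk from the current node (top of the stack) towards the root.
        for (int i = openElements.Count - 1; i >= 0; i--)
        {
            string node = openElements[i];
            if (node == endTag)
            {
                // Match: pop everything up to and including this node.
                openElements.RemoveRange(i, openElements.Count - i);
                return true;
            }
            if (Special.Contains(node))
            {
                // A special element is in the way: parse error, ignore the token.
                return false;
            }
        }
        return false;
    }

    static void Main()
    {
        // Stack of open elements when </span> arrives in "<span><p>Foo</span>".
        var stack = new List<string> { "html", "body", "span", "p" };
        bool closed = Process(stack, "span");
        Console.WriteLine(closed);                  // False: the </span> is ignored
        Console.WriteLine(string.Join(" ", stack)); // html body span p
    }
}
```

With the stack from the test input, the loop stops at p on the first iteration, so the span stays open and the stack is left untouched.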

poizan42, Feb 21 '19 17:02

Hello @poizan42 ,

Thank you for reporting.

Currently, HAP is a mix of following the HTML spec and fixing some specific issues. It doesn't work 100% exactly as a browser does.

We currently have too many requests but we hope to be able to check it at the end of the week.

Best Regards,

Jonathan



JonathanMagnan, Feb 25 '19 03:02