html-agility-pack

Parsing unexpectedly self-closed form tag

Open arabcewicz opened this issue 7 years ago • 3 comments

Hello, take a look at this test:

var html = @"
<form />
<input name=""foo"">
<input name=""bar"">
</form>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
doc.DocumentNode.Descendants("form").ElementAt(0).Descendants("input").Should().HaveCount(2);

It fails because the form's descendant collection contains no input elements. Browsers, however, interpret such broken form elements properly. Could you improve HAP parsing to handle this the way browsers do?

arabcewicz avatar Jul 20 '18 08:07 arabcewicz

Hello @arabcewicz ,

This issue occurs because form is not a self-closing tag: http://xahlee.info/js/html5_non-closing_tag.html

We will check how we can handle this scenario.

Best Regards,

Jonathan

JonathanMagnan avatar Jul 20 '18 13:07 JonathanMagnan

Hi Jonathan,

The last update is from Jul 20, so I was wondering whether there is anything new to share on this issue. I tried to resolve the issue myself. The idea was to flatten the DOM tree, identify an empty form and a #text node with innerHtml of "" with orphaned form elements in between, and try to collapse those elements into the form itself. I was hoping you could do it at parse time :)
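The workaround described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this comment and not a HAP feature: `SelfClosedFormRepair`, `Repair`, and `IsEndMarker` are hypothetical names, and the sketch assumes that HAP leaves the orphaned elements as following siblings of the empty form and that an empty #text node (or the end of the sibling list) marks where the form should have ended.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical post-parse helper; not part of HtmlAgilityPack.
public static class SelfClosedFormRepair
{
    public static void Repair(HtmlDocument doc)
    {
        // Take a snapshot first, because the tree is modified while we walk it.
        var emptyForms = doc.DocumentNode
            .Descendants("form")
            .Where(f => !f.HasChildNodes)
            .ToList();

        foreach (var form in emptyForms)
        {
            var sibling = form.NextSibling;
            while (sibling != null && !IsEndMarker(sibling))
            {
                // Remember the next node before detaching the current one.
                var next = sibling.NextSibling;
                sibling.ParentNode.RemoveChild(sibling);
                form.AppendChild(sibling);
                sibling = next;
            }
        }
    }

    // Assumption: an empty text node is where the stray </form> ended up.
    // Another form also stops the scan so that forms are not nested.
    private static bool IsEndMarker(HtmlNode node) =>
        (node.NodeType == HtmlNodeType.Text && string.IsNullOrEmpty(node.InnerHtml))
        || node.Name == "form";
}
```

Calling `SelfClosedFormRepair.Repair(doc)` right after `doc.LoadHtml(html)` is meant to move the two inputs from the test above under the form. Whether it does depends on how a given HAP version represents the stray closing tag, so the end-marker check may need adjusting.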

Thank you in advance, Shahar

sperlis avatar Dec 04 '18 15:12 sperlis

Hello @sperlis ,

If I remember correctly, we tried a few things but we didn't like our fix.

As mentioned, form is not a self-closing tag, so we probably chose at the time to work on other issues instead.

JonathanMagnan avatar Dec 06 '18 20:12 JonathanMagnan