html-agility-pack

Closing Paragraph tag removed when no content provided

Open: ghost opened this issue 7 years ago • 4 comments

Related to issue #1

Encountered in v1.5.1: when parsing a <p> element with no content, HAP removes its closing </p> tag. For example:

Starting HTML: <p style="font-size:1px;"></p><p>test</p>

HTML after HAP: <p style="font-size:1px;"><p>test</p>

Setting HtmlAgilityPack.HtmlDocument.DisableBehavaiorTagP = true; seems to resolve this issue. Should we expect to keep using that option in future releases?

ghost, Jul 29 '17

Hello @chrisnelsondotca,

I'm currently looking at some issues similar to this one.

I hope to be able to provide more information by next Monday.

Best Regards,

Jonathan

JonathanMagnan, Jul 29 '17

Hello @chrisnelsondotca,

We will try to work on all issues of this kind in September.

In a future release, you will no longer need this option; we will probably remove it once all issues of this kind are fixed.

We don't have a fixed date yet, but I will try to keep you updated once we start working on this issue.

Best Regards,

Jonathan

JonathanMagnan, Aug 07 '17

Has the behaviour of this changed? DisableBehavaiorTagP = true still seems to remove closing P tags for me. I'm on version 1.7.1.

using System;
using HtmlAgilityPack;

HtmlDocument.DisableBehavaiorTagP = true;

var htmlDocument = new HtmlDocument();

const string testHtml = "<p>before<div>middle</div>after</p>";
htmlDocument.LoadHtml(testHtml);

// Try to split the paragraph around the <div> by rewriting its InnerHtml as a string.
var divNode = htmlDocument.DocumentNode.SelectSingleNode("/p/div");
var divParagraph = divNode.ParentNode;
divParagraph.InnerHtml = divParagraph.InnerHtml.Replace(divNode.OuterHtml, "</p>" + divNode.OuterHtml + "<p>");

Console.WriteLine(htmlDocument.DocumentNode.InnerHtml);

Expected result:

<p>before</p><div>middle</div><p>after</p>

Actual result:

<p>before<div>middle</div><p>after</p></p>

This works in a browser, because the <div> element implicitly closes the preceding <p> element. However, I'm writing something for Facebook Instant Articles, which is very strict and doesn't allow <figure> elements inside <p> elements. Facebook being Facebook, it ignores this implicit-closing behaviour of paragraph elements and simply treats the next element as a child.

Rene-Sackers, Mar 23 '18
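Because the InnerHtml setter appears to re-parse the new string as a separate fragment, a stray </p> in it cannot close the surrounding paragraph. One possible workaround is to move the nodes in the DOM instead of editing HTML text. The following is a minimal sketch under stated assumptions, not part of HAP: SplitParagraphsAroundBlocks is a hypothetical helper name, it only inspects direct children of each <p>, it does not copy the original <p>'s attributes, and it relies on the HAP DOM methods Descendants, RemoveChild, InsertBefore, AppendChild, CreateElement and CreateTextNode. It uses C# 9 top-level statements, and it builds the sample tree by hand so the result does not depend on how a given HAP version parses <p> (which, as this thread shows, depends on DisableBehavaiorTagP).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Build <p>before<div>middle</div>after</p> by hand so the demo does not
// depend on how the installed HAP version parses <p>.
var doc = new HtmlDocument();
var para = doc.CreateElement("p");
para.AppendChild(doc.CreateTextNode("before"));
var div = doc.CreateElement("div");
div.AppendChild(doc.CreateTextNode("middle"));
para.AppendChild(div);
para.AppendChild(doc.CreateTextNode("after"));
doc.DocumentNode.AppendChild(para);

SplitParagraphsAroundBlocks(doc, new HashSet<string> { "div", "figure" });
Console.WriteLine(doc.DocumentNode.OuterHtml);

// Hypothetical helper (not part of HAP): moves block-level children such as
// <div> or <figure> out of every <p>, splitting the paragraph around them.
// Only direct children are inspected, and the original <p>'s attributes are
// not copied to the new paragraphs.
static void SplitParagraphsAroundBlocks(HtmlDocument document, ISet<string> blockTags)
{
    // ToList() snapshots the nodes because the tree is modified in the loop.
    foreach (var paragraph in document.DocumentNode.Descendants("p").ToList())
    {
        if (!paragraph.ChildNodes.Any(c => blockTags.Contains(c.Name)))
            continue;

        var parent = paragraph.ParentNode;
        var current = document.CreateElement("p");

        foreach (var child in paragraph.ChildNodes.ToList())
        {
            paragraph.RemoveChild(child);
            if (blockTags.Contains(child.Name))
            {
                // Flush the inline content collected so far as its own <p>,
                // then place the block element right after it.
                if (current.HasChildNodes)
                    parent.InsertBefore(current, paragraph);
                parent.InsertBefore(child, paragraph);
                current = document.CreateElement("p");
            }
            else
            {
                current.AppendChild(child);
            }
        }

        // Any trailing inline content becomes the last paragraph.
        if (current.HasChildNodes)
            parent.InsertBefore(current, paragraph);
        parent.RemoveChild(paragraph);
    }
}
```

Applied to the example above, the top level should end up as <p>, <div>, <p>, which is the shape the reporter expected; on a parsed document this only helps if the parser kept the <div> inside the <p> in the first place.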

Hello @Rene-Sackers,

Thank you for reporting; we will look at it.

We have added some methods that make this kind of scenario easier to handle, so perhaps we can now do something about it.

Best Regards,

Jonathan

JonathanMagnan, Mar 24 '18