html-agility-pack

Parent OuterLength wrong when using <br/>

Open • meum opened this issue 1 year ago

The / in <br/> is not included in the parent's OuterLength and OuterHtml, even though the br node's own OuterLength and OuterHtml are correct.

Example:

```csharp
using System;
using HtmlAgilityPack;

var htmlDoc2 = new HtmlDocument();
htmlDoc2.LoadHtml("<html><body><br/></body></html>");
Console.WriteLine(htmlDoc2.DocumentNode.OuterLength.ToString()); // One too low because it doesn't count the /
Console.WriteLine(htmlDoc2.DocumentNode.OuterHtml); // Missing the /
Console.WriteLine(htmlDoc2.DocumentNode.SelectSingleNode("//br").OuterLength.ToString()); // Correct
Console.WriteLine(htmlDoc2.DocumentNode.SelectSingleNode("//br").OuterHtml); // Correct
```

Expected output:

```
31
<html><body><br/></body></html>
5
<br/>
```

Actual output:

```
30
<html><body><br></body></html>
5
<br/>
```

meum • Aug 26 '24 10:08

Hello @meum,

Thank you for reporting.

Here is what we have found so far:

Some nodes, such as DocumentNode, have their OuterHtml rewritten because their _changed value is true, so the UpdateHtml method is called.

When the br node is accessed directly, its _changed value is false, which means it takes its text directly from the original input instead: https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlNode.cs#L681
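To make the difference concrete, here is a deliberately simplified model of the two code paths described above. It is a hypothetical sketch, not HtmlAgilityPack's source: the SketchNode class, its Changed flag (standing in for the private _changed field), and the hard-coded character offsets are illustrative assumptions. An unchanged node returns its exact slice of the original input, while a changed node regenerates markup for its whole subtree and, matching the observed behavior, writes br without the slash.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified model of the behavior described above; this is NOT
// HtmlAgilityPack's real code. An unchanged node returns the exact slice of the
// original input, while a changed node regenerates its markup from the tree.
class SketchNode
{
    static readonly HashSet<string> EmptyElements = new() { "br", "hr", "img" };

    public string Name;
    public List<SketchNode> Children = new();
    public bool Changed;          // stands in for HtmlNode's private _changed flag
    public string Source = "";    // original input text
    public int Start, Length;     // position of this node's markup in Source

    public SketchNode(string name) => Name = name;

    // Cached path (unchanged) vs. regenerated path (changed).
    public string OuterHtml =>
        Changed ? Regenerate() : Source.Substring(Start, Length);

    string Regenerate()
    {
        var sb = new StringBuilder();
        Write(sb);
        return sb.ToString();
    }

    // Regeneration walks the whole subtree and never consults the children's
    // original text, so the "/" written in the input is lost.
    void Write(StringBuilder sb)
    {
        bool isDocument = Name == "#document";
        if (!isDocument && EmptyElements.Contains(Name) && Children.Count == 0)
        {
            sb.Append('<').Append(Name).Append('>');   // written as <br>, not <br/>
            return;
        }
        if (!isDocument) sb.Append('<').Append(Name).Append('>');
        foreach (var child in Children) child.Write(sb);
        if (!isDocument) sb.Append("</").Append(Name).Append('>');
    }
}

static class Demo
{
    static void Main()
    {
        const string src = "<html><body><br/></body></html>";
        var br = new SketchNode("br") { Source = src, Start = 12, Length = 5 };
        var body = new SketchNode("body") { Source = src, Start = 6, Length = 18, Children = { br } };
        var html = new SketchNode("html") { Source = src, Start = 0, Length = 31, Children = { body } };
        var doc = new SketchNode("#document") { Changed = true, Children = { html } };

        Console.WriteLine(doc.OuterHtml);  // <html><body><br></body></html>
        Console.WriteLine(br.OuterHtml);   // <br/>
    }
}
```

Under these assumptions the document node comes out as the 30-character string from the actual output, while br on its own still reports <br/>, because regeneration never looks at the children's original text.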

We will dig further into this issue, but at least we now understand why the behavior differs.

Best Regards,

Jon

JonathanMagnan • Aug 26 '24 16:08

Hello @meum,

Starting with v1.11.65, a new option is available: OptionWriteEmptyNodesWithoutSpace.

To write an "empty node" such as br with its closing slash, you need to set OptionWriteEmptyNodes = true; unfortunately, this also adds an extra space (<br />). Setting OptionWriteEmptyNodesWithoutSpace = true as well removes that space. This is not a perfect fix, since keeping the original ending would probably have been better, but it is certainly better than the current behavior:

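```csharp
using HtmlAgilityPack;

var htmlDoc2 = new HtmlDocument();
htmlDoc2.OptionWriteEmptyNodes = true;             // write empty nodes such as br with a closing "/"
htmlDoc2.OptionWriteEmptyNodesWithoutSpace = true; // v1.11.65+: drop the space before "/>"
htmlDoc2.LoadHtml("<html><body><br/></body></html>");
```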
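A minimal sketch of how the two options combine, assuming the behavior described in this reply: with neither option a regenerated empty element has no slash, OptionWriteEmptyNodes adds the slash together with a space, and OptionWriteEmptyNodesWithoutSpace drops that space. WriteEmptyElement is a hypothetical helper that only models the written form; in the library itself the two options are properties on HtmlDocument. It is also an assumption here that OptionWriteEmptyNodesWithoutSpace has no effect on its own.

```csharp
using System;

// Hypothetical sketch (not HtmlAgilityPack's real source) of how the two options
// combine when an empty element such as br is regenerated.
static class EmptyNodeWriter
{
    public static string WriteEmptyElement(string name, bool writeEmptyNodes, bool withoutSpace)
    {
        // Assumption: withoutSpace only matters when writeEmptyNodes is enabled.
        if (!writeEmptyNodes) return $"<{name}>";           // default: no slash at all
        return withoutSpace ? $"<{name}/>" : $"<{name} />";  // slash, with or without the space
    }

    static void Main()
    {
        Console.WriteLine(WriteEmptyElement("br", false, false)); // <br>
        Console.WriteLine(WriteEmptyElement("br", true, false));  // <br />
        Console.WriteLine(WriteEmptyElement("br", true, true));   // <br/>
    }
}
```

Under this model, regenerating the issue's example with both options set yields <html><body><br/></body></html>, the 31-character string from the expected output.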

Best Regards,

Jon

JonathanMagnan • Aug 30 '24 18:08