html-agility-pack icon indicating copy to clipboard operation
html-agility-pack copied to clipboard

MoveChild doesn't set parent of childs correctly

Open dartrax opened this issue 1 year ago • 3 comments

Let's say I have an HtmlDocument that has some child nodes in DocumentNode like <div></div><div></div><div></div>. I want to encapsulate the existing childs into a new <html></html> Node:

private void InsertHtmlRootElement(HtmlDocument doc) {
    HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
    if (HtmlTag is null) {
        // Create new <html></html> Node
        HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };
        // Move all childs from root into the new HTML Node
        HtmlTag.MoveChildren(doc.DocumentNode.ChildNodes);
        // Append new HTML Node into Document
        doc.DocumentNode.AppendChild(HtmlTag);
        // HtmlTag.ChildNodes.ForEach(c => c.SetParent(HtmlTag));
    }
}

This works but all children of HtmlTag have their ParentNode set to null. I guessed that I first need to add the HtmlTag to the document and then insert the childs to fix it:

private void InsertHtmlRootElement(HtmlDocument doc) {
    HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
    if (HtmlTag is null) {
        // Append new <html></html> Node
        HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };
        doc.DocumentNode.AppendChild(HtmlTag);
        // Move all childs from root into the new HTML Node
        doc.DocumentNode.ChildNodes.Except([HtmlTag]).ToArray().ForEach( c => {
            HtmlTag.MoveChild(c);
            // c.SetParent(HtmlTag);
        });
    }
}

As it turns out, the children still have no parent. To work around that, I have to call c.SetParent(HtmlTag); explicitly.

dartrax avatar Dec 03 '24 10:12 dartrax

Hello @dartrax ,

The method MoveChilden seems to have been created to move a node from one document to another document, not within the same document.

The MoveChildren method calls the RemoveChildren method: https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlNode.cs#L1770. This causes an issue because the child is fully removed from the document after being correctly added with a parent.

I don't know if that's a bug or if this works as intended by the original author (I would have expected the same behavior as you). What I would recommend you is probably this code:

public void InsertHtmlRootElement(HtmlDocument doc)
{
	HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
	if (HtmlTag is null)
	{
		// Create new <html></html> Node
		HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };

		var childNodes = doc.DocumentNode.ChildNodes.ToList();
		childNodes.ForEach(x => doc.DocumentNode.RemoveChild(x)); 

		// Append new HTML Node into Document
		doc.DocumentNode.AppendChild(HtmlTag);

		// Move all childs from root into the new HTML Node
		childNodes.ForEach(x => HtmlTag.AppendChild(x));
	}
}

All childs get removed from the document and after added to the HTML node.

Best Regards,

Jon

JonathanMagnan avatar Dec 11 '24 15:12 JonathanMagnan

Hi Jon, thanks for your reply. Your recommended code works, thank you!

I would suggest to add the MoveChildren method to the docs, because currently I couldn't find any documentation for it. Then the intended behavior could be documented.

Ideally, of course this use case should be added to make it work in both cases ;-)

dartrax avatar Dec 21 '24 14:12 dartrax

Thank you for your suggestion.

Both methods have been added with a note about the "different document": https://html-agility-pack.net/manipulation

Unfortunately, .NET Fiddle is currently down so I was not able to add the example

Best Regards,

Jon

JonathanMagnan avatar Dec 22 '24 18:12 JonathanMagnan