html-agility-pack
html-agility-pack copied to clipboard
MoveChild doesn't set parent of childs correctly
Let's say I have an HtmlDocument that has some child nodes in DocumentNode like <div></div><div></div><div></div>. I want to encapsulate the existing childs into a new <html></html> Node:
private void InsertHtmlRootElement(HtmlDocument doc) {
HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
if (HtmlTag is null) {
// Create new <html></html> Node
HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };
// Move all childs from root into the new HTML Node
HtmlTag.MoveChildren(doc.DocumentNode.ChildNodes);
// Append new HTML Node into Document
doc.DocumentNode.AppendChild(HtmlTag);
// HtmlTag.ChildNodes.ForEach(c => c.SetParent(HtmlTag));
}
}
This works but all children of HtmlTag have their ParentNode set to null. I guessed that I first need to add the HtmlTag to the document and then insert the childs to fix it:
private void InsertHtmlRootElement(HtmlDocument doc) {
HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
if (HtmlTag is null) {
// Append new <html></html> Node
HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };
doc.DocumentNode.AppendChild(HtmlTag);
// Move all childs from root into the new HTML Node
doc.DocumentNode.ChildNodes.Except([HtmlTag]).ToArray().ForEach( c => {
HtmlTag.MoveChild(c);
// c.SetParent(HtmlTag);
});
}
}
As it turns out, the children still have no parent. To work around that, I have to call c.SetParent(HtmlTag); explicitly.
Hello @dartrax ,
The method MoveChilden seems to have been created to move a node from one document to another document, not within the same document.
The MoveChildren method calls the RemoveChildren method: https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlNode.cs#L1770. This causes an issue because the child is fully removed from the document after being correctly added with a parent.
I don't know if that's a bug or if this works as intended by the original author (I would have expected the same behavior as you). What I would recommend you is probably this code:
public void InsertHtmlRootElement(HtmlDocument doc)
{
HtmlNode HtmlTag = doc.DocumentNode.ChildNodes.FindFirst("html");
if (HtmlTag is null)
{
// Create new <html></html> Node
HtmlTag = new HtmlNode(HtmlNodeType.Element, doc, 0) { Name = "html" };
var childNodes = doc.DocumentNode.ChildNodes.ToList();
childNodes.ForEach(x => doc.DocumentNode.RemoveChild(x));
// Append new HTML Node into Document
doc.DocumentNode.AppendChild(HtmlTag);
// Move all childs from root into the new HTML Node
childNodes.ForEach(x => HtmlTag.AppendChild(x));
}
}
All childs get removed from the document and after added to the HTML node.
Best Regards,
Jon
Hi Jon, thanks for your reply. Your recommended code works, thank you!
I would suggest to add the MoveChildren method to the docs, because currently I couldn't find any documentation for it. Then the intended behavior could be documented.
Ideally, of course this use case should be added to make it work in both cases ;-)
Thank you for your suggestion.
Both methods have been added with a note about the "different document": https://html-agility-pack.net/manipulation
Unfortunately, .NET Fiddle is currently down so I was not able to add the example
Best Regards,
Jon