html-agility-pack
SelectNodes().RemoveAt breaks node relationships
Description
Removing elements from the results of SelectNodes causes siblings of the nodes on either side of the removed one to go missing or to change which parent they belong to. Either that or I'm really misunderstanding something.
Fiddle or Project
// @nuget: HtmlAgilityPack
using System;
using HtmlAgilityPack;
public class Program
{
    public static void Main()
    {
        var html =
            @"<html>
<head>
<title>Document</title>
</head>
<body>
<div class=""divClass"">
<h3 class=""h3Class"">First Header</h3>
<p class=""pClass"">
Hello
</p>
</div>
<div class=""divClass"">
<h3 class=""h3Class"">Second Header</h3>
<p class=""pClass"">
World!
</p>
</div>
<div class=""divClass"">
<h3 class=""h3Class"">Third Header</h3>
<p class=""pClass"">
Nonsense
</p>
</div>
</body>
</html>";
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);
        HtmlNode root = htmlDoc.DocumentNode;
        HtmlNodeCollection headers = root.SelectNodes("//h3[contains(@class, 'h3')]");
        // don't want the last one
        headers.RemoveAt(headers.Count - 1); // without this line, it does what I expect. Both h3's and p's are displayed
        foreach (HtmlNode node in headers)
        {
            Console.WriteLine("Found header: {0}", node.InnerText);
        }
        Console.WriteLine();
        DisplayAllSiblings(headers[0]); // 'p = Hello' should be displayed
        DisplayAllSiblings(headers[1]); // 'p = World!' has gone missing
    }

    static void DisplayAllSiblings(HtmlNode node)
    {
        HtmlNode parent = node.ParentNode;
        HtmlNodeCollection coll = parent.SelectNodes("./*");
        Console.WriteLine("Siblings of {0}:", node.InnerText);
        foreach (HtmlNode brother in coll)
        {
            Console.WriteLine("Node: {0} = {1}", brother.Name, brother.InnerText.Trim());
        }
        Console.WriteLine();
    }
}
Output of the above when removing the last node:
Found header: First Header
Found header: Second Header
Siblings of First Header:
Node: h3 = First Header
Node: p = Hello
Siblings of Second Header:
Node: h3 = Second Header
Output when changing the RemoveAt to headers.RemoveAt(1);
Found header: First Header
Found header: Third Header
Siblings of First Header:
Node: h3 = First Header
Node: h3 = Third Header
Node: p = Nonsense
Siblings of Third Header:
Node: h3 = Third Header
Node: p = Nonsense
Further technical details
- HAP version: whichever version dotnetfiddle uses; also found in 1.8.10
- .NET version: net472
Hello @adeyblue,
My developer took the time to look at it, and we recommend that you use the RemoveChild method instead, for example:
var headerToRemove = headers.Last();
headerToRemove.ParentNode.RemoveChild(headerToRemove);
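For reference, a fuller sketch of this workaround follows. It assumes `System.Linq` is imported, since `Last()` is a LINQ extension method and the original repro only imports `System` and `HtmlAgilityPack`. `RemoveLastMatch` is a hypothetical helper name, and the sketch runs the query again after the removal instead of assuming the first result collection gets updated. Note that `RemoveChild` edits the document itself, so the removed `h3` is gone from the tree and not just from the result list.

```csharp
// @nuget: HtmlAgilityPack
using System;
using System.Linq; // needed for Last()
using HtmlAgilityPack;

public static class RemoveChildSketch
{
    // Hypothetical helper (not part of HtmlAgilityPack): removes the last node
    // matching the XPath from the document, then returns a fresh query result.
    public static HtmlNodeCollection RemoveLastMatch(HtmlNode root, string xpath)
    {
        HtmlNodeCollection matches = root.SelectNodes(xpath);
        if (matches == null)
        {
            return null; // SelectNodes returns null when nothing matches
        }

        HtmlNode last = matches.Last();
        last.ParentNode.RemoveChild(last); // modifies the document tree

        return root.SelectNodes(xpath); // query again; null if nothing is left
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div><h3 class='h3Class'>First Header</h3><p>Hello</p></div>" +
            "<div><h3 class='h3Class'>Second Header</h3><p>World!</p></div>");

        HtmlNodeCollection remaining =
            RemoveLastMatch(doc.DocumentNode, "//h3[contains(@class, 'h3')]");

        if (remaining != null)
        {
            foreach (HtmlNode header in remaining)
            {
                Console.WriteLine("Found header: {0}", header.InnerText);
            }
        }
    }
}
```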
The problem with calling RemoveAt directly is that you are using the method from List<T>, which doesn't raise the HasChanges method. To make it work, we would need to create our own List<T> class, which I don't think is a good long-term solution.
Make sure to use the methods provided by the library instead.
Let me know if that answers this issue correctly.
Best Regards,
Jon
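If the goal is only to drop an entry from the result set while leaving the document untouched, one possible alternative, which is not confirmed by the maintainers in this thread, is to copy the results into a plain `List<HtmlNode>` first. Removing an entry from that copy only changes the list; it holds references to the nodes and has no way to alter their parent or sibling links. The sketch below assumes `System.Linq` for `ToList()`, and `SelectToList` is a hypothetical helper name.

```csharp
// @nuget: HtmlAgilityPack
using System;
using System.Collections.Generic;
using System.Linq; // needed for ToList()
using HtmlAgilityPack;

public static class CopyResultsSketch
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the matches as
    // a plain List<HtmlNode>, or an empty list when nothing matches.
    public static List<HtmlNode> SelectToList(HtmlNode root, string xpath)
    {
        HtmlNodeCollection matches = root.SelectNodes(xpath);
        return matches == null ? new List<HtmlNode>() : matches.ToList();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div><h3 class='h3Class'>First Header</h3><p>Hello</p></div>" +
            "<div><h3 class='h3Class'>Second Header</h3><p>World!</p></div>" +
            "<div><h3 class='h3Class'>Third Header</h3><p>Nonsense</p></div>");

        List<HtmlNode> headers =
            SelectToList(doc.DocumentNode, "//h3[contains(@class, 'h3')]");

        // List<T>.RemoveAt on the copy: only the list changes, not the document.
        headers.RemoveAt(headers.Count - 1);

        foreach (HtmlNode header in headers)
        {
            Console.WriteLine("Siblings of {0}:", header.InnerText);
            foreach (HtmlNode sibling in header.ParentNode.SelectNodes("./*"))
            {
                Console.WriteLine("Node: {0} = {1}", sibling.Name, sibling.InnerText.Trim());
            }
        }
    }
}
```

The trade-off is that the copy is a snapshot: later edits to the document are not reflected in it, so it suits read-only filtering of query results, not document manipulation.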