SelectNodes().RemoveAt breaks node relationships

Open adeyblue opened this issue 4 years ago • 1 comment

Description

Removing an element from the collection returned by SelectNodes causes the siblings of the nodes on either side of the removed one to go missing or to change which parent they belong to. Either that, or I'm really misunderstanding something.

Fiddle or Project

// @nuget: HtmlAgilityPack

using System;
using HtmlAgilityPack;
					
public class Program
{
	public static void Main()
	{
		var html = 
		@"<html>
  <head>
    <title>Document</title>
  </head>
  <body>
    <div class=""divClass"">
	  <h3 class=""h3Class"">First Header</h3>
	    <p class=""pClass"">
           Hello
	    </p>
    </div>
	<div class=""divClass"">
	  <h3 class=""h3Class"">Second Header</h3>
	  <p class=""pClass"">
         World!
	  </p>
    </div>
	<div class=""divClass"">
	  <h3 class=""h3Class"">Third Header</h3>
	  <p class=""pClass"">
         Nonsense
	  </p>
    </div>
  </body>
</html>";

		var htmlDoc = new HtmlDocument();
		htmlDoc.LoadHtml(html);
		HtmlNode root = htmlDoc.DocumentNode;
		HtmlNodeCollection headers = root.SelectNodes("//h3[contains(@class, 'h3')]");
		// don't want the last one
		headers.RemoveAt(headers.Count - 1); // without this line, it does what I expect. Both h3's and p's are displayed
		foreach(HtmlNode node in headers)
		{
			Console.WriteLine("Found header: {0}", node.InnerText);
		}
		Console.WriteLine();
		DisplayAllSiblings(headers[0]); // 'p = Hello' should be displayed
		DisplayAllSiblings(headers[1]); // 'p = World!' has gone missing
	}
	
	static void DisplayAllSiblings(HtmlNode node)
	{
		HtmlNode parent = node.ParentNode;
		HtmlNodeCollection coll = parent.SelectNodes("./*");
		
		Console.WriteLine("Siblings of {0}:", node.InnerText);
		foreach(HtmlNode brother in coll)
		{
			Console.WriteLine("Node: {0} = {1}", brother.Name, brother.InnerText.Trim());
		}
		Console.WriteLine();
	}
}

Output of the above when removing the last node:

Found header: First Header
Found header: Second Header

Siblings of First Header:
Node: h3 = First Header
Node: p = Hello

Siblings of Second Header:
Node: h3 = Second Header

Output when changing the RemoveAt call to headers.RemoveAt(1):

Found header: First Header
Found header: Third Header

Siblings of First Header:
Node: h3 = First Header
Node: h3 = Third Header
Node: p = Nonsense

Siblings of Third Header:
Node: h3 = Third Header
Node: p = Nonsense

Further technical details

  • HAP version: whichever version dotnetfiddle uses; also found in 1.8.10
  • .NET version: net472

adeyblue, Dec 16 '20 03:12

Hello @adeyblue ,

My developer took the time to look at it, and we recommend that you use the RemoveChild method instead, such as:

// Last() is a LINQ extension method, so this needs "using System.Linq;"
var headerToRemove = headers.Last();
headerToRemove.ParentNode.RemoveChild(headerToRemove);

The problem with calling RemoveAt directly is that you are using the method from List<T>, which doesn't raise the HasChanges method. To make it work, we would need to create our own List<T> class, which I don't think is a good long-term solution.

Make sure to use methods provided by the library instead.

Let me know if that answers your question.

Best Regards,

Jon

JonathanMagnan, Dec 23 '20 00:12
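
Note that the two approaches in this thread do different things: RemoveChild takes the node out of the document, while the original report only wanted to skip one entry in the result list. The following is a minimal sketch of both, assuming a shortened version of the markup from the report. HeaderDemo, AllButLast and ElementSiblingCount are hypothetical names introduced here for illustration and are not part of HtmlAgilityPack. Copying the results into a separate List<HtmlNode> is one possible way to filter them without touching the document; it is not something the maintainers suggested.

```csharp
// @nuget: HtmlAgilityPack
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HeaderDemo
{
    // Hypothetical helper: returns every matched node except the last one
    // as a plain List<HtmlNode>. The HtmlNodeCollection is only read.
    public static List<HtmlNode> AllButLast(HtmlNodeCollection nodes)
    {
        return nodes.Take(nodes.Count - 1).ToList();
    }

    // Hypothetical helper: counts the element children of a node's parent,
    // using the same "./*" query as DisplayAllSiblings in the report.
    public static int ElementSiblingCount(HtmlNode node)
    {
        return node.ParentNode.SelectNodes("./*").Count;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><h3>First Header</h3><p>Hello</p></div>" +
                     "<div><h3>Second Header</h3><p>World!</p></div>" +
                     "<div><h3>Third Header</h3><p>Nonsense</p></div>");

        HtmlNodeCollection headers = doc.DocumentNode.SelectNodes("//h3");

        // Option 1: filter a copy of the results; the document is not edited.
        List<HtmlNode> wanted = AllButLast(headers);
        foreach (HtmlNode header in wanted)
        {
            Console.WriteLine("{0}: {1} element(s) under the same parent",
                header.InnerText, ElementSiblingCount(header));
        }

        // Option 2 (the approach from the reply): remove the node from the
        // document itself. Keep the parent first, since we query it afterwards.
        HtmlNode last = headers[headers.Count - 1];
        HtmlNode parent = last.ParentNode;
        parent.RemoveChild(last);
        Console.WriteLine("Child nodes left in the last div: {0}", parent.ChildNodes.Count);
    }
}
```

Option 1 fits when the document should stay as parsed and only the list of results needs trimming; option 2 fits when the node really should disappear from the document.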