
SelectNodes ignores 'empty' tags

Open ganr8790 opened this issue 7 years ago • 6 comments

Not sure if I'm doing something wrong, but when I use SelectNodes with a fairly simple XPath, Agility 'ignores' tags that don't contain text (InnerText):

...SelectNodes("//div/ul/li/span")

<span ...> <<<--- This one is being ignored
<span ...>This is a test <<<--- This one is fine

FYI - The spans always appear in pairs, but I need to check a class name in the first (textless) one (...and yes, there's a workaround, but it is a bit cumbersome...)

ganr8790 avatar Aug 23 '17 21:08 ganr8790

Hello @ganr8790 ,

Could you provide me an example that's not working?

The following one returns 5 nodes for me:

var html = @"
<div>
	<ul>
		<li><span></li>
		<li><span /></li>
		<li><span></span></li>
		<li><span>a</li>
		<li><span>b</span></li>
	</ul>
</div>
";

var doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(html);

var nodes = doc.DocumentNode.SelectNodes("//div/ul/li/span")
    .ToList();

Best Regards,

Jonathan

JonathanMagnan avatar Aug 23 '17 21:08 JonathanMagnan
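For the stated goal of checking a class name on the first, textless span, a minimal sketch along the same lines could look like the following. The markup and the class name `flag` are hypothetical and are not taken from the reporter's page; only `LoadHtml`, `SelectNodes`, `GetAttributeValue` and `InnerText` are HtmlAgilityPack members.

```csharp
using System;
using HtmlAgilityPack;

class ClassCheckSketch
{
    static void Main()
    {
        // Hypothetical markup: spans come in pairs and the first one has no text.
        var html = @"
<div>
	<ul>
		<li><span class=""flag""></span><span>This is a test</span></li>
	</ul>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//div/ul/li/span");
        if (nodes == null)
        {
            Console.WriteLine("No match");
            return;
        }

        foreach (var span in nodes)
        {
            // GetAttributeValue falls back to the supplied default when the attribute is absent.
            var cls = span.GetAttributeValue("class", "(none)");
            Console.WriteLine($"class={cls} InnerText='{span.InnerText}'");
        }
    }
}
```

Printing every match with its class makes it easy to see whether the textless span is really missing from the result or is simply showing an empty InnerText.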

Hi Jon

The HTML is slightly different…

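<ul>
	<li>
		<span class="myclass…"><span>Text Here…</span></span>
		<span class="myclass…"><span>Text Here…</span></span>
		<span class="myclass…"><span>Text Here…</span></span>
	</li>
</ul>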

I was expecting a list that returns the outer spans, but I only get the inner ones (which I thought would only be the case if the XPath I used was "//div/ul/li/span/span").

My workaround is to search for the <li> tags, iterate through their child nodes, and get each inner span's text and the outer HTML, where I look for the value 'myclass…'.

Regards

Rami


ganr8790 avatar Aug 23 '17 23:08 ganr8790
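The workaround described above (walk the <li> elements, then read each outer span's class and its inner span's text) might be sketched as follows. This is an illustration under assumptions, not the reporter's actual code: the markup and the class names `myclass-a` and `myclass-b` are made up, since the real class value is elided in the thread.

```csharp
using System;
using HtmlAgilityPack;

class WorkaroundSketch
{
    static void Main()
    {
        // Hypothetical markup shaped like the structure described in the thread.
        var html = @"<ul><li>
<span class=""myclass-a""><span>Text A</span></span>
<span class=""myclass-b""><span>Text B</span></span>
</li></ul>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var items = doc.DocumentNode.SelectNodes("//li");
        if (items == null) return; // SelectNodes returns null when nothing matches

        foreach (var li in items)
        {
            // Elements("span") yields only the direct child <span> elements of this <li>.
            foreach (var outer in li.Elements("span"))
            {
                var cls = outer.GetAttributeValue("class", string.Empty);
                var inner = outer.Element("span"); // first child <span>, or null
                var text = inner?.InnerText.Trim() ?? string.Empty;
                Console.WriteLine($"{cls}: {text}");
            }
        }
    }
}
```

Using `Elements` and `Element` keeps each step limited to direct children, so the outer and inner spans cannot be confused with each other.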

Which version are you using?

The code returns the outer span for me without any problem, with the InnerHtml <span>Text Here…</span> and the right class name.

Best Regards,

Jonathan

JonathanMagnan avatar Aug 23 '17 23:08 JonathanMagnan

v1.5.1 – 6 Jul 2017 (I'm using VS2017 with the latest update)

Thanks – I'm off…


ganr8790 avatar Aug 23 '17 23:08 ganr8790

Sorry, it was late – the XPath was actually: "div/ul/li/span"
(the // causes the search to start from the HTML's root, irrespective of the current node)

BTW – note that the below is an extract of a large HTML!

.
.
.
<ul>
	<li>
		<span class="myclass…">
			<span>Text Here…</span>
		</span>
		<span class="myclass…">
			<span>Text Here…</span>
		</span>
		<span class="myclass…">
			<span>Text Here…</span>
		</span>
	</li>
</ul>
.
.
.

Maybe this will give you a clue - when I look at the Watch of the InnerText of the above XPath query I can see all the 'Text Here' without any spaces!


ganr8790 avatar Aug 24 '17 11:08 ganr8790
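The point about a leading // ignoring the current node can be illustrated with a small sketch. The two-list markup and the ids `a` and `b` are hypothetical, chosen only so that the scope of each query is visible. The first query is expected to match spans in both lists, while the other two should stay inside the selected <div>.

```csharp
using System;
using HtmlAgilityPack;

class XPathScopeSketch
{
    static void Main()
    {
        // Hypothetical markup with two lists, so scope differences are visible.
        var html = @"
<div id=""a""><ul><li><span class=""x""><span>one</span></span></li></ul></div>
<div id=""b""><ul><li><span class=""y""><span>two</span></span></li></ul></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var second = doc.DocumentNode.SelectSingleNode("//div[@id='b']");

        // Leading "//": evaluated from the document root, even though it is called on `second`.
        var fromRoot = second.SelectNodes("//ul/li/span");

        // No leading slash: relative to `second`, following child steps only.
        var relative = second.SelectNodes("ul/li/span");

        // ".//": any <span> descendant of `second`, at any depth.
        var descendants = second.SelectNodes(".//span");

        Console.WriteLine($"//ul/li/span : {fromRoot?.Count ?? 0}");
        Console.WriteLine($"ul/li/span   : {relative?.Count ?? 0}");
        Console.WriteLine($".//span      : {descendants?.Count ?? 0}");
    }
}
```

When querying from a node other than the document root, starting the expression with `.` or with a plain element name is usually what keeps the search local to that node.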

Hello @ganr8790 ,

The InnerText only shows the inner text (the text, not the HTML tags), which is Text Here… The InnerHtml shows the inner HTML (all the HTML), which is <span>Text Here...</span>

Perhaps it's only a misunderstanding between the two properties?

Best Regards,

Jonathan

JonathanMagnan avatar Aug 24 '17 12:08 JonathanMagnan
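The difference between the two properties can be seen side by side in a short sketch. The markup is a reduced, hypothetical version of the extract posted earlier, with a made-up class name `myclass` standing in for the elided one. `OuterHtml` is included because the workaround mentions it.

```csharp
using System;
using HtmlAgilityPack;

class InnerTextVsInnerHtmlSketch
{
    static void Main()
    {
        // Hypothetical, reduced markup: one outer span with a class, one inner span with text.
        var html = "<ul><li><span class=\"myclass\"><span>Text Here</span></span></li></ul>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var outer = doc.DocumentNode.SelectSingleNode("//ul/li/span");
        if (outer == null) return; // SelectSingleNode returns null when nothing matches

        // Text content only, with the tags stripped.
        Console.WriteLine(outer.InnerText);   // Text Here

        // Everything inside the outer span, tags included.
        Console.WriteLine(outer.InnerHtml);   // <span>Text Here</span>

        // The outer span's own tag (with its class attribute) plus its content.
        Console.WriteLine(outer.OuterHtml);

        // The class can be read directly, without parsing OuterHtml.
        Console.WriteLine(outer.GetAttributeValue("class", string.Empty));
    }
}
```

Because InnerText drops the tags, a Watch window showing only InnerText can make the outer span look identical to the inner one even when the right node was selected.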