parsel XPath query is buggy


Open shner-elmo opened this issue 1 year ago • 3 comments

Hey, so I'm trying to locate a table inside the HTML using an XPath, and it's not working well: when I select the first element with [1], it returns a list of two elements instead of just one (I tested the same XPath in Chrome, and there it works correctly).

This is the code that I used to initialize it:

```python
import parsel

html = '....'
sel = parsel.Selector(html)
```

And the bug:

May 12 '24 12:05 shner-elmo

What you do on Chrome does not matter, because Chrome does not work on the raw HTML response, but on the DOM.

I bet there are 2 tables that are each the first table child of their parent. (//table)[1] probably does what you want.

May 13 '24 10:05 Gallaecio

> What you do on Chrome does not matter, because Chrome does not work on the raw HTML response, but on the DOM.

I don't understand. How is the DOM different from the HTML? Is it because some JS may have modified it?

If that's the case, it shouldn't make a difference here, because the HTML that I opened in Chrome was a local file (file://...) that I had saved from a website.

May 22 '24 14:05 shner-elmo

@shner-elmo there are other caveats too, some of them unrelated to JS; see https://docs.scrapy.org/en/latest/topics/developer-tools.html#caveats-with-inspecting-the-live-browser-dom

May 22 '24 19:05 kmike
