*:first-of-type and friends are not implemented yet

Open SimonSapin opened this issue 13 years ago • 8 comments

From the docs:

*:first-of-type, *:last-of-type, *:nth-of-type, *:nth-last-of-type, *:only-of-type. All of these work when you specify an element type, but not with *

SimonSapin avatar Apr 18 '12 16:04 SimonSapin

Actually, the current implementation is broken. The selector e ~ f:nth-child(3) is translated to XPath e/following-sibling::*[name() = 'f' and (position() = 3)] which is incorrect: it finds the 3rd element after e, not the third child of its parent.
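
To make the difference concrete, here is a minimal sketch in plain Python. It emulates both counting rules with list operations on the standard library's xml.etree.ElementTree rather than running real XPath, since ElementTree has no following-sibling axis. The document and the names xpath_like and selector_like are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <f id="y"> is the 3rd child of its parent,
# but only the 1st element after <e>.
root = ET.fromstring('<r><x/><e/><f id="y"/><g/><f id="z"/></r>')
children = list(root)
e = root.find("e")
after_e = children[children.index(e) + 1:]

# What the quoted XPath computes: the 3rd element after e,
# kept only if it happens to be an <f>.
xpath_like = [el.get("id") for i, el in enumerate(after_e, start=1)
              if i == 3 and el.tag == "f"]

# What e ~ f:nth-child(3) means: an <f> after e that is
# the 3rd child of the shared parent.
selector_like = [el.get("id") for el in after_e
                 if el.tag == "f" and children.index(el) + 1 == 3]

print(xpath_like)     # ['z']
print(selector_like)  # ['y']
```

The two rules pick different elements because one counts from e and the other counts from the parent's first child.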

SimonSapin avatar Jun 13 '12 15:06 SimonSapin

Hi Simon,

Do you know what the status of this is? Would these be easy to implement?

I wanted to use <element>:nth-of-type.

beaumartinez avatar Dec 04 '14 21:12 beaumartinez

I don’t know how easy it is. What’s needed is to find what the correct XPath translation is, if there is one in the general case.

SimonSapin avatar Dec 05 '14 00:12 SimonSapin

I see. I found this Wikibook last night, though I'm not sure how reliable it is.

beaumartinez avatar Dec 05 '14 12:12 beaumartinez

//p[n] is a correct translation of p:nth-of-type(n) when it’s by itself, but not always when combined with other selectors. [n] in XPath indexes within the current scope, whereas the :nth* family of CSS pseudo-classes counts from the first child of the parent.

Example:

<div>
<p id="a"/><p id="b"/><p id="c"/><p id="d"/><p id="e"/>
</div>

In Selectors, #b ~ p:nth-of-type(3) would match #c, and #b ~ p:nth-of-type(2) would not match anything (counting from the first child of the <div>). In XPath, //*[@id="b"]/following-sibling::p[3] would match #e and //*[@id="b"]/following-sibling::p[2] would match #d, counting from the "current position".
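
The same comparison can be sketched with only Python's standard library. ElementTree does not support the following-sibling axis, so both counting rules are emulated with list operations. The helper names xpath_style and selectors_style are hypothetical and are not part of cssselect or lxml.

```python
import xml.etree.ElementTree as ET

DOC = '<div><p id="a"/><p id="b"/><p id="c"/><p id="d"/><p id="e"/></div>'
root = ET.fromstring(DOC)
b = root.find("p[@id='b']")


def following_siblings(parent, start):
    """Children of parent that come after start, in document order."""
    children = list(parent)
    return children[children.index(start) + 1:]


def xpath_style(parent, start, tag, n):
    """Emulate following-sibling::tag[n]: n counts from the current position."""
    candidates = [el for el in following_siblings(parent, start) if el.tag == tag]
    return candidates[n - 1:n]


def selectors_style(parent, start, tag, n):
    """Emulate start ~ tag:nth-of-type(n): n counts among the parent's children."""
    same_type = [el for el in parent if el.tag == tag]
    later = following_siblings(parent, start)
    return [el for el in same_type[n - 1:n] if el in later]


def ids(elements):
    return [el.get("id") for el in elements]


xpath_3 = ids(xpath_style(root, b, "p", 3))
xpath_2 = ids(xpath_style(root, b, "p", 2))
css_3 = ids(selectors_style(root, b, "p", 3))
css_2 = ids(selectors_style(root, b, "p", 2))

print(xpath_3, xpath_2)  # ['e'] ['d']
print(css_3, css_2)      # ['c'] []
```

The Selectors-style helper has to look at all same-type children of the parent before applying the sibling filter, which is exactly the step a positional XPath predicate on following-sibling:: cannot express.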

I’m not convinced there even is a correct XPath translation of some Selectors.

This kind of thing has led me to believe that the entire premise of translating Selectors to XPath (or at least to XPath 1.0, which is what libxml implements) is flawed.

I’ve started work on cssselect2 which implements Selectors “for real” without XPath being involved, but it’s blocked on some design decisions that need to be made: https://github.com/SimonSapin/cssselect2/issues/1

SimonSapin avatar Dec 05 '14 15:12 SimonSapin

Thanks for the comprehensive answer, @SimonSapin! It's truly a shame if XPath 1.0 isn't flexible enough.

I'm keen to see how cssselect2 develops.

beaumartinez avatar Dec 05 '14 15:12 beaumartinez

For the record, here's a tentative implementation using an XPath extension function with lxml: https://github.com/scrapy/parsel/pull/73

redapple avatar Mar 30 '17 14:03 redapple

cssselect contributors, do you have any advice on this? There seem to be two solutions, cssselect2 and scrapy/parsel. Is either of them mature enough? Do I still need the cssselect package?

flip111 avatar Jul 17 '18 11:07 flip111