cssselect
[feature-request] `:not()` to support generic selectors (not only "simple" ones)
The documentation (version 0.9.1) says:
`:not()` accepts a sequence of simple selectors, not just single simple selector. For example, `:not(a.important[rel])` is allowed, even though the negation contains 3 simple selectors.
May I ask what a simple selector is? Can `:not()` support something like `:not(a>b)`?
>>> import cssselect
>>> cssselect.parse('a:not(p>a)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/site-packages/cssselect/parser.py", line 355, in parse
    return list(parse_selector_group(stream))
  File "/usr/lib/python3.4/site-packages/cssselect/parser.py", line 370, in parse_selector_group
    yield Selector(*parse_selector(stream))
  File "/usr/lib/python3.4/site-packages/cssselect/parser.py", line 378, in parse_selector
    result, pseudo_element = parse_simple_selector(stream)
  File "/usr/lib/python3.4/site-packages/cssselect/parser.py", line 471, in parse_simple_selector
    raise SelectorSyntaxError("Expected ')', got %s" % (next,))
cssselect.parser.SelectorSyntaxError: Expected ')', got <DELIM '>' at 7>
A simple selector is defined at https://drafts.csswg.org/selectors-3/#simple-selectors-dfn. It does not include `a > b`, because `>` is a combinator.
This restriction of `:not()` is as defined in the Selectors Level 3 specification. This matches what most browsers implement in CSS today.
Selectors Level 4 lifts that restriction so it would be fine to implement in cssselect. However, I don’t know how translating this into XPath would work. If you figure it out, I’d review a pull request.
As `:not()` takes a selector list and not just two simple selectors with a combinator, general support seems to need some way of inverting a selection, which XPath 1.0 doesn't have. So it looks like the only way to support a given expression is to translate it into XPath by hand and then implement that translation. #124 should therefore add support for things like `:not(a > b)`, mentioned in this issue, but more complicated things (e.g. `:not(a > b > c, d > e)`) will need additional code specific to each of them.
It's also not very intuitive, at least for me, because you also need to invert the order: while `a > b` translates to `//a/b`, `:not(a > b)` translates to something like `//b[parent::*[name() != 'a']]`. I'm not sure how complicated this gets for more complex selectors or for selector lists inside `:not()`, and I don't know which complex use cases are useful enough to implement. So for the scope of #124 I think we need something like this:
- `:not(a > b)`: select anything that is not (b with a parent a) ≡ select anything that is not b or doesn't have a parent a, so `//*[not(self::*[b] and parent::*[a])]`
- `:not(a b)`: select anything that is not (b with an ancestor a) ≡ select anything that is not b or doesn't have an ancestor a, so `//*[not(self::*[b] and ancestor::*[a])]`
- `:not(a + b)`: select anything that is not (b immediately preceded by a sibling a) ≡ select anything that is not b or isn't immediately preceded by a sibling a, so `//*[not(self::*[b] and preceding-sibling::*[1][a])]`
- `:not(a ~ b)`: select anything that is not (b with a preceding sibling a) ≡ select anything that is not b or doesn't have a preceding sibling a, so `//*[not(self::*[b] and preceding-sibling::*[a])]`
where `[a]` and `[b]` stand for whatever we already support as conditions, I think. Disclaimer: I may be wrong with these expressions.
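To make the four cases above concrete, here is a minimal sketch in plain Python rather than XPath. It is not cssselect code: it uses only the standard library's `xml.etree.ElementTree`, it handles only bare tag names for a and b, and the names `build_parent_map`, `matches`, `select_not` and the sample document are made up for illustration. What it tries to show is that each check starts at the candidate element and looks back at its parent, its ancestors, or its preceding siblings (in `a + b` and `a ~ b` the a comes before the b), and that `:not(...)` is then just the complement of that check.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    """ElementTree elements have no parent pointer, so build child -> parent."""
    return {child: parent for parent in root.iter() for child in parent}


def matches(elem, left, combinator, right, parents):
    """True if elem is the subject of `left <combinator> right` (tag names only)."""
    if elem.tag != right:
        return False
    parent = parents.get(elem)
    if combinator == ">":  # child: the parent must be `left`
        return parent is not None and parent.tag == left
    if combinator == " ":  # descendant: some ancestor must be `left`
        while parent is not None:
            if parent.tag == left:
                return True
            parent = parents.get(parent)
        return False
    if combinator not in ("+", "~"):
        raise ValueError("unsupported combinator: %r" % combinator)
    if parent is None:  # the root element has no siblings
        return False
    siblings = list(parent)
    before = siblings[:siblings.index(elem)]  # preceding siblings, document order
    if combinator == "+":  # adjacent: nearest preceding sibling must be `left`
        return bool(before) and before[-1].tag == left
    return any(s.tag == left for s in before)  # general sibling


def select_not(root, left, combinator, right):
    """Hypothetical stand-in for `*:not(left <combinator> right)`."""
    parents = build_parent_map(root)
    return [e for e in root.iter()
            if not matches(e, left, combinator, right, parents)]


# Made-up sample document: five b elements in different positions relative to a.
DOC = ET.fromstring(
    "<root>"
    "<a><b id='1'/><c><b id='2'/></c></a>"
    "<b id='3'/><a/><b id='4'/><c/><b id='5'/>"
    "</root>"
)

if __name__ == "__main__":
    for comb in (">", " ", "+", "~"):
        kept = [e.get("id") for e in select_not(DOC, "a", comb, "b")
                if e.tag == "b"]
        print(":not(a %s b) keeps b elements with id" % comb, kept)
```

The demo prints only the b elements that survive, but `select_not` also returns every non-b element, just as a `//*[not(...)]` expression would. Selector lists and longer chains such as `a > b > c` are deliberately left out of this sketch.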