tree-sitter-css
bug: Some selectors in `:has` are treated as plain values
Did you check existing issues?
- [X] I have read all the tree-sitter docs if it relates to using the parser
- [X] I have searched the existing issues of tree-sitter-css
Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)
No response
Describe the bug
Only certain kinds of selectors fail to parse inside `:has`: a `class_selector` or `id_selector` that includes a tag name, such as `li.foo` or `li#foo`.
Steps To Reproduce/Bad Parse Tree
This parses correctly:
div.myclass:has(li) {}
(stylesheet [0, 0] - [1, 0]
  (rule_set [0, 0] - [0, 22]
    (selectors [0, 0] - [0, 19]
      (pseudo_class_selector [0, 0] - [0, 19]
        (class_selector [0, 0] - [0, 11]
          (tag_name [0, 0] - [0, 3])
          (class_name [0, 4] - [0, 11]))
        (class_name [0, 12] - [0, 15])
        (arguments [0, 15] - [0, 19]
          (tag_name [0, 16] - [0, 18]))))
    (block [0, 20] - [0, 22])))
This does not:
div.myclass:has(li.foo) {}
(stylesheet [0, 0] - [0, 30]
  (rule_set [0, 0] - [0, 30]
    (selectors [0, 0] - [0, 27]
      (pseudo_class_selector [0, 0] - [0, 27]
        (class_selector [0, 0] - [0, 11]
          (tag_name [0, 0] - [0, 3])
          (class_name [0, 4] - [0, 11]))
        (class_name [0, 12] - [0, 15])
        (arguments [0, 15] - [0, 27]
          (plain_value [0, 16] - [0, 26]))))
    (block [0, 28] - [0, 30])))
Here are some other examples that parse exactly as expected:
div.myclass:has(#foo) {}
div.myclass:has(.bar) {}
div.myclass:has(foo[bar]) {}
div.myclass:has(li ~ p) {}
div.myclass:has(li p) {}
div.myclass:has(p li.foo) {} /* (weirdly enough) */
And here are some which are interpreted as plain_value:
div.myclass:has(li#foo) {}
div.myclass:has(li.foo) {}
div.myclass:has(li.foo p) {}
div.myclass:has(p.bar li.foo) {}
Expected Behavior/Parse Tree
In each of these cases, the plain_value should instead be a selectors node. :has can accept selectors of arbitrary complexity, much like :not.
Repro
No response
So I think I understand the problem:
- When the parser has just consumed the opening `(`, it sees `li.foo` ahead of it
- It can interpret that as a series of selector-related tokens, or it can parse it as a `plain_value`
- It chooses `plain_value` because that gives it the longest possible match
- The other examples that are parsed correctly don't have this problem because none of them are valid `plain_value`s…
- But `li#foo` and `li.foo` both are, because a plain value can be a URL, and both `.` and `#` are characters that occur in URLs
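To sanity-check this theory, here is a rough sketch in plain JavaScript. It is not the real lexer: it flattens the `plain_value` token from the grammar into one anchored regex and compares its match length against a simplified identifier pattern. `IDENTIFIER` is my own approximation of what a tag name can look like, and `matchLength` and `plainValueWins` are hypothetical helper names.

```javascript
// The plain_value token from grammar.js, flattened into a single anchored regex.
const PLAIN_VALUE =
  /^(?:[-_]|\/[^*\s,;!{}()\[\]])*[a-zA-Z](?:[^\/\s,;!{}()\[\]]|\/[^*\s,;!{}()\[\]])*/;

// ASSUMPTION: a simplified stand-in for the identifier/tag-name token.
const IDENTIFIER = /^-?[a-zA-Z_][a-zA-Z0-9_-]*/;

// Number of characters the regex consumes at the start of the input (0 if none).
function matchLength(regex, input) {
  const m = regex.exec(input);
  return m ? m[0].length : 0;
}

// True when plain_value matches strictly more text than an identifier
// at the start of the :has(...) argument.
function plainValueWins(argument) {
  return matchLength(PLAIN_VALUE, argument) > matchLength(IDENTIFIER, argument);
}

const parsedCorrectly = ["li", "#foo", ".bar", "foo[bar]", "li ~ p", "li p", "p li.foo"];
const parsedAsPlainValue = ["li#foo", "li.foo", "li.foo p", "p.bar li.foo"];

for (const arg of [...parsedCorrectly, ...parsedAsPlainValue]) {
  console.log(arg, "-> plain_value is the longer match:", plainValueWins(arg));
}
```

Under these assumptions, `plain_value` is the strictly longer match at the start of the argument for exactly the four inputs that come out wrong, and for none of the ones that parse correctly. That would also fit `p li.foo` working while `p.bar li.foo` does not, since only the first token seems to matter.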
So this is a lexical precedence issue. I can think of a few solutions:
- Define a different version of `plain_value` that excludes URLs (something like `plain_value_without_url`, but aliased to `plain_value`), then a different version of `_value` that lists `plain_value_without_url` instead of `plain_value` among its options, and then change `pseudo_class_arguments` to choose between `_selector` and my `_value_without_url`
- Invert the problem by being more strict about where URLs are allowed as plain values: only inside `url` functions (which is how I fixed a similar problem in my `tree-sitter-css` fork). Hence `plain_value` excludes URLs by default, and only in one specific usage do you need `plain_value_with_url` instead
- Put an external scanner in charge of parsing URLs (but something like `li#foo` might actually be a valid URL in some strange context; not sure)
But the simplest thing I can think of — use prec to encourage the parser to favor _selector over plain_value — is the one I just can't get working.
I could demote plain_value to a lower precedence, and this solves my problem…
plain_value: _ => token(prec(-1, seq(
  repeat(choice(
    /[-_]/,
    /\/[^\*\s,;!{}()\[\]]/, // Slash not followed by a '*' (which would be a comment)
  )),
  /[a-zA-Z]/,
  repeat(choice(
    /[^/\s,;!{}()\[\]]/, // Not a slash, not a delimiter character
    /\/[^\*\s,;!{}()\[\]]/, // Slash not followed by a '*' (which would be a comment)
  )),
))),
…but breaks three other tests. I'd much rather boost the precedence of _selectors, but I can't seem to get that to have any effect.
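For what it's worth, this is the mental model I'm working from, written as a toy JavaScript sketch rather than anything resembling tree-sitter's actual implementation. When two tokens match at the same position, the one with the higher lexical precedence wins, and match length only breaks ties. `pickToken` and the candidate objects are hypothetical. The lengths are for `li.foo`, where an identifier would match `li` and `plain_value` would match all six characters.

```javascript
// Toy model of resolving a lexical conflict between candidate tokens:
// higher lexical precedence wins first, then the longer match.
function pickToken(candidates) {
  return [...candidates].sort(
    (a, b) => b.prec - a.prec || b.length - a.length
  )[0];
}

// Default precedence for both tokens: the longer match is chosen.
const before = pickToken([
  { name: "identifier", length: 2, prec: 0 },
  { name: "plain_value", length: 6, prec: 0 },
]);

// plain_value demoted with prec(-1, ...) inside token(): identifier is chosen
// even though its match is shorter.
const after = pickToken([
  { name: "identifier", length: 2, prec: 0 },
  { name: "plain_value", length: 6, prec: -1 },
]);

console.log(before.name, after.name);
```

If this model is right, demoting `plain_value` changes the outcome everywhere the two tokens compete, not just inside `:has`, which may be why other tests break.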
I think I'm pretty close on this one and just need a nudge to find the right answer.