kdl icon indicating copy to clipboard operation
kdl copied to clipboard

Incompatibilities KDL 2 <-> Query

Open bgotink opened this issue 11 months ago • 2 comments

I'm trying to write a parser for KQL so I'm taking a look at the query spec, and I've noticed two issues:

lorem+ipsum is a valid identifier but + is an operator and the query spec doesn't require space around operators. Should this be interpreted as "look for a node named lorem+ipsum"? Or does this instead mean "look for a node named ipsum immediately following a node named lorem"? Idem for (lorem)+[ipsum], does that look for a node named + with type (lorem) and a property ipsum? Or does this look for nodes with ipsum property directly following nodes with type (lorem)?

The grammar lists types of accessors that are not described anywhere else in the document: "values(" q-ws* ")" | "props(" q-ws* ")". How are these supposed to behave?

There's one other thing that purely comes down to confusion on my end, but with >, <, and || being valid identifiers, the following queries are all valid:

query description
>> >> >> node named >> that's a descendant of a node named >>
> > > node named > whose parent is also called >
|| || || A node named || or a node named ||
[val() <<] A node whose first value is a string with a value that comes before < when sorting alphabetically

In my opinion it would make sense to disallow identifiers that are also operators in the query spec, just like the KDL spec itself doesn't allow identifiers that could cause confusion with any of the keywords or with numbers.

bgotink avatar Jan 19 '25 00:01 bgotink

uggghhh yes I thought I'd made them q-ws+ already. It's very much written with the intention that you can use whitespace to disambiguate.

And yes, as you say, >> >> >> is perfectly valid and I'm ok with that. Ditto > > > and || || ||. Corner cases, but unambiguous ones (this is why I made >> the descendant operator, instead of whitespace as in CSS).

Probably all the current q-ws* can be replaced with q-ws+ except for the ones in the following rules:

filter := "top(" q-ws* ")" | matchers
type-matcher := "(" q-ws* ")" | $type
accessor-matcher := "[" q-ws* (comparison | accessor)? q-ws* "]"

This one is particularly of note:

comparison := accessor q-ws* matcher-operator q-ws* ($type | $string | $number | $keyword)

You mentioned the [val()>>] case being weird, but this one's even worse: [foo>1], which is intended to mean [prop(foo) > 1], but is actually [prop("foo>1")] ("select any element with a prop called foo>1").

So while that means [val()=1] is no longer legal, it gets rid of that weird ambiguity. That said, it's kind of a footgun: I think for stuff like comparisons, a LOT of people will definitely assume they can omit spaces.

We're gonna have to seriously revisit this. I think it's safe to say that reserving those comparison characters isn't part of the possible design space at this point. But that means we can't do the equivalent of the css div[foo^=bar] and such. At least all the foo() accessors are "safe" on the left hand side. The right hand side continues to be a concern, though, for strings.

zkat avatar Jan 19 '25 01:01 zkat

Based on my (very limited) experience writing & testing a KPath parser, I'd like to suggest a few changes:

  • As you already mentioned, make whitespace required around all operators. That solves all of the issues until we can find a better solution (if we can find a better solution), one that could allow [foo^=bar] to exist.
  • val() is mentioned a few times in the document but it isn't allowed per the grammar. Either the text or the grammar have to be updated.
  • There's a difference in terminology between KDL and KPath: what KDL calls "arguments" are called "values" in the KPath spec. As KDL is more stable, I suggest modifying KPath to use argument and arg(0)

Then there are a few gaps:

  • Most notably it's impossible to query a negative right now: [prop != lorem] checks whether property prop is present and not equal to "lorem". That's not the inverse of [prop = lorem] which checks whether the property prop is present and equal to "lorem".
    • In theory we could implement this inverse by changing the definition of != to also match properties/values that are not present, but that only solves the lack of a negative for the = operator. What about the opposite to [prop ^= prefix]? We could introduce !^= as operator but I personally think that's a bit much
    • My first thought is to borrow from CSS's syntax for case insensitive attribute selectors and go for [! prop = lorem]. The inverse of (type)node would then be [! type() = type] || [name() != node] (since a node name is always present, [! name() = node] is equivalent to [name() != node]).
  • CSS has the useful :is()/:where() pseudoselectors which allow grouping alternatives together. That would be useful to have, e.g. a >> {b || c} >> {d || e}
    • We could add a negative variant, e.g. a >> !{b || c} > d as well, like CSS's :not()

bgotink avatar Jan 22 '25 20:01 bgotink