kdl
kdl copied to clipboard
Ambiguity in KQL
I accidentally introduced some level of ambiguity in KQL by including the tags. Assuming we are allowing more than one matcher for each node filter, top()
can now also mean "a node named 'top' with any tag". We could say that top()
always means top()
and the other meaning should be represented by ()top
or ()[name() = "top"]
.
Or just outright fix the order of matchers in a node filter to something like this
node-filter := top | tag? node matcher* | tag? matcher+ | tag
i.e. either top()
, or zero or one tags, followed by zero or one node names, followed by zero or more matchers (but combined at least one).
Oh oops. Yeah I would only expect those to be prefixes.
It might not be a problem with the spec, more my interpretation of it I guess (since there isn't a grammar yet).
Sounds like it's about time we wrote an actual grammar for it haha
I was playing around with making my own impl of kql and I got this far with a grammar for it. I think I'm missing some important context that's not in the spec yet though.
statement := or_expr ('=>' mapping)
or_expr := s_expr ( '||' s_expr )*
s_expr :=
s_term |
selector s_op s_term |
selector s_term
s_term :=
'top()' |
identifier |
'(' s_expr ')' |
'[' m_expr ']'
s_op := '>' | '~' | '+'
m_expr :=
m_term m_op m_cmp |
m_term
m_term :=
identifier |
'prop(' identifier ')' |
'val(' digit* ')' |
'name()' |
'tag()'
m_cmp := number | string | '(' identifier ')'
m_op := '=' | '!=' | '>=' | '>' | '<=' | '<' | '^=' | '$=' | '*='
mapping := accessor | '(' (accessor ','?)+ ')'
accessor :=
'name()' |
'val(' digit* ')' |
'prop(' identifier ')' |
identifier |
'props()' |
'values()'
(unexplained terms are all from the KDL spec, and I haven't rewritten this yet to avoid left recursion)
I think having an explicit concept of functions would be better for error messages (like if you misspell or provide the wrong argument type), so maybe foo ()
should always be interpreted as a function instead of a node foo with any tag? Since if you wanted to you could still write top[()]
or prop[(foo)]
to get the other behavior I think.
s_expr := s_term | selector s_op s_term | selector s_term
What's selector
here, s_expr
?
s_term := 'top()' | identifier | '(' s_expr ')' | '[' m_expr ']'
'(' s_expr ')'
should be '(' identifier ')'
. I realise now that (foo)
in the spec may be a bad example, but that syntax is for Type Annotation (i.e. tags), specifically of nodes.
accessor :=
tag()
should also be an accessor (I think that is okay anyway) but I forgot to add it to the spec at the time.
I have a prototype implementation (lacking sibling selectors for now, and tags since my KDL parser does not support them yet): https://github.com/kdl-org/kdljs/commit/ec79f789211df5d25e87b15671c390baae6e928a. I ended up just giving "top()"
priority over "top", "()"
by tokenizing as "top(", ")"
for now.
I think top()
is a bit confusing anyway, is this interpretation right?
-
top()
andtop() > []
select the top-level nodes (this part it says literally) -
top() []
selects all nodes (like the query[]
does) -
top() + []
andtop() ~ []
probably meantop() > [] + []
andtop() > [] ~ []
respectively I guess?
I mean yes, it's weird, but I had a hard time coming up with something that would let you say "start at the top".
Like, would it be better to have a unary > foo
to be very specific about "I want the toplevel children with node name foo
"?
Probably the easiest way to avoid name collisions between protocol syntax and user space at this stage of development would be to adopt a prefix from among those already not allowed to begin or precede an accessor, e.g. /virtual_accessor(). I'm casually suggesting the "/" as this would put comments into the meta space, but any character or pair currently reserved that doesn't cause a collision with other syntax would do just as well. (The example and the specific suggestion make unsafe assumptions; the main point is to create a reserved namespace, preferably less verbose than "__KDL:" or such
I'm suggesting a simple prefix because I think that avoiding backtracking and minimizing lookahead in the parser are reasonable goals
Like, would it be better to have a unary
> foo
to be very specific about "I want the toplevel children with node namefoo
"?
I personally do like that syntax (especially over top() > foo
).
How's this? (still with top()
and ignoring whitespace for the moment)
statement := alternatives ( '=>' mapping )?
alternatives := selector ( '||' alternatives )*
selector :=
node_filter combinator selector |
node_filter selector |
node_filter
combinator := '>' | '~' | '+'
node_filter :=
'top()' |
( '(' identifier? ') ')? identifier ( '[' matcher ']' )* |
'(' identifier? ')' ( '[' matcher ']' )* |
( '[' matcher ']' )+
matcher :=
matcher_accessor matcher_operator matcher_comparison |
matcher_accessor
matcher_accessor :=
identifier |
'prop(' identifier ')' |
'val(' digit* ')' |
'name()' |
'tag()'
matcher_comparison := number | string | '(' identifier ')'
matcher_operator := '=' | '!=' | '>=' | '>' | '<=' | '<' | '^=' | '$=' | '*='
mapping := accessor | '(' mapping_tuple ')'
mapping_tuple :=
accessor ',' mapping_tupple |
accessor
accessor :=
matcher_accessor |
'props()' |
'values()'
I just realized that +
and ~
are both valid KDL identifier characters. sigh
Let's PR this and figure out a way around this, tho :)
I just realized that
+
and~
are both valid KDL identifier characters. sigh
Is requiring whitespace around combinators (and other operators) and then just saying that you need to use "+"
/"!"
resp. good enough you think?
Not sure if I should open a PR for this, but another thing I'm confused about:
According to the spec, val() != x
only checks nodes that actually have a value. i.e. you can never use !=
to fetch nodes with no values.
Does this also mean that val(n) != x
should also skip nodes that don't have an nth value?
Similarly, does this mean that prop(foo) != x
should also skip nodes that don't have a foo
property?
cc @fabiancook who I'm collaborating with to build a test suite for KQL implementations. So far there are a couple of inconsistencies: behaviour of non-top-level top()
, and now the behaviour of !=
against non-existent props. I believe this has arisen from such ambiguity.
I'm pretty sure all the issues outlined here are no longer a problem in the latest KQL spec, so I'm closing this (new KQL will be released as part of KDL v2)