kdl icon indicating copy to clipboard operation
kdl copied to clipboard

Ambiguity in KQL

Open larsgw opened this issue 3 years ago • 14 comments

I accidentally introduced some level of ambiguity in KQL by including the tags. Assuming we are allowing more than one matcher for each node filter, top() can now also mean "a node named 'top' with any tag". We could say that top() always means top() and the other meaning should be represented by ()top or ()[name() = "top"].

larsgw avatar Sep 18 '21 13:09 larsgw

Or just outright fix the order of matchers in a node filter to something like this

node-filter := top | tag? node matcher* | tag? matcher+ | tag

i.e. either top(), or zero or one tags, followed by zero or one node names, followed by zero or more matchers (but combined at least one).

larsgw avatar Sep 18 '21 15:09 larsgw

Oh oops. Yeah I would only expect those to be prefixes.

zkat avatar Sep 18 '21 16:09 zkat

It might not be a problem with the spec, more my interpretation of it I guess (since there isn't a grammar yet).

larsgw avatar Sep 18 '21 16:09 larsgw

Sounds like it's about time we wrote an actual grammar for it haha

zkat avatar Sep 18 '21 21:09 zkat

I was playing around with making my own impl of kql and I got this far with a grammar for it. I think I'm missing some important context that's not in the spec yet though.

statement := or_expr ('=>' mapping)

or_expr := s_expr ( '||' s_expr )*

s_expr :=
	s_term |
	selector s_op s_term |
	selector s_term

s_term :=
	'top()' |
	identifier |
	'(' s_expr ')' |
	'[' m_expr ']'

s_op := '>' | '~' | '+'

m_expr :=
	m_term m_op m_cmp |
	m_term

m_term :=
	identifier |
	'prop(' identifier ')' |
	'val(' digit* ')' |
	'name()' |
	'tag()'

m_cmp := number | string | '(' identifier ')'

m_op := '=' | '!=' | '>=' | '>' |  '<=' | '<' | '^=' | '$=' | '*='

mapping := accessor | '(' (accessor ','?)+ ')'

accessor :=
	'name()' |
	'val(' digit* ')' |
	'prop(' identifier ')' |
	identifier |
	'props()' |
	'values()'

(unexplained terms are all from the KDL spec, and I haven't rewritten this yet to avoid left recursion)

I think having an explicit concept of functions would be better for error messages (like if you misspell or provide the wrong argument type), so maybe foo () should always be interpreted as a function instead of a node foo with any tag? Since if you wanted to you could still write top[()] or prop[(foo)] to get the other behavior I think.

tiffany352 avatar Sep 18 '21 21:09 tiffany352

s_expr :=
  s_term |
  selector s_op s_term |
  selector s_term

What's selector here, s_expr?

s_term :=
  'top()' |
  identifier |
  '(' s_expr ')' |
  '[' m_expr ']'

'(' s_expr ')' should be '(' identifier ')'. I realise now that (foo) in the spec may be a bad example, but that syntax is for Type Annotation (i.e. tags), specifically of nodes.

accessor :=

tag() should also be an accessor (I think that is okay anyway) but I forgot to add it to the spec at the time.

larsgw avatar Sep 18 '21 23:09 larsgw

I have a prototype implementation (lacking sibling selectors for now, and tags since my KDL parser does not support them yet): https://github.com/kdl-org/kdljs/commit/ec79f789211df5d25e87b15671c390baae6e928a. I ended up just giving "top()" priority over "top", "()" by tokenizing as "top(", ")" for now.

larsgw avatar Sep 19 '21 00:09 larsgw

I think top() is a bit confusing anyway, is this interpretation right?

  • top() and top() > [] select the top-level nodes (this part it says literally)
  • top() [] selects all nodes (like the query [] does)
  • top() + [] and top() ~ [] probably mean top() > [] + [] and top() > [] ~ [] respectively I guess?

larsgw avatar Sep 19 '21 00:09 larsgw

I mean yes, it's weird, but I had a hard time coming up with something that would let you say "start at the top".

Like, would it be better to have a unary > foo to be very specific about "I want the toplevel children with node name foo"?

zkat avatar Sep 19 '21 01:09 zkat

Probably the easiest way to avoid name collisions between protocol syntax and user space at this stage of development would be to adopt a prefix from among those already not allowed to begin or precede an accessor, e.g. /virtual_accessor(). I'm casually suggesting the "/" as this would put comments into the meta space, but any character or pair currently reserved that doesn't cause a collision with other syntax would do just as well. (The example and the specific suggestion make unsafe assumptions; the main point is to create a reserved namespace, preferably less verbose than "__KDL:" or such

I'm suggesting a simple prefix because I think that avoiding backtracking and minimizing lookahead in the parser are reasonable goals

swift2plunder avatar Sep 19 '21 08:09 swift2plunder

Like, would it be better to have a unary > foo to be very specific about "I want the toplevel children with node name foo"?

I personally do like that syntax (especially over top() > foo).


How's this? (still with top() and ignoring whitespace for the moment)

statement := alternatives ( '=>' mapping )?

alternatives := selector ( '||' alternatives )*

selector :=
	node_filter combinator selector |
	node_filter selector |
	node_filter

combinator := '>' | '~' | '+'

node_filter :=
	'top()' |
	( '(' identifier? ') ')? identifier ( '[' matcher ']' )* |
	'(' identifier? ')' ( '[' matcher ']' )* |
        ( '[' matcher ']' )+	

matcher :=
	matcher_accessor matcher_operator matcher_comparison |
	matcher_accessor

matcher_accessor :=
	identifier |
	'prop(' identifier ')' |
	'val(' digit* ')' |
	'name()' |
	'tag()'

matcher_comparison := number | string | '(' identifier ')'

matcher_operator := '=' | '!=' | '>=' | '>' |  '<=' | '<' | '^=' | '$=' | '*='

mapping := accessor | '(' mapping_tuple ')'

mapping_tuple :=
	accessor ',' mapping_tupple |
	accessor

accessor :=
	matcher_accessor |
	'props()' |
	'values()'

larsgw avatar Sep 19 '21 12:09 larsgw

I just realized that + and ~ are both valid KDL identifier characters. sigh

Let's PR this and figure out a way around this, tho :)

zkat avatar Sep 21 '21 02:09 zkat

I just realized that + and ~ are both valid KDL identifier characters. sigh

Is requiring whitespace around combinators (and other operators) and then just saying that you need to use "+"/"!" resp. good enough you think?

larsgw avatar Sep 21 '21 20:09 larsgw

Not sure if I should open a PR for this, but another thing I'm confused about: According to the spec, val() != x only checks nodes that actually have a value. i.e. you can never use != to fetch nodes with no values. Does this also mean that val(n) != x should also skip nodes that don't have an nth value? Similarly, does this mean that prop(foo) != x should also skip nodes that don't have a foo property?

cc @fabiancook who I'm collaborating with to build a test suite for KQL implementations. So far there are a couple of inconsistencies: behaviour of non-top-level top(), and now the behaviour of != against non-existent props. I believe this has arisen from such ambiguity.

danini-the-panini avatar Mar 03 '22 12:03 danini-the-panini

I'm pretty sure all the issues outlined here are no longer a problem in the latest KQL spec, so I'm closing this (new KQL will be released as part of KDL v2)

zkat avatar Dec 13 '23 05:12 zkat