Ambiguity in XPath EBNF - Lookup with TypeQualifier vs DynamicFunctionCall
Further to https://github.com/qt4cg/qtspecs/issues/1050, an additional ambiguity occurs in one of the deep lookup examples:
$tree ??$from ??type(record(to, distance))[?to=$to] ?distance
which can be simplified to
$tree ??type(foo)
where there is an ambiguity between a LookupExpr with a TypeQualifier and a DynamicFunctionCall on a function named type. That is, type should perhaps be added to the restricted function names to avoid this ambiguity.
I'm not sure whether something more fundamental is needed in the productions around [74], [75] and [84]-[88], but type can certainly appear within a KeySpecifier either as the keyword of a TypeQualifier (consuming the parenthesized type) or as an NCName (with the parenthesized part becoming a PositionalArgumentList at a higher level).
As noted in the meeting, the type and record keywords should be added to the reserved function name list.
I don't think it's quite as simple as that. Taking the simple example:
$tree ??type(foo)
and a reduced-tree grammar, we get two parses:
<XPath xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
<ValueExpr>
<LookupExpr>
<VarRef>
<QName local="tree"/>
</VarRef>
<Lookup>??<TypeQualifier>
<SequenceType>
<QName local="foo"/>
</SequenceType>
</TypeQualifier>
</Lookup>
</LookupExpr>
</ValueExpr>
</XPath>
which is probably the intended meaning, and
<XPath xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
<ValueExpr>
<DynamicFunctionCall>
<LookupExpr>
<VarRef>
<QName local="tree"/>
</VarRef>
<Lookup>??<QName local="type"/>
</Lookup>
</LookupExpr>
<PositionalArgumentList>
<ValueExpr>
<AxisStep>
<QName local="foo"/>
<PredicateList/>
</AxisStep>
</ValueExpr>
</PositionalArgumentList>
</DynamicFunctionCall>
</ValueExpr>
</XPath>
where the parenthesized part is taken as the argument list of a dynamic function call applied to a LookupExpr whose key happens to be type; the same applies to record. (If we change the expression to $tree ??Type(foo), the ambiguity of course disappears and it parses as a DynamicFunctionCall.)
In this case it isn't a function call that would have to disallow type, but the combination of a Lookup with key type inside a DynamicFunctionCall. The alternative would be to forbid type and record as lookup keys, which I don't think would go down very well.
Right, so the actual ambiguity is whether the LookupExpr parses the key as a TypeQualifier or as an NCName:
$tree??type(foo)
       ^^^^^^^^^ -- KeySpecifier / TypeQualifier
     ^^^^^^^^^^^ -- Lookup
^^^^^^^^^^^^^^^^ -- LookupExpr
$tree??type(foo)
       ^^^^ -- KeySpecifier / NCName
     ^^^^^^ -- Lookup
^^^^^^^^^^^ -- PostfixExpr / LookupExpr
^^^^^^^^^^^^^^^^ -- DynamicFunctionCall
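For anyone who wants to experiment, the two derivations above can be reproduced with a small Python sketch. It is a hypothetical toy model, not the spec grammar: it covers only the handful of productions involved, ignores whitespace, and (like plain EBNF) explores every alternative instead of picking one. The function names are illustrative only.

```python
import re

# Hypothetical, heavily simplified model of the productions involved
# (not the spec grammar; whitespace is not handled):
#   PostfixExpr         ::= LookupExpr | DynamicFunctionCall
#   LookupExpr          ::= VarRef "??" KeySpecifier
#   KeySpecifier        ::= TypeQualifier | NCName
#   TypeQualifier       ::= "type" "(" NCName ")"
#   DynamicFunctionCall ::= LookupExpr "(" NCName ")"
# Every function returns ALL (tree, remaining_input) pairs, so alternatives
# are treated as an unordered set, as in plain EBNF.

NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def ncname(s):
    m = NCNAME.match(s)
    return [(("NCName", m.group()), s[m.end():])] if m else []


def type_qualifier(s):
    results = []
    if s.startswith("type("):
        for name, rest in ncname(s[len("type("):]):
            if rest.startswith(")"):
                results.append((("TypeQualifier", name), rest[1:]))
    return results


def key_specifier(s):
    # Both alternatives are explored; neither has priority.
    return type_qualifier(s) + ncname(s)


def lookup_expr(s):
    m = re.match(r"\$([A-Za-z_][A-Za-z0-9_]*)\?\?", s)
    if not m:
        return []
    var = ("VarRef", m.group(1))
    return [(("LookupExpr", var, key), rest)
            for key, rest in key_specifier(s[m.end():])]


def postfix_expr(s):
    results = []
    for base, rest in lookup_expr(s):
        results.append((base, rest))
        # A following "(...)" may also be read as a dynamic call on the lookup.
        if rest.startswith("("):
            for arg, rest2 in ncname(rest[1:]):
                if rest2.startswith(")"):
                    results.append((("DynamicFunctionCall", base, arg), rest2[1:]))
    return results


def complete_parses(s):
    """Keep only the derivations that consume the whole input."""
    return [tree for tree, rest in postfix_expr(s) if rest == ""]


if __name__ == "__main__":
    for expr in ["$tree??type(foo)", "$tree??Type(foo)"]:
        parses = complete_parses(expr)
        print(expr, "->", len(parses), "parse(s)")
        for p in parses:
            print("   ", p)
```

In this model $tree??type(foo) yields two complete parses (a LookupExpr with a TypeQualifier, and a DynamicFunctionCall on the key type), while $tree??Type(foo) yields only the DynamicFunctionCall, matching the iXML results above.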
Does moving TypeQualifier to the start of the KeySpecifier list resolve this? I.e.
[88] KeySpecifier ::= TypeQualifier | NCName | IntegerLiteral | StringLiteral | VarRef | ParenthesizedExpr | LookupWildcard
And do we need to do the same with PostfixExpr for DynamicFunctionCall (moving it to the end):
[75] PostfixExpr ::= PrimaryExpr | FilterExpr | LookupExpr | FilterExprAM | DynamicFunctionCall
Does moving TypeQualifier to the start of the KeySpecifier list resolve this?
I don't think so: as far as I can tell, the order of alternatives in EBNF does not constitute a priority order. (Please correct me if I'm wrong, but I can't see anything in the spec that suggests it does.)
I'm not sure, as I've only written a hand-coded XPath/XQuery parser, which would handle this case by maximally matching the TypeQualifier, i.e. by testing for it before testing for an NCName.
If maximal matching on TypeQualifier is required, it needs to be stated in the extra-grammatical notes of the spec. (This is already done for occurrence indicators on function types.) If so, I assume that
($tree??type)(foo)
would then be required in order to look up a function under the key type and call it dynamically.
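The hand-coded approach described above amounts to ordered choice with commitment, as in a PEG, rather than EBNF's unordered alternatives. The following Python sketch is a hypothetical simplified model (not the spec grammar, and not any existing parser): `parse_key` tries TypeQualifier before NCName and commits to the first match, so $tree??type(foo) always gets the TypeQualifier reading, and ($tree??type)(foo) becomes the only way to call a function stored under the key type.

```python
import re

NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_key(s):
    """Ordered choice: try TypeQualifier first, fall back to NCName.
    Returns (tree, rest) for the FIRST alternative that matches, or None."""
    m = re.match(r"type\(([A-Za-z_][A-Za-z0-9_]*)\)", s)
    if m:  # maximal match: commit to the TypeQualifier reading
        return ("TypeQualifier", m.group(1)), s[m.end():]
    m = NCNAME.match(s)
    if m:
        return ("NCName", m.group()), s[m.end():]
    return None


def parse_lookup(s):
    m = re.match(r"\$([A-Za-z_][A-Za-z0-9_]*)\?\?", s)
    if not m:
        return None
    key = parse_key(s[m.end():])
    if key is None:
        return None
    tree, rest = key
    return ("LookupExpr", m.group(1), tree), rest


def parse_primary(s):
    # Either a parenthesized lookup or a bare lookup.
    if s.startswith("("):
        inner = parse_lookup(s[1:])
        if inner and inner[1].startswith(")"):
            return inner[0], inner[1][1:]
        return None
    return parse_lookup(s)


def parse_postfix(s):
    """Returns a single tree for the whole input, or None."""
    base = parse_primary(s)
    if base is None:
        return None
    tree, rest = base
    # Any remaining "(NCName)" is a dynamic call on what precedes it.
    m = re.match(r"\(([A-Za-z_][A-Za-z0-9_]*)\)", rest)
    if m:
        tree, rest = ("DynamicFunctionCall", tree, m.group(1)), rest[m.end():]
    return tree if rest == "" else None


if __name__ == "__main__":
    for expr in ["$tree??type(foo)", "($tree??type)(foo)", "$tree??Type(foo)"]:
        print(expr, "->", parse_postfix(expr))
```

The design choice is the one under discussion: determinism comes from the order in which `parse_key` tests alternatives, which plain EBNF does not express, so the spec would need an extra-grammatical note to the same effect.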
I would have expected the rule "Postfix expressions are evaluated from left-to-right." (in 4.3) to cover this. But in fact that rule would favour the ($tree??type)(foo) interpretation. And I agree: while we can easily make type a reserved function name, we can't make it a reserved key in a map. So perhaps we do need to find a different syntax.
I'm not sure this is just an editorial fix: there is a fundamental ambiguity...
Indeed so, it needs a rethink of the syntax.
I'm inclined to dredge up an old idea: ~ as a type testing operator.
In a lookup expression, we would write ??~record(from, to) as a shorthand for ??*[. instance of record(from, to)] (although it is not an exact equivalent, because the ~ form tests the type of each value before flattening rather than after).
More generally we can use ~ to filter by type, so $data ~ xs:integer means $data[ . instance of xs:integer ]. This is probably especially useful in XSLT patterns.
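The difference between the two lookup forms in this proposal can be modelled in a few lines of Python. This is a hypothetical sketch, not spec semantics: XDM sequences are Python lists, a map or array is a list of values (each value itself a sequence), and `is_record_from_to` and `is_integer` are simplified stand-ins for `instance of record(from, to)` and `instance of xs:integer`.

```python
from typing import Any, Callable, List

# Hypothetical model: a sequence is a list; a container is a list of values,
# where each value is itself a sequence.
Sequence = List[Any]
TypeTest = Callable[[Sequence], bool]


def is_record_from_to(value: Sequence) -> bool:
    """Stand-in for `instance of record(from, to)`: exactly one item, a dict
    whose keys are exactly 'from' and 'to' (a simplification)."""
    return len(value) == 1 and isinstance(value[0], dict) and value[0].keys() == {"from", "to"}


def is_integer(value: Sequence) -> bool:
    """Stand-in for `instance of xs:integer` (bools excluded)."""
    return len(value) == 1 and isinstance(value[0], int) and not isinstance(value[0], bool)


def tilde_lookup(container: List[Sequence], test: TypeTest) -> Sequence:
    """Models ??~T: test each value as a whole, then flatten the survivors."""
    return [item for value in container if test(value) for item in value]


def star_then_predicate(container: List[Sequence], test: TypeTest) -> Sequence:
    """Models ??*[. instance of T]: flatten first, then test each single item."""
    return [item for value in container for item in value if test([item])]


def tilde_filter(data: Sequence, test: TypeTest) -> Sequence:
    """Models $data ~ T as $data[. instance of T] on a flat sequence."""
    return [item for item in data if test([item])]


if __name__ == "__main__":
    leg = {"from": "A", "to": "B"}
    container = [[leg], [leg, leg], [42]]
    print(tilde_lookup(container, is_record_from_to))
    print(star_then_predicate(container, is_record_from_to))
    print(tilde_filter([1, "two", 3.0, 4], is_integer))
```

With a value that is a sequence of two records, the ~ form rejects the value as a whole (two items are not an instance of record(from, to)), whereas flattening first lets both records through the predicate individually.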
The phonetic resemblance of "type" and "tilde" (as with "at" and "attribute") has some mnemonic value.
Do I understand correctly that TypeQualifier is the fly in this ointment?
I find the tilde as a substitute more confusing and obfuscating.
TypeQualifier is a kind of filter/predicate construction. So why not just drop it and nudge the reader toward a predicate expression, e.g. $tree ??*[. instance of foo]? I find that this formulation declares the programmer's intent more clearly than $tree ??type(foo), and it doesn't cost many extra characters.
I am concerned that we are introducing a keyword-like alternative to a perfectly clear and stable keyword, instance of, and that doing so has introduced the ambiguity @johnlumley has brilliantly spotted.
The problem with using a predicate is that it qualifies the items selected by the expression, by which time information has already been lost: for example, members of an array that are empty sequences will have been reduced to nothing.
It's probably true that there is always an equivalent expression (something like $V?pairs::*[?value instance of T]), but that gets awfully cumbersome. A great strength of axis steps is that simple predicates, such as selecting children by name, can be expressed very concisely; you wouldn't want to write every child step as child::*[node-name() eq QName("", "para")].
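The information loss described here can be illustrated with a short Python model. It is a hypothetical sketch, not spec semantics: an array is a list of members, each member a sequence (a Python list, so an empty member is `[]`), and the two functions stand in for a wildcard lookup followed by a predicate and for the pairs-based formulation quoted above.

```python
from typing import Any, List, Tuple

# Hypothetical model: an array is a list of members; each member is a
# sequence (a Python list), so an empty-sequence member is [].
Array = List[List[Any]]


def is_empty_sequence(value: List[Any]) -> bool:
    """Stand-in for `instance of empty-sequence()`."""
    return len(value) == 0


def wildcard_then_predicate(arr: Array) -> List[Any]:
    """Models $V?*[. instance of empty-sequence()]: the members are flattened
    first, so the predicate only ever sees single items."""
    flattened = [item for member in arr for item in member]
    return [item for item in flattened if is_empty_sequence([item])]


def pairs_then_predicate(arr: Array) -> List[Tuple[int, List[Any]]]:
    """Models $V?pairs::*[?value instance of empty-sequence()]: each
    key/value pair is kept intact, so the test sees the whole member."""
    pairs = list(enumerate(arr, start=1))  # 1-based array positions as keys
    return [(key, value) for key, value in pairs if is_empty_sequence(value)]


if __name__ == "__main__":
    V = [[1], [], [2, 3]]
    print(wildcard_then_predicate(V))
    print(pairs_then_predicate(V))
```

For V = [[1], [], [2, 3]], the predicate applied after flattening finds nothing, because the empty member has already vanished, while the pairs-based version still finds member 2; the cost is the extra verbosity the comment objects to.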
Could we define a type::xyz-style axis and selector, then? E.g. $tree??type::foo. That is clear enough and should not be ambiguous. It also generalizes to other axes, such as $tree??attribute::name.