Ambiguity in XPath EBNF - Lookup with TypeQualifier vs DynamicFunctionCall

Open johnlumley opened this issue 1 year ago • 10 comments

In addition to https://github.com/qt4cg/qtspecs/issues/1050, a further ambiguity occurs in one of the deep lookup examples:

$tree ??$from ??type(record(to, distance))[?to=$to] ?distance

which can be simplified to

$tree ??type(foo)

where there is ambiguity between a LookupExpr with TypeQualifier and a DynamicFunctionCall on a function named type. That is, type should perhaps be added to the restrictions on function names to avoid this ambiguity.

Whether something more fundamental is needed on the productions around [74],[75] and [84]-[88] I'm not sure, but certainly type can appear either as a keyword for TypeQualifier (consuming the bracketed type) or a value of an NCName (with the bracketed section being a higher-level PositionalArgumentList), both being part of a KeySpecifier.

johnlumley avatar Jul 01 '24 16:07 johnlumley

As noted in the meeting, the type and record keywords should be added to the reserved function name list.

rhdunn avatar Jul 02 '24 16:07 rhdunn

I don't think it's quite as simple as that. Taking the simple example:

$tree ??type(foo)

and a reduced-tree grammar, we get two parses:

<XPath xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
   <ValueExpr>
      <LookupExpr>
         <VarRef>
            <QName local="tree"/>
         </VarRef>
         <Lookup>??<TypeQualifier>
               <SequenceType>
                  <QName local="foo"/>
               </SequenceType>
            </TypeQualifier>
         </Lookup>
      </LookupExpr>
   </ValueExpr>
</XPath>

which is probably the intended meaning, and

<XPath xmlns:ixml="http://invisiblexml.org/NS" ixml:state="ambiguous">
   <ValueExpr>
      <DynamicFunctionCall>
         <LookupExpr>
            <VarRef>
               <QName local="tree"/>
            </VarRef>
            <Lookup>??<QName local="type"/>
            </Lookup>
         </LookupExpr>
         <PositionalArgumentList>
            <ValueExpr>
               <AxisStep>
                  <QName local="foo"/>
                  <PredicateList/>
               </AxisStep>
            </ValueExpr>
         </PositionalArgumentList>
      </DynamicFunctionCall>
   </ValueExpr>
</XPath>

where the bracketed section is taken as arguments to a dynamic function call via a LookupExpr, the lookup key of which happened to be type, and similarly for record. (If we change the expression to $tree ??Type(foo), the ambiguity disappears of course and it parses as a DynamicFunctionCall.)

In this case it isn't a function call that would have to disallow type, but the conjunction of a Lookup with key type within a DynamicFunctionCall. The alternative would be to forbid type or record being used as lookup keys, which I don't think would go down very well.
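A minimal Python sketch of the two readings, assuming a toy grammar that accepts only the shape `$var ?? NCName ( NCName )`. The names `tokenize` and `parses` are hypothetical helpers for illustration, not part of any specification or parser generator, and only the `type` keyword is modelled (not `record`).

```python
import re

def tokenize(text):
    """Split the toy input into '??', '$name', name, '(' and ')' tokens."""
    return re.findall(r"\?\?|\$\w+|\w+|[()]", text)

def parses(text):
    """Return every reading of the toy form  $var ?? NCName ( NCName )."""
    tokens = tokenize(text)
    if len(tokens) != 6:
        return []
    var, op, key, lpar, arg, rpar = tokens
    shape_ok = (var.startswith("$") and op == "??" and lpar == "("
                and rpar == ")" and key.isidentifier() and arg.isidentifier())
    if not shape_ok:
        return []
    readings = []
    # Reading 1: 'type' is the TypeQualifier keyword and consumes '(' arg ')'.
    if key == "type":
        readings.append(("LookupExpr", var, ("TypeQualifier", arg)))
    # Reading 2: the key is a plain NCName and '(' arg ')' is an argument list.
    readings.append(
        ("DynamicFunctionCall", ("LookupExpr", var, ("NCName", key)), [arg])
    )
    return readings

for expr in ("$tree ??type(foo)", "$tree ??Type(foo)"):
    print(expr, "->", len(parses(expr)), "reading(s)")
```

In this sketch the lowercase `type` key yields both readings, while `Type` yields only the DynamicFunctionCall reading, mirroring the behaviour described above.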

johnlumley avatar Jul 03 '24 09:07 johnlumley

Right, so the actual ambiguity is in LookupExpr parsing a TypeQualifier or NCName:

    $tree??type(foo)
           ^^^^^^^^^ -- KeySpecifier / TypeQualifier
         ^^^^^^^^^^^ -- Lookup
    ^^^^^^^^^^^^^^^^ -- LookupExpr

    $tree??type(foo)
           ^^^^      -- KeySpecifier / NCName
         ^^^^^^      -- Lookup
    ^^^^^^^^^^^      -- PostfixExpr / LookupExpr
    ^^^^^^^^^^^^^^^^ -- DynamicFunctionCall

Does moving TypeQualifier to the start of the KeySpecifier list resolve this? I.e.

[88]  KeySpecifier  ::=  TypeQualifier | NCName | IntegerLiteral | StringLiteral | VarRef | ParenthesizedExpr | LookupWildcard

And do we need to do the same with PostfixExpr for DynamicFunctionCall (moving it to the end):

[75]  PostfixExpr  ::=  PrimaryExpr | FilterExpr | LookupExpr | FilterExprAM | DynamicFunctionCall

rhdunn avatar Jul 03 '24 10:07 rhdunn

> Does moving TypeQualifier to the start of the KeySpecifier list resolve this?

I don't think so, as the order of alternatives in EBNF doesn't, I think, constitute a priority order. (Please correct me if I'm wrong, but I can't see anything that suggests that)

johnlumley avatar Jul 03 '24 10:07 johnlumley

I'm not sure, as I've only hand-written an XPath/XQuery parser. I would handle this case by maximally matching the TypeQualifier, i.e. by testing for it before testing for an NCName.
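A rough sketch of what such an ordered, hand-written approach could look like, in Python. This is an assumption-laden toy, not the actual parser referred to above: it handles only variable references, parenthesised postfix expressions, `??` lookups with an NCName or `type(Name)` key, and single-name argument lists. All function names are hypothetical.

```python
import re

def tokenize(text):
    """Split the toy input into '??', '$name', name, '(' and ')' tokens."""
    return re.findall(r"\?\?|\$\w+|\w+|[()]", text)

def parse_key_specifier(tokens, pos):
    """Try TypeQualifier first (maximal match), then fall back to NCName."""
    if tokens[pos:pos + 2] == ["type", "("] and tokens[pos + 3:pos + 4] == [")"]:
        return ("TypeQualifier", tokens[pos + 2]), pos + 4
    return ("NCName", tokens[pos]), pos + 1

def parse_postfix(tokens, pos=0):
    """Parse a primary, then apply lookups and dynamic calls left to right."""
    if tokens[pos] == "(":
        node, pos = parse_postfix(tokens, pos + 1)
        if tokens[pos:pos + 1] != [")"]:
            raise ValueError("expected ')'")
        pos += 1
    else:
        node, pos = ("VarRef", tokens[pos]), pos + 1
    while pos < len(tokens):
        if tokens[pos] == "??":
            key, pos = parse_key_specifier(tokens, pos + 1)
            node = ("Lookup", node, key)
        elif tokens[pos] == "(":
            if tokens[pos + 2:pos + 3] != [")"]:
                raise ValueError("expected a single argument and ')'")
            node = ("DynamicFunctionCall", node, [tokens[pos + 1]])
            pos += 3
        else:
            break  # e.g. a ')' that closes an enclosing parenthesised expression
    return node, pos

def parse(text):
    node, _ = parse_postfix(tokenize(text))
    return node

print(parse("$tree??type(foo)"))
print(parse("($tree??type)(foo)"))
```

Because the TypeQualifier branch is tested first, `$tree??type(foo)` never reaches the dynamic-call branch in this sketch; the parenthesised form is the only way to call a function stored under the key `type`. A grammar-driven parser gets no such priority from the order of EBNF alternatives alone.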

rhdunn avatar Jul 03 '24 11:07 rhdunn

If maximal matching on TypeQualifier is required, it needs to be stated in the extra-grammatical notes of the spec. (Such is already done for occurrence indicators on function types.) If this is the case then I assume:

($tree??type)(foo)

would be required, and would allow dynamic lookup and evaluation of a function via a type KeySpecifier.

johnlumley avatar Jul 03 '24 12:07 johnlumley

I would have expected the rule "Postfix expressions are evaluated from left-to-right." (in 4.3) to cover this. But actually, that would encourage the ($tree??type)(foo) interpretation. And I agree, while we can make type a reserved function name easily enough, we can't make it a reserved key in a map. So perhaps we do need to find a different syntax.

michaelhkay avatar Jul 03 '24 12:07 michaelhkay

I'm not sure this is just an editorial fix - there is a fundamental ambiguity...

johnlumley avatar Jul 09 '24 09:07 johnlumley

Indeed so, it needs a rethink of the syntax.

michaelhkay avatar Jul 09 '24 14:07 michaelhkay

I'm inclined to dredge up an old idea: ~ as a type testing operator.

In a lookup expression, we write ??~record(from, to) as a shorthand for ??*[. instance of record(from, to)] (except it's not an exact equivalent because it doesn't do flattening before type testing).

More generally we can use ~ to filter by type, so $data ~ xs:integer means $data[ . instance of xs:integer ]. This is probably especially useful in XSLT patterns.

The phonetic resemblance of "type" and "tilde" (as with "at" and "attribute") has some mnemonic value.

michaelhkay avatar Aug 02 '24 09:08 michaelhkay

Do I understand correctly that TypeQualifier is the fly in this ointment?

I find the tilde as a substitute more confusing and obfuscating.

TypeQualifier is a kind of filter/predicate construction. So why not just drop it and nudge the reader toward a predicate expression, e.g., $tree ??*[. instance of foo]? I find that formulation to more clearly declare the programmer's intent than $tree ??type(foo), and it doesn't cost too many extra characters.

I am concerned that we are introducing a keyword-like alternative to a perfectly clear and stable keyword, instance of, and doing so has introduced the ambiguity @johnlumley has brilliantly spotted.

Arithmeticus avatar Sep 10 '24 01:09 Arithmeticus

The problem with using a predicate is that it qualifies the items selected by the expression, by which time information has already been lost - for example, members of an array that are empty sequences will have been reduced to nothing.
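The loss of information can be modelled roughly in Python, assuming an array is represented as a list whose members are tuples (each tuple standing in for a sequence). The helpers `lookup_wildcard` and `members_matching` are hypothetical names for this illustration and do not reproduce the exact XPath semantics.

```python
def lookup_wildcard(array):
    """Model of ?*: concatenate all members into one flat sequence of items."""
    return [item for member in array for item in member]

def members_matching(array, member_test):
    """Model of a type test applied to each member before any flattening."""
    return [member for member in array if member_test(member)]

# Four members: an empty sequence, a single integer, two integers, a string.
array = [(), (1,), (2, 3), ("a",)]

# A predicate applied after ?* only ever sees individual items, so the empty
# member has already disappeared and member boundaries are gone.
flat = lookup_wildcard(array)
print(flat)

# A test applied per member can still tell the empty sequence apart.
empty_members = members_matching(array, lambda m: len(m) == 0)
print(empty_members)

# Rough analogue of testing each member against a type like "xs:integer*".
integer_members = members_matching(
    array, lambda m: all(isinstance(i, int) for i in m)
)
print(integer_members)
```

In this model the flattened sequence has four items and no trace of the empty member, whereas the per-member test can still select it.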

It's probably true that there is always an equivalent expression (something like $V?pairs::*[?value instance of T]), but that gets awfully cumbersome. A great strength of axis steps is that the simple predicates, such as selecting children by name, can be expressed very concisely; you wouldn't want to write every child step as child::*[node-name() eq QName("", "para")].

michaelhkay avatar Sep 10 '24 08:09 michaelhkay

Could we define a type::xyz style axis and selector then? E.g. $tree??type::foo. -- That's clear enough and should not be ambiguous. It also generalizes to other axes as well such as $tree??attribute::name.

rhdunn avatar Sep 10 '24 09:09 rhdunn