ogcapi-features icon indicating copy to clipboard operation
ogcapi-features copied to clipboard

Consider Removing Left Recursion From CQL 2 BNF grammar

Open WillGunther opened this issue 3 years ago • 7 comments

I've been working on a CQL2 text parser implementation based on the BNF file in this repository. The parsing library I'm using does not support left hand recursion which appears twice in the BNF file. To simplify the task of implementing the parser for CQL 2 I propose these be refactored to remove left recursion as it is fairly common for parsing libraries to not support this. This issue seems somewhat related to #705

The sections with left hand recursion that I found are


booleanExpression = andExpression 
                  | orExpression
                  | notExpression
                  | comparisonPredicate
                  | spatialPredicate
                  | temporalPredicate
                  | arrayPredicate
                  | subExpression
                  | booleanLiteral;

andExpression = booleanExpression "AND" booleanExpression;
arithmeticExpression = arithmeticExpression plusSign arithmeticTerm
                     | arithmeticExpression minusSign arithmeticTerm
                     | arithmeticTerm;

WillGunther avatar Jul 20 '22 18:07 WillGunther

Meeting 2022-08-01: @cportele will check, they also made changes to the grammar so that the parsing is unambiguous in antlr.

cportele avatar Aug 01 '22 14:08 cportele

Here is a potential rewrite of booleanExpression (inspired by Wirth). I have tested this in our implementation using Antlr:

booleanExpression = booleanTerm [ {"OR" booleanTerm} ];
booleanTerm = booleanFactor [ {"AND" booleanFactor} ];
booleanFactor = ["NOT"] booleanPrimary;
booleanPrimary = predicate
               | booleanLiteral
               | leftParen booleanExpression rightParen;
predicate = comparisonPredicate
          | spatialPredicate
          | temporalPredicate
          | arrayPredicate;

The same approach should work for arithmeticExpression, too, but I have not tested it, because we have not yet added support for arithmetic operations:

arithmeticExpression = arithmeticTerm [ {arithmeticOperatorPlusMinus arithmeticTerm} ];
arithmeticOperatorPlusMinus = plusSign | minusSign;
arithmeticTerm = powerTerm [ {arithmeticOperatorMultDiv powerTerm} ];
arithmeticOperatorMultDiv = asterisk | solidus | percent | "div";
powerTerm = arithmeticFactor [ caret arithmeticFactor ];
arithmeticFactor = leftParen arithmeticExpression rightParen
                 | [ minusSign ] arithmeticOperand;
arithmeticOperand = numericLiteral
                  | propertyName
                  | function;

The other change that we did in our implementation is that we have changed all cases with multiple alternative expressions, since each expression rule includes propertyName and function (which are not type-specific), so they appear multiple times in the rule, if I resolve the non-terminals. For example, scalarExpression is in our grammar:

scalarExpression = propertyName
                 | characterClause
                 | numericLiteral
                 | booleanLiteral
                 | instantLiteral
                 | function
                 | arithmeticExpression;

characterClause : characterLiteral
                | CASEI LEFTPAREN characterExpression RIGHTPAREN
                | ACCENTI LEFTPAREN characterExpression RIGHTPAREN;

cportele avatar Aug 16 '22 14:08 cportele

Thank you for looking at this. I was able to try out the boolean expression structure that you added above as well as the scalar expression change you made and they fixed some of the issues I was having.

For now I've also opted to not implement the arithmetic expressions. I think that they add some ambiguity to the grammar as most numeric literals can also be arithmetic expressions with numeric literals.

WillGunther avatar Aug 17 '22 23:08 WillGunther

Meeting 2022-09-12: @pvretano will discuss the booleanExpression and arithmeticExpression rules internally. Others are invited to add their thoughts to this issue, too. We plan to make a decision how to update the grammar in 2 or 4 weeks.

cportele avatar Sep 12 '22 14:09 cportele

@cportele @pvretano all,

As I brought up in #705 (with draft rules organization suggestion here), I think that different rules for the expression types (e.g., boolean, array, number, text, that can be returned by a FunctionExpression or be the type of a property IdentifierExpression) will cause all sort of problems, because at compile-time (parser writing / generation time) their type is not known, so syntactically that is not something that can be validated.

So it seems to me that the restriction on returning or accepting a particular type (as opposed to what tokens are valid) should be specified outside of the grammar. It might no longer be necessary to keep the boolean type restriction in CQL2 itself if the grammar is being restructured, since the reason to keep it in CQL2 rather than in Features - Part 3 was to avoid all the changes necessary to the grammar. This also could mean reducing the number of reserved keywords at the grammar level, so that type constructors and functions are pre-defined identifiers instead (defined outside of the grammar, which makes it easier to extend).

See also this UML model for expressions that we are working on in Styles & Symbology, with the intent to fully cover the CQL2 functionality (more details here), on which the distinct rules I suggested are based.

jerstlouis avatar Sep 12 '22 18:09 jerstlouis

Meeting 2022-09-26: The pull request #767 looks good, but we should wait a couple of weeks before merging this, so that this is open for comments from implementers.

cportele avatar Sep 26 '22 14:09 cportele

Thanks! I was able to test out the the changes in my implementation and I now have all my test cases passing. I like the change, and I've also looked at some SQL bnfs which have similar structures for the arithmetic expressions.

One note from my implementation, I opted to remove numericExpression in favor of using arithmeticExpression since properties, functions and numeric literals all can be arithmetic expressions with no right hand side. I was having trouble forcing my parsing library to use the simpler terms. I don't think this necessarily needs a change, just wanted to pass that information along.

WillGunther avatar Sep 27 '22 17:09 WillGunther