ogcapi-features
Simplify the cql2-text grammar (future version improvements?)
This is feedback from trying to implement cql2-text. Implementers (or at least we) struggle with the current grammar.
I think it comes down mainly to these two things:
- Some of the capabilities from extension conformance classes are defined as separate rules. For operators that use a function call syntax (i.e., array/spatial/temporal operators and predicates), I think it would be much easier to simply define new possible operator values or pre-defined function identifiers, using the same grammar rule as function calls. This would cut down the number of rules dramatically, and I think it would also make the requirements in each conformance class clearer.
- Some rules seem to exist only to restrict the data types (e.g., numericExpression, characterExpression, temporalExpression...). However, this is purely a runtime concept, since the data type that a certain expression (e.g., a property) will evaluate to will depend on the queryables. Therefore I would not have used grammar rules (which are about the syntax) to make this distinction. Instead, I think what is needed for this is to have requirements and/or permissions that specify the interpretation if an unexpected data type is used in such a context.
I think simplifying these two aspects of the grammar would directly result in simpler parser implementations, greater ease of implementation and greater interoperability.
Meeting 2022-06-20: It would be good to understand why this would result in an easier implementation. We need to discuss this in a meeting when @jerstlouis is present.
Thanks @cportele . I should be attending the next meeting in a couple weeks.
As a summary, from a syntactic point of view, I think the two things I suggested above would result in fewer grammar rules (simpler grammar), and parser node classes would be a more direct / natural match to the rules. We would implement the function/operator name validation / data types checking separately from the parsing, since some of it is only known at runtime (e.g., available functions, queryable data types). e.g., in our implementation we have a CQL2CallExp node class which we plan to use to handle the array / spatial / temporal operators which syntactically look like function calls. We are hand-writing a Recursive Descent parser, borrowing heavily from our ECCSS/CMSS parser.
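To illustrate the separation described above, here is a minimal Python sketch (not the eC implementation mentioned in this comment). It assumes a deliberately tiny token set (ASCII identifiers, string literals, parentheses and commas), and the names `Call`, `parse_expression`, `validate` and the `KNOWN_FUNCTIONS` table are hypothetical. The parser accepts any `name(args...)` as a generic call node, and name/arity checking happens in a separate pass that could consult runtime information.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Identifier:
    name: str

@dataclass
class String:
    value: str

@dataclass
class Call:
    name: str
    args: list = field(default_factory=list)

# Simplified token set for illustration only (ASCII identifiers).
TOKEN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_][A-Za-z0-9_]*)|(?P<str>'(?:[^']|'')*')|(?P<punct>[(),]))")

def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_expression(tokens, i=0):
    """Parse one expression at tokens[i]; return (node, next_index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[i]
    if kind == "str":
        return String(value[1:-1].replace("''", "'")), i + 1
    if kind != "id":
        raise SyntaxError(f"unexpected token {value!r}")
    # An identifier followed by "(" is a call, whatever its name is.
    if tokens[i + 1:i + 2] == [("punct", "(")]:
        args, i = [], i + 2
        if tokens[i:i + 1] == [("punct", ")")]:
            return Call(value, args), i + 1
        while True:
            arg, i = parse_expression(tokens, i)
            args.append(arg)
            if tokens[i:i + 1] == [("punct", ")")]:
                return Call(value, args), i + 1
            if tokens[i:i + 1] != [("punct", ",")]:
                raise SyntaxError("expected ',' or ')'")
            i += 1
    return Identifier(value), i + 1

# Hypothetical table (name -> arity); in practice this could be built at
# runtime from the conformance classes and custom functions a server supports.
KNOWN_FUNCTIONS = {"DATE": 1, "S_INTERSECTS": 2, "T_BEFORE": 2}

def validate(node, known=KNOWN_FUNCTIONS):
    """Check function names and arities after parsing; return error strings."""
    errors = []
    if isinstance(node, Call):
        arity = known.get(node.name.upper())
        if arity is None:
            errors.append(f"unknown function: {node.name}")
        elif arity != len(node.args):
            errors.append(f"{node.name} expects {arity} argument(s)")
        for arg in node.args:
            errors.extend(validate(arg, known))
    return errors
```

With this split, supporting another operator that looks like a function call only means adding an entry to the table; the parser itself is unchanged.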
The following excerpt from our internal CQL2 design document mapping CQL2 conformance classes and providing a concise summary of the CQL2 syntax might be insightful. A simpler grammar could potentially closely match those CQL2* AST node classes to rules. We could eventually prototype such a simpler grammar together with railroad diagrams demonstrating the idea.
Basic CQL2
- Defines predicate expressions evaluating to a boolean value, which we parse as the following eC AST node classes:
  - CQL2Identifier for identifiers, which are sequences of UTF-8 characters. Identifiers can also be double-quoted to include any arbitrary characters. As in ECCSS, `true`, `false` and `null` will be treated as identifiers in our implementation (with the drawback that they cannot be used for identifiers even double-quoted).
    - Valid identifier starting characters: `":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]`
    - Additional valid identifier continuing characters: `"-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]`
  - CQL2Expression for a generic expression class from which other CQL2Exp* are derived:
    - Sub-expressions enclosed in parentheses `()` to override default operator priorities
    - CQL2ExpIdentifier for expressions consisting of an identifier (CQL2Identifier).
    - CQL2ExpConstant for decimal numeric literals, integer or fractional using `.`, no suffixes used, including support for scientific notation (`E` separating power of 10 exponent)
    - CQL2ExpString for UTF-8 character string literals enclosed in single quote (`'`); single quote characters within a string literal are represented by two consecutive single-quote characters (`''`)
    - CQL2ExpCall with support for:
      - `DATE` as a well-known function taking a string literal (CQL2ExpString) defining a date instant
      - `TIMESTAMP` as a well-known function taking a string literal (CQL2ExpString) defining a datetime instant
    - CQL2ExpOperation, with support for the following operators (note that all CQL2 keywords are case-insensitive):
      - Unary operator `NOT` followed by an operand
      - Binary logical operators `AND` and `OR`
      - Binary relational operators `=`, `<`, `>`, `<=`, `>=`, `<>`
      - Binary relational operator `IS` and `IS NOT` (ignore extra spaces between `IS` and `NOT`) for checking against `NULL` identifier.
      - For relational operators with Basic CQL2, CQL2ExpIdentifier (other than `true`, `false` and `null`) are only supported for the left operand, while `true`, `false`, `null` and literals are supported in second operands only.
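As a small illustration of the identifier rules listed above, the following Python sketch checks whether a string qualifies as an unquoted identifier. The code point ranges are copied from the list; the names `START_RANGES`, `CONTINUE_EXTRA` and `is_unquoted_identifier` are hypothetical, and the sketch ignores double-quoted identifiers and the special treatment of `true`, `false` and `null`.

```python
# Code point ranges for identifier start characters (from the list above).
START_RANGES = [
    (0x3A, 0x3A),      # ":"
    (0x41, 0x5A),      # A-Z
    (0x5F, 0x5F),      # "_"
    (0x61, 0x7A),      # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character.
CONTINUE_EXTRA = [
    (0x2D, 0x2D),      # "-"
    (0x2E, 0x2E),      # "."
    (0x30, 0x39),      # 0-9
    (0xB7, 0xB7),
    (0x0300, 0x036F),
    (0x203F, 0x2040),
]

def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_unquoted_identifier(text):
    """Return True if text is a valid identifier without double quotes."""
    if not text or not _in_ranges(text[0], START_RANGES):
        return False
    return all(
        _in_ranges(ch, START_RANGES) or _in_ranges(ch, CONTINUE_EXTRA)
        for ch in text[1:]
    )
```

For example, `is_unquoted_identifier("road-name.v2")` holds while `is_unquoted_identifier("9lives")` does not, since digits are only allowed as continuing characters.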
Property-Property
- Removes the limitation on which operands of relational operators can be identifiers or literals
Arithmetic Expressions
- Adds support for the following binary operators in CQL2ExpOperation: `+`, `-`, `*`, `/` (fractional division, see also Features#711), `^` (exponent)
- Adds support for the `-` unary operator? (Features#709)
Advanced Comparison Operators
- Adds `LIKE` and `NOT LIKE` relational operators (ignore extra spaces between `NOT` and `LIKE`) that accept a pattern where `%` matches 0..n arbitrary characters and `_` matches exactly one arbitrary character (those characters can be escaped by using a `\` character); expects text expressions only, and string literals in right operand.
  - NOTE: Equivalent but different functionality in ECCSS is provided by the `^` (starts with), `$` (ends with), `~` (contains) text operators and their negated counterparts.
- Adds `BETWEEN` and `NOT BETWEEN` ternary relational operators (e.g., `depth BETWEEN 100.0 AND 150.0`); expects numeric expressions only.
- Adds `IN` and `NOT IN` relational operators taking a comma-separated list of expressions (CQL2ExpList) within parentheses as second operand; items in the list are expected to be of same type as value being tested.
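The `LIKE` pattern rules summarised above can be sketched in Python by translating a pattern into a regular expression. This is only an illustration under the assumptions stated in the list (`%` for 0..n characters, `_` for exactly one, `\` as escape); the function names `like_to_regex` and `like` are hypothetical, and matching here is case-sensitive.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: match the next character literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")      # 0..n arbitrary characters
        elif ch == "_":
            parts.append(".")       # exactly one arbitrary character
        else:
            parts.append(re.escape(ch))
        i += 1
    # DOTALL so that "arbitrary character" also covers newlines.
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For instance, `like("Toronto", "Tor%")` matches, while `like("1000", r"100\%")` does not because the escaped `%` only matches a literal percent sign.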
Functions
- Adds CQL2ExpCall with support for implementation-defined custom functions, taking a list of expressions within parentheses `()` as arguments following an identifier (CQL2Identifier) for the function to call
- Implies use of CQL2ExpList for function arguments separated by commas
- Although the CQL2 specification and grammar do not currently define them as such, syntactically all of the following extended conformance classes could have been defined using the function call grammar rule, and our parser implements them as such using a CQL2ExpCall AST node. This demonstrates that functions are a mechanism by which CQL2 could be extended independently from the specification.
  - Except for WKT, only the array literals would require the addition of a new grammar rule, since they use `[...]` rather than e.g., `ARRAY(...)`. My suggestion in Features#718 is to use `(1,2,3)` for array literals instead. To support WKT, support for space-separated tuples is also required, e.g., `10 30` in `POLYGON((10 30, 40 20, 50 80, 10 30))`.
Case-insensitive Comparison
- Adds CQL2ExpCall with support for the `CASEI` well-known function returning a case-desensitised version of a string.
Accent-insensitive Comparison
- Adds CQL2ExpCall with support for the `ACCENTI` well-known function returning an accent-desensitised version of a string.
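As a rough illustration of what these two well-known functions could evaluate to, here is a Python sketch. It assumes Unicode case folding for `CASEI` and removal of combining marks after canonical decomposition for `ACCENTI`; these are assumptions made for the example, not a statement of what the specification requires, and the Python function names are hypothetical.

```python
import unicodedata

def casei(text):
    """Case-desensitised form of a string (assumes Unicode case folding)."""
    return text.casefold()

def accenti(text):
    """Accent-desensitised form of a string.

    Assumes that decomposing to NFD and dropping combining marks is an
    acceptable notion of accent removal for this example.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

A comparison such as `CASEI(name) = CASEI('montréal')` would then amount to comparing `casei(name)` with `casei("montréal")`, and the two functions can be nested when both kinds of insensitivity are wanted.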
Basic Spatial Operators
- Adds CQL2ExpCall with support for the `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON`, `GEOMETRYCOLLECTION` and `ENVELOPE` well-known functions defining vector geometry objects following the simple features model (WKT encoding).
  - Also implies support for space-separated tuples and array literals using `()` to support the WKT notation as arguments to those function calls
- Adds the `S_INTERSECTS` well-known function for spatial intersection operator
- Implies use of CQL2ExpList for function arguments separated by commas.
Spatial Operators
- Implies Basic Spatial operator support, and adds the following well-known functions for additional spatial operators: `S_CONTAINS`, `S_CROSSES`, `S_DISJOINT`, `S_EQUALS`, `S_OVERLAPS`, `S_TOUCHES`, `S_WITHIN`
Temporal Operators
- Adds CQL2ExpCall with support for:
  - `INTERVAL` as a well-known function taking two instants string literals (CQL2ExpString) defining a temporal interval object
  - the following operators taking both instants and intervals as arguments: `T_AFTER`, `T_BEFORE`, `T_DISJOINT`, `T_EQUALS`, `T_INTERSECTS`
  - the following operators taking only intervals as arguments: `T_CONTAINS`, `T_DURING`, `T_FINISHEDBY`, `T_FINISHES`, `T_MEETS`, `T_METBY`, `T_OVERLAPPEDBY`, `T_OVERLAPS`, `T_STARTEDBY`, `T_STARTS`
- Implies use of CQL2ExpList for function arguments separated by commas
Array Operators
- Adds CQL2ExpArray (array literals as a list of expressions (CQL2ExpList) within `[]`)
- Adds CQL2ExpCall with support for the `A_CONTAINEDBY`, `A_CONTAINS`, `A_EQUALS` and `A_OVERLAPS` array operators as well-known functions
- Implies use of CQL2ExpList for expressions array and for function arguments separated by commas
@pvretano
See first draft of proposed simpler grammar rules in https://github.com/opengeospatial/ogcapi-features/issues/723#issuecomment-1172603159.
Note that in the approach I suggest for defining the grammar production rules, operators / functions are not really keywords, but regular identifiers used in function call expressions (or in spatial/temporal/array literal definitions using the same syntax as function calls). For example, this means that a date or s_intersects queryable would not need to be double-quoted (as in the current abstract tests), since date would only take its meaning of a temporal literal when it is followed by an opening parenthesis (, and therefore there really is no ambiguity in date<>DATE('2022-04-16').
In my opinion this makes it much easier to extend the language with additional functions / operators, since those additions would not introduce additional keywords that break implementations not previously requiring queryables with the same name to be double-quoted. The list of keywords in 8.2 (which would need to be double-quoted, if allowed at all) would be reduced to:
- AND
- BETWEEN
- DIV
- FALSE
- IN
- IS
- LIKE
- NOT
- NULL
- OR
- TRUE
All of the other ones would get tokenized by the lexer as an identifier which can be used as operators/function calls, or to define literals and only get resolved in the contexts where they apply. This is the approach taken in C-like languages where standard functions and data types/structs (or classes in C++) are not classified as keywords.
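The following Python sketch illustrates this lexing approach under simplifying assumptions: ASCII-only identifiers, a reduced operator set, and hypothetical names (`RESERVED`, `tokenize`, `classify_identifiers`). Only the eleven words listed above are tokenized as keywords; every other word is an identifier, and whether it denotes a function/operator or a queryable is decided from context (here, simply whether an opening parenthesis follows).

```python
import re

# The reduced keyword list proposed above (matched case-insensitively).
RESERVED = {"AND", "BETWEEN", "DIV", "FALSE", "IN", "IS",
            "LIKE", "NOT", "NULL", "OR", "TRUE"}

TOKEN = re.compile(r"""
    \s*(?:
        (?P<string>'(?:[^']|'')*')
      | (?P<number>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
      | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
      | (?P<op><>|<=|>=|[=<>(),+\-*/^])
    )""", re.VERBOSE)

def tokenize(text):
    """Split text into (kind, value) tokens."""
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "word":
            # Only the short reserved list yields keywords.
            kind = "keyword" if value.upper() in RESERVED else "identifier"
        tokens.append((kind, value))
        pos = m.end()
    return tokens

def classify_identifiers(tokens):
    """Label each identifier 'call' if directly followed by '(', else 'name'."""
    result = []
    for i, (kind, value) in enumerate(tokens):
        if kind == "identifier":
            is_call = tokens[i + 1:i + 2] == [("op", "(")]
            result.append((value, "call" if is_call else "name"))
    return result
```

Applied to `date<>DATE('2022-04-16')`, `classify_identifiers(tokenize(...))` labels `date` as a name and `DATE` as a call, so the queryable does not need to be double-quoted.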
Also note that SQL keywords (or "reserved" words) do not seem to include any function-like keywords either. Things like UPPER() changing case are described as functions instead.
See the CartoSym-CSS BNF lexer / grammar for ANTLR4 which should (in theory) be a true superset of CQL2:
https://github.com/opengeospatial/styles-and-symbology/blob/main/core/schemas/CartoSym-CSS-Lexer.g4
https://github.com/opengeospatial/styles-and-symbology/blob/main/core/schemas/CartoSym-CSS-Grammar.g4
The starting rule for CQL2 is expression (e.g., you can paste the Lexer and Grammar at http://lab.antlr.org/ and test any CQL2 expression with expression as the start rule).
When I have a chance I will extract only the CQL2 relevant part.