openCypher
Syntactic coverage
Builds on PR #520.
This PR adds a small tool that performs syntax coverage analysis of the TCK scenarios.
The current output is:
Cypher 3752 3752
Statement 3752 3752
Query 3752 3752
RegularQuery 3743 3735
Union 16 12
SingleQuery 3759 3735
SinglePartQuery 3757 3733
MultiPartQuery 972 972
UpdatingClause 1353 318
ReadingClause 1984 1493
Match 1534 1268
Unwind 414 216
Merge 105 81
MergeAction 28 25
Create 1107 117
Set 88 85
SetItem 90 85
Delete 48 48
Remove 33 33
RemoveItem 35 33
InQueryCall 36 35
StandaloneCall 17 17
YieldItems 34 33
YieldItem 50 33
With 1646 972
Return 3633 3610
ProjectionBody 5279 3624
ProjectionItems 5278 3623
ProjectionItem 7274 3616
Order 341 338
Skip 46 46
Limit 276 276
SortItem 536 338
Where 1239 831
Pattern 2646 1367
PatternPart 3302 1395
AnonymousPatternPart 3302 1395
PatternElement 3302 1395
NodePattern 6059 1403
PatternElementChain 2689 753
RelationshipPattern 2689 753
RelationshipDetail 2525 657
Properties 756 152
RelationshipTypes 1862 290
NodeLabels 1516 425
NodeLabel 1561 425
RangeLiteral 154 145
LabelName 1561 425
RelTypeName 1872 290
Expression 0 0
OrExpression 123 60
XorExpression 87 32
AndExpression 123 60
NotExpression 181 153
ComparisonExpression 1417 689
AddOrSubtractExpression 393 227
MultiplyDivideModuloExpression 275 156
PowerOfExpression 1 1
UnaryAddOrSubtractExpression 165 139
StringListNullOperatorExpression 404 338
ListOperatorExpression 167 137
StringOperatorExpression 32 29
NullOperatorExpression 210 172
PropertyOrLabelsExpression 1321 735
Atom 32341 3679
Literal 19445 2925
BooleanLiteral 1022 263
ListLiteral 1861 976
PartialComparisonExpression 1428 689
ParenthesizedExpression 294 179
RelationshipsPattern 68 64
FilterExpression 998 617
IdInColl 1040 659
FunctionInvocation 3254 1578
FunctionName 3254 1578
ExistentialSubquery 13 10
ExplicitProcedureInvocation 48 47
ImplicitProcedureInvocation 5 5
ProcedureResultField 21 13
ProcedureName 53 52
Namespace 3307 1629
ListComprehension 283 137
PatternComprehension 16 16
PropertyLookup 1383 754
CaseExpression 251 75
CaseAlternative 311 75
Variable 18547 3501
NumberLiteral 11144 2185
MapLiteral 2767 1197
Parameter 89 61
PropertyExpression 79 73
PropertyKeyName 8970 1727
IntegerLiteral 10428 2100
DoubleLiteral 811 264
SchemaName 12403 1978
ReservedWord 25 21
SymbolicName 34904 3624
LeftArrowHead 607 100
RightArrowHead 1612 489
Dash 5378 753
In the grammar, the rules for expressions are “drop-through”, so that even a literal causes all expression non-terminals to show up in the parse tree. With regard to coverage, that is obviously not very informative. Hence, the numbers above do not count rules if they
- end with “Expression”,
- have fewer than 2 rule children in the parse tree,
- have 0 terminal children in the parse tree, and
- have fewer than 2 alternatives in the grammar.
With that filter, the expression numbers look much more informative.
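The filter described above can be sketched as a single predicate. This is only an illustration, not the tool's actual code: `ParseNode`, `is_drop_through`, and the `alternatives` mapping are hypothetical names, and the sketch assumes that all four conditions must hold together for an instance to be skipped.

```python
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    """Hypothetical parse-tree node: a rule name plus its direct children."""
    rule: str
    rule_children: list = field(default_factory=list)      # child non-terminals
    terminal_children: list = field(default_factory=list)  # child tokens (strings)


def is_drop_through(node, alternatives):
    """Return True if this rule instance should NOT be counted.

    `alternatives` maps a rule name to the number of alternatives that
    rule has in the grammar (assumed to be known up front).
    """
    return (
        node.rule.endswith("Expression")
        and len(node.rule_children) < 2
        and len(node.terminal_children) == 0
        and alternatives[node.rule] < 2
    )


# Assumed alternative counts, for illustration only.
alternatives = {"OrExpression": 1, "XorExpression": 1}

# An OrExpression that merely wraps one child is skipped ...
wrapper = ParseNode("OrExpression", [ParseNode("XorExpression")])
# ... while a real `a OR b` has two operands and an OR token, so it is counted.
real_or = ParseNode(
    "OrExpression",
    [ParseNode("XorExpression"), ParseNode("XorExpression")],
    ["OR"],
)
```

Under these assumptions, `is_drop_through(wrapper, alternatives)` is true and `is_drop_through(real_or, alternatives)` is false, which is why a plain literal no longer inflates the count of every expression rule above it.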
The difference between the columns is as follows:
- The first column counts all instances of a rule (non-terminal) in the parse trees.
- The second column counts all instances of a rule only once per `When executing query` step. For instance, `RETURN 123, 456` would have `NumberLiteral 2 1`. I do not know of any scenario having more than one `When executing query` step, so you can think of the second column as the number of scenarios where the tested query has at least one instance of the respective rules.
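The two columns can be reproduced with a pair of counters. The following is a minimal sketch, not the tool's implementation: `coverage_columns` is a hypothetical helper, and it assumes each query has already been parsed into a flat list of rule names with one entry per rule instance.

```python
from collections import Counter


def coverage_columns(rules_per_query):
    """Map each rule name to (total instances, number of queries using it).

    `rules_per_query` holds one list of rule names per executed query,
    with one entry for every instance of a rule in that query's parse tree.
    """
    total = Counter()      # first column: every instance counts
    per_query = Counter()  # second column: at most one count per query
    for rules in rules_per_query:
        total.update(rules)
        per_query.update(set(rules))
    return {rule: (total[rule], per_query[rule]) for rule in total}


# `RETURN 123, 456` contains two number literals in a single query.
columns = coverage_columns([["Return", "NumberLiteral", "NumberLiteral"]])
print(columns["NumberLiteral"])  # (2, 1)
```

Deduplicating with `set(rules)` before updating the second counter is what makes it count each rule once per query.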
I am still somewhat puzzled that there is practically no syntax that is not covered by a scenario query. However, potential reasons are:
- Syntax is used in some scenarios, which actually test something else, e.g. mathematical operations are likely to have counts for this reason.
- A feature does not have special syntax, e.g. it is exposed through a built-in function or is a special combination of otherwise normal syntax.
However, this is coverage of syntax — not of semantics.