
Kotlin grammars

Open kaby76 opened this issue 1 year ago • 1 comments

I am trying to investigate https://github.com/antlr/antlr4-lab/issues/83.

  • There are two Kotlin grammars in this repo: kotlin and kotlin-formal. In addition, there is a Kotlin grammar in the JetBrains kotlin-spec repo (https://github.com/Kotlin/kotlin-spec/tree/4b29a8b42e08237f45c0c3c185eaae4bba3751f6/grammar/src/main/antlr).

  • The readmes of the two Kotlin grammars in this repo don't explain why there are two, how they differ, or which one to choose.

  • Which version of Kotlin is each of these grammars intended to support? Neither states which release it targets.

  • The link at https://github.com/antlr/grammars-v4/blob/1bfcc5a6b954008e23bc5a982864364a069c8756/kotlin/kotlin-formal/README.md?plain=1#L7 in the kotlin-formal readme is dead.

  • kotlin-formal is tested. https://github.com/antlr/grammars-v4/blob/1bfcc5a6b954008e23bc5a982864364a069c8756/kotlin/kotlin-formal/desc.xml#L3

  • kotlin is not tested. https://github.com/antlr/grammars-v4/blob/1bfcc5a6b954008e23bc5a982864364a069c8756/kotlin/kotlin/desc.xml#L3

kaby76, Feb 09 '24 11:02

In addition, NL and semi are sprinkled cavalierly throughout the grammar, which causes ambiguity and quite poor performance. There is no consistent, thought-out principle governing how and where they should be used. For example, consider how propertyDeclaration is parsed.

Input:

var a = 1
var a = 2
var a = 3
var a = 4
var a = 5
var a = 6
var a = 7

This input forces a large lookahead (a max-k of 50 in the trperf output below) because the parser needs full context to decide where each NL belongs: in propertyDeclaration or in topLevelObject. The same mistake is present in the "spec" grammar implementation.

There is also a faux pas right after the misplaced "NL*" in the production: "(getter? (NL* semi? setter)? | setter? (NL* semi? getter)?)" is an alternation in which both alternatives can derive the empty string. A grammar should never offer a choice of empty vs. empty!
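The overlap between the two alternatives can be checked mechanically. The following is a minimal sketch, not part of the grammar or its tooling: it treats getter, setter, semi, and NL as opaque tokens (an assumption; in the real grammar they are rules with their own structure), caps NL* at one repetition, and uses the hypothetical helper names alternative and ambiguous_sentences.

```python
def alternative(first, second, max_nl=1):
    """Yield token sequences derivable from: first? (NL* semi? second)?

    NL* is capped at max_nl repetitions so the enumeration stays finite.
    Tokens are opaque strings; this is a simplification of the real rules.
    """
    for has_first in (False, True):
        head = [first] if has_first else []
        yield tuple(head)  # the optional trailing group is skipped
        for nl_count in range(max_nl + 1):
            for has_semi in (False, True):
                middle = ["NL"] * nl_count + (["semi"] if has_semi else [])
                yield tuple(head + middle + [second])


def ambiguous_sentences():
    """Return the sequences that both sides of the alternation can derive."""
    left = set(alternative("getter", "setter"))   # getter? (NL* semi? setter)?
    right = set(alternative("setter", "getter"))  # setter? (NL* semi? getter)?
    return sorted(left & right, key=lambda s: (len(s), s))


if __name__ == "__main__":
    for sentence in ambiguous_sentences():
        print(sentence)
```

Under these simplifications the overlap is not limited to the empty sequence: a lone getter and a lone setter are each derivable from both alternatives as well.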

$ (trperf y > out; cat out | head -1 > out2; cat out | tail -n +2 | sort -k6 -n -r | head > out3; cat out2 out3 | column -t)
Time to parse: 00:00:00.1550288
Decision  Rule                    Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
157       propertyDeclaration     14           0.331425  203      50     7         7            0       25
305       postfixUnaryExpression  7            0.064142  21       3      0         0            0       4
146       propertyDeclaration     7            0.029213  21       3      0         0            0       2
300       asExpression            7            0.056031  14       2      0         0            0       3
289       elvisExpression         7            0.055394  14       2      0         0            0       3
278       conjunction             7            0.058866  14       2      0         0            0       3
275       disjunction             7            0.057148  14       2      0         0            0       3
156       propertyDeclaration     7            0.071258  14       2      0         0            0       3
350       primaryExpression       7            0.000744  7        1      0         0            0       1
301       prefixUnaryExpression   7            0.002013  7        1      0         0            0       1
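
For anyone without the shell tools at hand, the ranking step of the pipeline above can be sketched in Python. This is an illustration only: it assumes the whitespace-separated column layout shown in the table (Max-k as the sixth field), the function name rank_by_max_k is hypothetical, and ties keep their input order rather than reproducing sort's tie-breaking. The sample rows are copied from the output above.

```python
def rank_by_max_k(report, limit=10):
    """Return up to `limit` data rows of a trperf-style table, highest Max-k first."""
    rows = []
    for line in report.splitlines():
        fields = line.split()
        # Data rows start with a numeric decision number; skip the header
        # row, the "Time to parse" line, and blank lines.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        rows.append({
            "decision": int(fields[0]),
            "rule": fields[1],
            "max_k": int(fields[5]),  # sixth column, as in `sort -k6 -n -r`
        })
    rows.sort(key=lambda row: row["max_k"], reverse=True)
    return rows[:limit]


SAMPLE = """\
Decision  Rule                    Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
350       primaryExpression       7            0.000744  7        1      0         0            0       1
157       propertyDeclaration     14           0.331425  203      50     7         7            0       25
305       postfixUnaryExpression  7            0.064142  21       3      0         0            0       4
"""

if __name__ == "__main__":
    for row in rank_by_max_k(SAMPLE, limit=3):
        print(row["decision"], row["rule"], row["max_k"])
```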

If you correct the NLs in propertyDeclaration and in getter/setter, the worst max-k drops from 50 to 3, although decision 163 in propertyDeclaration still reports 7 fallbacks and 7 ambiguities.

$ diff KotlinParser.g4 ..
178,179c178
<     ) (NL* typeConstraints)? (NL* ('=' NL* expression | propertyDelegate))?
< (
---
>     ) (NL* typeConstraints)? (NL* ('=' NL* expression | propertyDelegate))? (NL+ ';')? NL* (
203,204c202,203
<     : NL? modifiers? 'get'
<     | NL? modifiers? 'get' NL* '(' NL* ')' (NL* ':' NL* type_)? NL* functionBody
---
>     : modifiers? 'get'
>     | modifiers? 'get' NL* '(' NL* ')' (NL* ':' NL* type_)? NL* functionBody
208,209c207,208
<     : NL? modifiers? 'set'
<     | NL? modifiers? 'set' NL* '(' (annotation | parameterModifier)* setterParameter ')' (
---
>     : modifiers? 'set'
>     | modifiers? 'set' NL* '(' (annotation | parameterModifier)* setterParameter ')' (
02/10-07:58:19 ~/issues/g4-3959/kotlin/kotlin-formal/Generated-CSharp
$ (trperf y > out; cat out | head -1 > out2; cat out | tail -n +2 | sort -k6 -n -r | head > out3; cat out2 out3 | column -t)
Time to parse: 00:00:00.1211182
Decision  Rule                    Invocations  Time      Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
306       postfixUnaryExpression  7            0.051186  21       3      0         0            0       4
146       propertyDeclaration     7            0.030375  21       3      0         0            0       2
301       asExpression            7            0.044945  14       2      0         0            0       3
290       elvisExpression         7            0.041122  14       2      0         0            0       3
279       conjunction             7            0.048455  14       2      0         0            0       3
276       disjunction             7            0.038296  14       2      0         0            0       3
163       propertyDeclaration     7            0.191566  21       2      7         7            0       15
158       propertyDeclaration     7            0.056232  14       2      0         0            0       3
155       propertyDeclaration     7            0.039291  14       2      0         0            0       3
502       semis                   7            0.004448  7        1      0         0            0       2
02/10-07:59:24 ~/issues/g4-3959/kotlin/kotlin-formal/Generated-CSharp

kaby76, Feb 10 '24 12:02