Sample Kotlin Grammar does not load properly and is unusable.

Open Incoherent-Code opened this issue 1 year ago • 5 comments

How to reproduce:

  1. Go to the ANTLR Lab site (lab.antlr.org).
  2. Click on the sample dropdown and select either entry for Kotlin. (I'm assuming one entry is for kotlin-formal, but neither works.)
  3. Click on the tab labeled Parser.

You'll see that the Kotlin lexer grammar has been loaded into the Parser tab in place of the Kotlin parser grammar. The sample does not function in this state.

Even with the Kotlin parser in the correct place, the sample still needs a way to import UnicodeClasses.g4, which the Kotlin lexer relies on. Otherwise, the sample will throw many implicit token errors. As a workaround, I usually copy the contents of UnicodeClasses.g4 to the end of the lexer by hand to use the Kotlin grammar with ANTLR Lab.
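The manual workaround can be approximated with a small script. This is only a sketch: `inline_import` is a hypothetical helper, not part of antlr4-lab or grammars-v4, and it assumes the lexer has a single-name `import UnicodeClasses;` statement on its own line and that the imported file starts with an ordinary `lexer grammar ...;` header.

```python
import re

# Matches the "grammar X;", "lexer grammar X;" or "parser grammar X;" header line.
HEADER_RE = re.compile(r'^[ \t]*(?:lexer\s+|parser\s+)?grammar\s+\w+\s*;', re.MULTILINE)


def inline_import(lexer_text: str, imported_text: str, imported_name: str) -> str:
    """Drop `import <imported_name>;` from lexer_text and append the
    imported grammar's rules (without its header) at the end.

    Hypothetical helper; assumes a single-name import on its own line.
    """
    import_re = re.compile(
        r'^[ \t]*import[ \t]+' + re.escape(imported_name) + r'[ \t]*;[ \t]*\n?',
        re.MULTILINE,
    )
    without_import = import_re.sub('', lexer_text, count=1)
    # Remove only the first header so the result declares one grammar.
    rules_only = HEADER_RE.sub('', imported_text, count=1).strip()
    return (
        without_import.rstrip()
        + '\n\n// Inlined from ' + imported_name + '.g4\n'
        + rules_only + '\n'
    )
```

Given the text of KotlinLexer.g4 and UnicodeClasses.g4, `inline_import(lexer_text, unicode_text, "UnicodeClasses")` would return a single lexer grammar that can be pasted into the Lexer tab.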

Incoherent-Code, Jul 23 '24 19:07

Someone manually changed the grammars.json file. https://github.com/antlr/grammars-v4/blob/1e08bcbcc56b8ff2cfad7508815544e141d188e9/grammars.json#L2070. It's wrong and it should have been generated by script, not hand edited. https://github.com/antlr/grammars-v4/blob/master/_scripts/mkindex.py

kaby76, Jul 23 '24 19:07

Upon further inspection, this is actually a bug with mkindex.py itself. I tried running mkindex.py again and got this output, which is still wrong: grammar.json

Incoherent-Code, Jul 27 '24 18:07

The problem lies with lines 113 and 114:

lexer = grammars[0] if 'Lexer' in grammars[0] else grammars[1]
parser = grammars[0] if 'Parser' in grammars[0] else grammars[1]

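The Kotlin pom file lists UnicodeClasses.g4 first, followed by the lexer and the parser. Because grammars[0] is then UnicodeClasses.g4, which contains neither 'Lexer' nor 'Parser', both conditions fail and both lexer and parser are set to grammars[1], which is KotlinLexer.g4.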

<includes>
   <include>UnicodeClasses.g4</include>
   <include>KotlinLexer.g4</include>
   <include>KotlinParser.g4</include>
</includes>
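The edge case can be reproduced in isolation. In the sketch below, `pick_original` wraps the two quoted lines in a function, and `pick_by_name` is a hypothetical alternative that searches the whole list; it assumes the files follow the `...Lexer.g4` / `...Parser.g4` naming convention and is not the fix adopted in mkindex.py.

```python
def pick_original(grammars):
    # The two lines quoted from mkindex.py, wrapped in a function.
    lexer = grammars[0] if 'Lexer' in grammars[0] else grammars[1]
    parser = grammars[0] if 'Parser' in grammars[0] else grammars[1]
    return lexer, parser


def pick_by_name(grammars):
    # Hypothetical alternative: scan every entry instead of assuming
    # the list holds exactly one lexer and one parser.
    lexer = next((g for g in grammars if g.endswith('Lexer.g4')), None)
    parser = next((g for g in grammars if g.endswith('Parser.g4')), None)
    return lexer, parser


# Order taken from the <includes> block of the Kotlin pom file.
kotlin = ['UnicodeClasses.g4', 'KotlinLexer.g4', 'KotlinParser.g4']

print(pick_original(kotlin))  # ('KotlinLexer.g4', 'KotlinLexer.g4')
print(pick_by_name(kotlin))   # ('KotlinLexer.g4', 'KotlinParser.g4')
```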

I also noticed that kotlin-formal/pom.xml doesn't include UnicodeClasses.g4 at all, even though its KotlinLexer.g4 still imports it.

Incoherent-Code, Jul 27 '24 19:07

Incoherent-Code wrote: "Upon further inspection, this is actually a bug with mkindex.py itself. I tried running mkindex.py again and got this output, which is still wrong: grammar.json"

Thanks for checking this. The error is in the pom.xml itself--it has UnicodeClasses.g4 stated as a "top-level g4". Yes, it is a "lexer grammar", but it is not a "top-level g4". A "top-level g4" is a g4 that we run the tool on. UnicodeClasses.g4 is an imported file, so the tool should not be run on this file.
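The distinction can be illustrated with a rough sketch that treats a grammar as top-level when no other grammar in the same directory imports it. This is an assumption-laden heuristic, not how trgen or mkindex.py work: `imported_names` and `top_level` are hypothetical helpers, and the regex only recognizes imports of bare grammar names (so dotted Java imports inside `@header` blocks are ignored).

```python
import re

# An ANTLR import lists bare grammar names: "import A;" or "import A, B;".
IMPORT_RE = re.compile(
    r'^[ \t]*import\s+([A-Za-z_]\w*(?:\s*,\s*[A-Za-z_]\w*)*)\s*;',
    re.MULTILINE,
)


def imported_names(g4_text):
    """Return the set of grammar names imported by one .g4 file."""
    names = set()
    for match in IMPORT_RE.finditer(g4_text):
        for part in match.group(1).split(','):
            names.add(part.strip())
    return names


def top_level(grammars):
    """grammars maps grammar name -> .g4 text; returns the names that
    no other grammar in the mapping imports, sorted alphabetically."""
    imported = set()
    for text in grammars.values():
        imported |= imported_names(text)
    return sorted(name for name in grammars if name not in imported)
```

For the three Kotlin files, this would report KotlinLexer and KotlinParser as top-level and leave out UnicodeClasses.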

(And honestly, I don't understand why we are using the pom.xml for this information, when this can all be derived by trparse/trquery, or by looking at the desc.xml. The Maven tester has been replaced by trgen because trgen figures out top-level grammars, start rules, etc. When it can't, it uses the desc.xml.)

I'll need to fix the pom.xml and reindex.

kaby76, Jul 27 '24 19:07

While the Parser and Lexer tabs now fill up with the correct .g4 data, lab.antlr.org does not work with either of the Kotlin grammars. It can't, because the .g4 files contain "import" statements and lab.antlr.org does not implement a UI for imported grammars. The mkindex.py script does not weed out these grammars, but it should. https://github.com/antlr/grammars-v4/issues/4201
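A filter of this kind might look like the sketch below. It is illustrative only: the entry keys (`name`, `lexer`, `parser`) and the `sources` mapping are assumptions made for the example, not the actual grammars.json schema or mkindex.py code, and the regex is a heuristic that only matches imports of bare grammar names.

```python
import re

# Heuristic: grammar imports name bare identifiers, so dotted Java
# imports inside @header blocks do not match.
G4_IMPORT_RE = re.compile(
    r'^[ \t]*import\s+[A-Za-z_]\w*(?:\s*,\s*[A-Za-z_]\w*)*\s*;',
    re.MULTILINE,
)


def lab_compatible(entries, sources):
    """Keep only index entries whose lexer and parser have no import.

    entries: list of dicts with hypothetical 'name', 'lexer', 'parser' keys.
    sources: dict mapping a .g4 file name to its text.
    """
    kept = []
    for entry in entries:
        files = [entry.get('lexer'), entry.get('parser')]
        if any(f and G4_IMPORT_RE.search(sources.get(f, '')) for f in files):
            continue  # the lab cannot load imported grammars, so skip it
        kept.append(entry)
    return kept
```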

kaby76, Aug 09 '24 18:08