antlr4
[JavaScript runtime] The JavaScript runtime does not produce the correct error node name in the parse tree.
I found this bug while validating the parse trees produced for the same input across different ports (aka "targets"). The problem is specific to the JavaScript runtime, at this line: https://github.com/antlr/antlr4/blob/67228355c5bfd1ed5ebb89e726992ec43dda7b53/runtime/JavaScript/src/antlr4/error/DefaultErrorStrategy.js#L529
The problem is that the expected token type is used as an index that is out of bounds for the array, so the lookup yields undefined.
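For grammar antlr/antlr4, inputs LexerElementLabel.g4 and three.g4, the parse tree is wrong for the JavaScript target: the error node reads <missing undefined> where the other targets produce <missing SEMI>.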
$ git diff .
diff --git a/antlr/antlr4/examples/LexerElementLabel.g4.tree b/antlr/antlr4/examples/LexerElementLabel.g4.tree
index 24a5362f..1379cabe 100644
--- a/antlr/antlr4/examples/LexerElementLabel.g4.tree
+++ b/antlr/antlr4/examples/LexerElementLabel.g4.tree
@@ -1 +1 @@
-(grammarSpec (grammarDecl (grammarType lexer grammar) (identifier LexerElementLabel) ;) (rules (ruleSpec (lexerRuleSpec Token : (lexerRuleBlock (lexerAltList (lexerAlt lexerElements))) <missing SEMI>)) (ruleSpec (parserRuleSpec var = 'token' ;))) <EOF>)
\ No newline at end of file
+(grammarSpec (grammarDecl (grammarType lexer grammar) (identifier LexerElementLabel) ;) (rules (ruleSpec (lexerRuleSpec Token : (lexerRuleBlock (lexerAltList (lexerAlt lexerElements))) <missing undefined>)) (ruleSpec (parserRuleSpec var = 'token' ;))) <EOF>)
\ No newline at end of file
diff --git a/antlr/antlr4/examples/three.g4.tree b/antlr/antlr4/examples/three.g4.tree
index efea5989..82075168 100644
--- a/antlr/antlr4/examples/three.g4.tree
+++ b/antlr/antlr4/examples/three.g4.tree
@@ -1 +1 @@
-(grammarSpec (grammarDecl grammarType identifier <missing SEMI>) rules <EOF>)
\ No newline at end of file
+(grammarSpec (grammarDecl grammarType identifier <missing undefined>) rules <EOF>)
\ No newline at end of file
01/03-16:21:24 ~/issues/g4-all-trees/antlr/antlr4/examples
The runtime indexes the array recognizer.literalNames[expectedTokenType] directly, and the value of expectedTokenType is out of bounds for that array, which is why the name comes out as undefined. The code should instead make a method call equivalent to recognizer.Vocabulary.GetDisplayName(expectedTokenType), as the other runtimes do.
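For illustration, here is a minimal sketch of the kind of lookup I have in mind. It is not the actual fix, and getDisplayName here is a hypothetical standalone helper, not an existing function in the JavaScript runtime. It assumes the recognizer exposes literalNames and symbolicNames arrays, and it follows the fallback order used by the Java runtime's Vocabulary: literal name, then symbolic name, then the numeric token type. The fakeRecognizer object and its token tables are made up for the example.

```javascript
// Hypothetical helper (not part of the antlr4 JavaScript runtime).
// Falls back from literal name to symbolic name to the numeric token type,
// so an index past the end of literalNames never produces "undefined".
function getDisplayName(recognizer, tokenType) {
    const literal = (recognizer.literalNames || [])[tokenType];
    if (literal != null) {
        return literal;
    }
    const symbolic = (recognizer.symbolicNames || [])[tokenType];
    if (symbolic != null) {
        return symbolic;
    }
    return String(tokenType);
}

// Made-up stand-in for a generated parser: token type 3 has a symbolic
// name only, so literalNames is shorter than symbolicNames.
const fakeRecognizer = {
    literalNames: [null, "'lexer'", "'grammar'"],
    symbolicNames: [null, "LEXER", "GRAMMAR", "SEMI"],
};

const expectedTokenType = 3;

// What the direct array access effectively produces today.
const buggyText = "<missing " + fakeRecognizer.literalNames[expectedTokenType] + ">";

// What the lookup with fallbacks produces.
const fixedText = "<missing " + getDisplayName(fakeRecognizer, expectedTokenType) + ">";

console.log(buggyText);
console.log(fixedText);
```

With a helper along these lines, the text for the missing token would be built from getDisplayName(recognizer, expectedTokenType) rather than from recognizer.literalNames[expectedTokenType].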
Python3 may have a similar issue.
The workaround is to not save parse trees for input that we know has a parse error.