[Antlr4.7] token recognition error at

Open gaulouis opened this issue 8 years ago • 3 comments

Hello,

I'm trying ANTLR 4.7 (Java runtime) with a test grammar in this repository:

$ echo "<?php echo Hell ?>" | java org.antlr.v4.gui.TestRig Php block -tree
line 1:11 token recognition error at: 'H'
line 1:12 token recognition error at: 'el'
line 1:14 token recognition error at: 'l'
(block (prolog <?php) (statement (function_echo echo)) (epilog ?>))

"Hell" is invalid input, so I expect an error. But is it normal to get token recognition error at: 'el', covering two characters?

gaulouis avatar Aug 26 '17 11:08 gaulouis

I tried the same thing with ANTLR 4.6:

gaulouis@gaulouis-desktop:~/local/src/tmp/test/php_antlr$ java org.antlr.v4.gui.TestRig Test block -tree
<?php echo Hell ?>
line 1:11 token recognition error at: 'H'
line 1:12 token recognition error at: 'el'
line 1:14 token recognition error at: 'l'
(block (prolog <?php) (statement (function_echo echo)) (epilog ?>))
gaulouis@gaulouis-desktop:~/local/src/tmp/test/php_antlr$ java org.antlr.v4.gui.TestRig Test block -tree
<?php echo HEll ?>
line 1:11 token recognition error at: 'H'
line 1:12 token recognition error at: 'E'
line 1:13 token recognition error at: 'l'
line 1:14 token recognition error at: 'l'
(block (prolog <?php) (statement (function_echo echo)) (epilog ?>))

I am surprised to get two different errors depending on letter case: line 1:12 token recognition error at: 'el' versus line 1:12 token recognition error at: 'E'.

I see the same behaviour with ANTLR 4.5.3/4.5.4.

To get ANTLR 4.6, I did:

$ git clone https://github.com/antlr/antlr4 antlr4.6
$ cd antlr4.6
$ git checkout -b antlr_4-6 4.6
$ mvn -DskipTests install
$ export CLASSPATH="`pwd`/tool/target/antlr4-4.6-complete.jar"
$ java org.antlr.v4.Tool
ANTLR Parser Generator  Version 4.6

Maybe I made a mistake somewhere? Can you help me?

gaulouis avatar Aug 26 '17 17:08 gaulouis

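It's probably because 'e' matched the beginning of 'echo': the lexer consumed it, only failed at the following 'l', and reported both characters together. An uppercase 'E' cannot start 'echo', so it is rejected on its own. I wouldn't worry about it too much; if the lexer accepts valid code and rejects invalid code, it is doing its job.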

siliconvoodoo avatar Feb 25 '20 11:02 siliconvoodoo

The problem is that your lexer doesn't cover all valid character sequences. There's no token that corresponds to "H", "He", "Hel", etc. So, what you can do is add an ANY : . ; rule at the very end that matches all the stuff that nothing else matched. Then have ANY in your parser rules wherever you want to match arbitrary text.
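A minimal sketch of that catch-all rule follows. The original grammar is not shown in this thread, so the rule names are guessed from the parse tree printed above (block, prolog, statement, function_echo, epilog), and the token names PROLOG, EPILOG, ECHO and WS are hypothetical.

```antlr
// Hypothetical reconstruction: rule names are taken from the parse tree
// in this thread, token names are assumptions.
grammar Php;

block         : prolog statement* epilog ;
prolog        : PROLOG ;
epilog        : EPILOG ;
statement     : function_echo ;
function_echo : ECHO ANY* ;   // accept arbitrary leftover characters after 'echo'

PROLOG : '<?php' ;
EPILOG : '?>' ;
ECHO   : 'echo' ;
WS     : [ \t\r\n]+ -> skip ;

// Must stay last: for equal-length matches the earlier rule wins, and
// longer matches such as 'echo' or '?>' still beat this one-character rule.
ANY    : . ;
```

With a rule like this, every character belongs to some token, so the same TestRig command should no longer print token recognition errors for "Hell". The trade-off is that the lexer stops rejecting anything; unexpected text has to be rejected by the parser, in whichever rules do not allow ANY.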

timmc avatar Aug 30 '22 21:08 timmc