EOF token is returned only once in recursive grammar
I have the following grammar:
// test.jison
%lex
%%
\s+ // skip whitespace
\w+ return 'IDENTIFIER';
\: return 'BEGIN_BLOCK';
$ { console.log('EOF'); return 'EOF'; }
. return 'INVALID';
/lex
%start DOCUMENT
%%
DOCUMENT: STATEMENT EOF;
STATEMENT: IDENTIFIER STMT_BLOCK;
STMT_BLOCK: /**/ | BEGIN_BLOCK DOCUMENT;
This is the test script:
// test.js
import jison from 'jison';
import fs from 'fs';
const grammar = fs.readFileSync('test.jison', 'utf8');
const parser = jison.Parser(grammar);
try {
  parser.parse(process.argv[2]);
} catch (err) {
  console.log(err.message);
}
The command node test.js 'level1' runs without errors and prints EOF.
We should expect node test.js 'level1: level2' to print EOF twice, but it prints this instead:
EOF
Parse error on line 1:
level1: level2
--------------^
Expecting 'EOF', got '1'
The reason is that the EOF token is returned only once, at the nested level. After that, the lexer returns the numeric token 1 (the parser's internal value for end-of-file), so the outer DOCUMENT rule never sees the second EOF it expects. Unfortunately, we cannot reference this special token from the grammar, which makes it impossible to parse this particular language. :(
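To make the token sequence concrete, here is a minimal sketch of a toy lexer that mimics the behaviour described above. It is not jison's actual implementation: makeToyLexer and END_OF_INPUT are hypothetical names, and the one-shot handling of the $ rule is an assumption based on the observed output.

```javascript
// Simplified model (assumption, not jison's code) of a lexer whose
// end-of-input rule fires only once.
// END_OF_INPUT stands in for the numeric token 1 from the error message.
const END_OF_INPUT = 1;

function makeToyLexer(input) {
  let rest = input;
  let done = false;
  return function lex() {
    for (;;) {
      if (done) return END_OF_INPUT; // every call after the '$' rule has fired
      if (rest === '') {             // the '$' rule matches exactly once
        done = true;
        return 'EOF';
      }
      let m;
      if ((m = /^\s+/.exec(rest))) { rest = rest.slice(m[0].length); continue; } // skip whitespace
      if ((m = /^\w+/.exec(rest))) { rest = rest.slice(m[0].length); return 'IDENTIFIER'; }
      if ((m = /^:/.exec(rest))) { rest = rest.slice(1); return 'BEGIN_BLOCK'; }
      rest = rest.slice(1);          // any other single character
      return 'INVALID';
    }
  };
}

const lex = makeToyLexer('level1: level2');
const tokens = [];
for (let i = 0; i < 5; i++) tokens.push(lex());
console.log(tokens);
// [ 'IDENTIFIER', 'BEGIN_BLOCK', 'IDENTIFIER', 'EOF', 1 ]
```

The nested DOCUMENT consumes the single EOF, and the outer DOCUMENT is then handed 1, which matches the "Expecting 'EOF', got '1'" message.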
To fix it, I believe the $ rule (or the equivalent <<EOF>> rule) should keep matching indefinitely once the end of the file is reached. Alternatively, jison could provide a way to reference the 1 token directly in the grammar.
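As a rough illustration of the first option, the sketch below wraps a lex function so that EOF keeps being returned after it has been seen once. repeatEof and stubLex are hypothetical names, the stub only stands in for a real lexer, and I have not verified how the jison parser itself would behave when fed such a stream; this only shows the token sequence being proposed.

```javascript
// Hypothetical helper: wraps any lex() function so that, once it has
// produced 'EOF', every later call produces 'EOF' again.
function repeatEof(lex) {
  let atEnd = false;
  return function wrappedLex() {
    if (atEnd) return 'EOF';
    const token = lex();
    if (token === 'EOF') atEnd = true;
    return token;
  };
}

// Stub token source standing in for a real lexer: it yields 'EOF' once
// and the numeric end-of-input value 1 afterwards.
const queue = ['IDENTIFIER', 'BEGIN_BLOCK', 'IDENTIFIER', 'EOF'];
const stubLex = () => (queue.length ? queue.shift() : 1);

const wrappedLex = repeatEof(stubLex);
const seen = [];
for (let i = 0; i < 6; i++) seen.push(wrappedLex());
console.log(seen);
// [ 'IDENTIFIER', 'BEGIN_BLOCK', 'IDENTIFIER', 'EOF', 'EOF', 'EOF' ]
```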