Compiler
Backtrack issue when rules overlap
Hi! This library looks really awesome, so I'm playing with it, but I'm facing a "basic" issue and I can't figure out whether it's a limitation, a bug, or my mistake… Can you help me?
Here is a minimal grammar to illustrate my situation:
%token a a
%token word \w+
#root:
    <word> | <a>
I want to match all words, but "a" is a special keyword that I want to match distinctly. The problem comes when I try to parse "ab": "a" is recognized as a token, and then the parser gets stuck on the "b" character with an UnexpectedToken exception. As I understand it, the parser should backtrack, discard the choice of the token a, and continue with the token word… Am I wrong?
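To make the observed behaviour concrete, here is a minimal Python sketch. It is not the library's code: it assumes the lexer runs before the parser, tries tokens in declaration order, and commits to the first one that matches, so the parser never gets a chance to revisit that choice. The names `TOKENS` and `tokenize` are made up for the illustration.

```python
import re

# Hypothetical token table mirroring the grammar above, in declaration order.
TOKENS = [
    ("a", re.compile(r"a")),
    ("word", re.compile(r"\w+")),
]


def tokenize(text, tokens=TOKENS):
    """First-match lexer: at each offset, commit to the first token that matches."""
    pos, out = 0, []
    while pos < len(text):
        for name, regex in tokens:
            m = regex.match(text, pos)
            if m and m.end() > pos:
                out.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No token matched at this offset.
            raise ValueError(f"unrecognized input at offset {pos}")
    return out


# Declaration order "a, word": "ab" is split into two tokens.
print(tokenize("ab"))  # [('a', 'a'), ('word', 'b')]

# Inverted order "word, a": the keyword is swallowed by the word token.
print(tokenize("a", list(reversed(TOKENS))))  # [('word', 'a')]
```

Under that assumption, the root rule receives two tokens for "ab" while it only accepts one, which would explain the UnexpectedToken on "b".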
ℹ️
- If I invert the order of the rules, the input "a" is identified as a word 👎
- I could use %token word a\w+|[^a]\w* as the first rule, but… it looks very weird and hard to maintain IMHO
- I could discard the token a, match words only, and use the AST to identify my specific keywords, but I think that is the role of the syntax analyzer, isn't it?
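For reference, the third option (match words only, then pick out the keywords afterwards) could look roughly like the Python sketch below. It is only an illustration of the idea, independent of the library; `KEYWORDS`, `WORD` and `classify` are hypothetical names.

```python
import re

KEYWORDS = {"a"}            # hypothetical set of reserved words
WORD = re.compile(r"\w+")   # same pattern as the word token


def classify(text):
    """Lex whole words only, then promote exact keyword matches."""
    return [
        ("keyword" if w in KEYWORDS else "word", w)
        for w in WORD.findall(text)
    ]


print(classify("a ab b"))
# [('keyword', 'a'), ('word', 'ab'), ('word', 'b')]
```

This keeps the token definitions simple, at the cost of moving the keyword decision out of the grammar.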
Thanks in advance for your help, and your nice work on this library :) 👍
ℹ️ My current workaround is to suffix my keywords with something like this:
%token bool (true|false)(?![a-zA-Z_0-9])
This prevents "falsely" from being detected as the keyword false followed by "ly" (it should instead match another pattern for "identifier" tokens)… but it seems a little tricky, doesn't it?
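The effect of the negative lookahead can be checked outside the library with a short Python sketch. The bool pattern is the one above; the identifier pattern and the helper `first_token` are assumptions added for the illustration, not part of my grammar.

```python
import re

# Token patterns in declaration order: keyword first, identifier second.
BOOL_TOKENS = [
    ("bool", re.compile(r"(true|false)(?![a-zA-Z_0-9])")),
    ("identifier", re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")),  # hypothetical
]


def first_token(text):
    """Return (name, lexeme) of the first token matching at the start of text."""
    for name, regex in BOOL_TOKENS:
        m = regex.match(text)
        if m:
            return name, m.group()
    return None


print(first_token("false"))    # ('bool', 'false')
print(first_token("falsely"))  # ('identifier', 'falsely')
print(first_token("false)"))   # ('bool', 'false')
```

The lookahead makes the keyword pattern fail whenever another identifier character follows, so the input falls through to the next token instead of being split.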