lark
Better token formatting available?
What is your question?
I have a Lark grammar that has the following production:
!inneroperator : "has" | "=" | "<" | ">" | "<=" | ">=" | "!="
(The whole grammar should parse statements like foo < 3, pretty simple stuff actually)
When I parse something invalid, such as foo 3, the error message reads:
No terminal defined for '3' at line 1 col 8
foo 3
Expecting: {'EQUAL', 'HAS', 'MORETHAN', '__ANON_1', '__ANON_0', '__ANON_2', 'LESSTHAN'}
This is already quite readable, and I'm impressed. However, I don't know what these ANON things are, and also I would ideally output = instead of EQUAL. Is that possible?
This is a drawback of the Earley parser at the moment. If you use lalr instead (which your grammar should work fine with), you already get nicer messages.
You can also prevent this by naming the Terminals:
!inneroperator : "has" | EQ | NE | LT | LE | GT | GE
EQ: "="
NE: "!="
LT: "<"
LE: "<="
GT: ">"
GE: ">="
I get basically the same error message with the lalr parser. I tried naming the terminals, but neither EQUAL nor EQ makes it clear to the user that it's = that's expected (I mean, it could be ==, too).
There has already been some talk about improving the error messages by providing the expected values, so I will keep that in mind as a possible task.
Another, easier fix would be to include default names for >=, <= and !=, since they are common enough.
Aside from that (and it's not an easy problem, of course), what are the ANON tokens mentioned?
When a terminal doesn't have a name (i.e. it is defined inline as a string such as "foo") and Lark can't guess a name for it, Lark gives it a generated name of the form __ANON_<n>.
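Until the error messages show the expected values directly, one possible workaround for the original question (printing = rather than EQUAL) is to translate the terminal names yourself before showing them to the user. The following is only a sketch: the DISPLAY table and the describe_expected helper are hypothetical names, not part of Lark, and the table has to be kept in sync with your own grammar by hand. Names without an entry, such as the __ANON ones, are passed through unchanged.

```python
# Hypothetical, hand-maintained mapping from terminal names (as they appear in
# the "Expecting:" set above) to the text a user would actually type.
DISPLAY = {
    "HAS": "has",
    "EQUAL": "=",
    "LESSTHAN": "<",
    "MORETHAN": ">",
}


def describe_expected(expected, display=DISPLAY):
    """Turn a collection of terminal names into a readable one-line hint."""
    # Fall back to the raw terminal name when no display text is known,
    # so unnamed (__ANON_n) terminals still show up instead of vanishing.
    shown = sorted(display.get(name, name) for name in expected)
    return "Expecting one of: " + ", ".join(shown)


if __name__ == "__main__":
    print(describe_expected({"EQUAL", "HAS", "MORETHAN", "LESSTHAN"}))
```

With Lark, the set of names would typically be read from the exception raised by parse(); as far as I know that is the expected attribute on UnexpectedToken and the allowed attribute on UnexpectedCharacters, but check this against the version you are using.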