logos
Error kinds
Sometimes you want to report a specific lexer error rather than the general one.
I also agree with the previous comment, but I think more context should be added. Here is a fairly common example, IMO. Consider the following lexer:
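use logos::Logos;
use std::str::FromStr;

#[derive(Logos, Debug, PartialEq)]
enum Token {
    // Operators
    #[token("+")] ADD,
    #[token("-")] SUB,
    #[token("*")] MUL,
    #[token("/")] DIV,
    // Values: the callback parses the matched slice and returns a Result
    #[regex(r"0|[1-9][0-9]*", |lex| i32::from_str(lex.slice()))]
    NUM(i32),
    // Emitted when nothing matches or when a callback returns Err
    #[error]
    ERROR,
}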
This might be the lexer for a simple calculator, or part of a larger compiler. In this example, there are two ways for the lexer to fail and return an ERROR token:
- The lexer finds a string that does not match any of the tokens, for example a word or anything else that contains letters.
- The lexer matches a string against the regex "0|[1-9][0-9]*" but fails to parse it into an i32. This happens when the number is too large for a 32-bit integer.
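The second failure mode can be reproduced with the standard library alone. The following is a minimal sketch, not part of the lexer above; `parse_num` is a hypothetical helper that does the same parsing as the NUM callback.

```rust
use std::num::{IntErrorKind, ParseIntError};
use std::str::FromStr;

/// Hypothetical helper: the same parse the NUM callback performs on a slice.
fn parse_num(slice: &str) -> Result<i32, ParseIntError> {
    i32::from_str(slice)
}

/// Returns true when the slice is a well-formed number that overflows i32.
fn overflows_i32(slice: &str) -> bool {
    match parse_num(slice) {
        Err(e) => *e.kind() == IntErrorKind::PosOverflow,
        Ok(_) => false,
    }
}
```

Here `parse_num("2147483647")` succeeds because that value is `i32::MAX`, while `parse_num("99999999999")` matches the regex but returns an error whose kind is `IntErrorKind::PosOverflow`.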
It would be useful if these two errors could be distinguished in some way. Since the callback for parsing NUM returns a Result, perhaps its error type could be captured and made accessible through some function. Alternatively, multiple error tokens could be allowed for different scenarios. Any thoughts?
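To make the request concrete, here is a rough sketch of the desired outcome written without logos, as a hand-rolled classifier for a single lexeme. Everything in it (`LexError`, `classify`, `is_number`) is hypothetical and only illustrates the shape of the result; it is not how logos works or a proposed implementation.

```rust
use std::num::ParseIntError;

/// Hypothetical error type: one variant per failure mode.
#[derive(Debug, PartialEq)]
enum LexError {
    /// The input did not match any token pattern.
    UnknownToken(String),
    /// The input matched the number pattern but could not be parsed as i32.
    InvalidNumber(ParseIntError),
}

#[derive(Debug, PartialEq)]
enum Token {
    ADD,
    SUB,
    MUL,
    DIV,
    NUM(i32),
}

/// Hand-written equivalent of the regex 0|[1-9][0-9]*.
fn is_number(s: &str) -> bool {
    s == "0"
        || (!s.is_empty()
            && !s.starts_with('0')
            && s.bytes().all(|b| b.is_ascii_digit()))
}

/// Classifies one already-isolated lexeme, keeping the two errors apart.
fn classify(lexeme: &str) -> Result<Token, LexError> {
    match lexeme {
        "+" => Ok(Token::ADD),
        "-" => Ok(Token::SUB),
        "*" => Ok(Token::MUL),
        "/" => Ok(Token::DIV),
        // Matched the number pattern: a parse failure is its own error kind.
        s if is_number(s) => s
            .parse::<i32>()
            .map(Token::NUM)
            .map_err(LexError::InvalidNumber),
        // Nothing matched at all.
        s => Err(LexError::UnknownToken(s.to_owned())),
    }
}
```

With this shape, a caller can match on `LexError::UnknownToken` and `LexError::InvalidNumber` separately and report, for instance, an "integer too large" message instead of a generic lexing error.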
Sorry, I did not realize that there was already a whole discussion on this at #104.