logos Error kinds

Open suhr opened this issue 4 years ago • 2 comments

Sometimes you want to report a specific lexer error rather than the general one.

Jun 02 '20 13:06 suhr

I agree with the previous comment, but I think more context should be added. Here is what I consider a fairly common example. Consider the following lexer:

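use logos::Logos;
use std::str::FromStr;

#[derive(Logos, Debug, PartialEq)]
enum Token {
  // Operators
  #[token("+")]  ADD,
  #[token("-")]  SUB,
  #[token("*")]  MUL,
  #[token("/")]  DIV,

  // Values: the callback returns a Result, so a failed parse also yields ERROR
  #[regex(r"0|[1-9][0-9]*", |lex| i32::from_str(lex.slice()))]
  NUM(i32),

  // Returned when no rule matches the input
  #[error]
  ERROR,
}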

This might be the lexer for a simple calculator, or part of a larger compiler. In this example, there are two ways for the lexer to fail and return an ERROR token:

1. The lexer finds input that does not match any of the tokens, for example a word made of letters.
2. The lexer matches a string against the regex "0|[1-9][0-9]*" but fails to parse it into an i32. This happens when the number is too large for a 32-bit integer.
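To make the second case concrete, here is a minimal standard-library sketch, independent of logos. The helper name `overflows_i32` is made up for illustration. It shows that a digit string can satisfy the regex and still be rejected by `i32::from_str`:

```rust
use std::num::IntErrorKind;
use std::str::FromStr;

/// Hypothetical helper: true when `digits` parses as a number
/// syntactically but is too large to fit in an i32.
fn overflows_i32(digits: &str) -> bool {
    match i32::from_str(digits) {
        Ok(_) => false,
        // `kind()` tells overflow apart from other parse failures.
        Err(e) => matches!(e.kind(), IntErrorKind::PosOverflow),
    }
}
```

With this, `overflows_i32("2147483647")` is false (that is `i32::MAX`), while `overflows_i32("2147483648")` is true even though both strings match the NUM regex.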

It would be useful if these two errors could be distinguished in some way. One option: since the callback for parsing NUM returns a Result, the lexer could capture that error value and make it accessible through some function. Another option: allow multiple error tokens for different scenarios. Any thoughts?
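As a rough sketch of what distinguishing the two errors could look like, here is a small hand-written lexer using only the standard library. It is not the logos API and not a proposal for its design. The names `LexError`, `UnexpectedChar`, `NumOutOfRange` and `lex` are hypothetical, and skipping spaces and assuming ASCII input are simplifications that the Token enum above does not include.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Add,
    Sub,
    Mul,
    Div,
    Num(i32),
}

/// Hypothetical error kinds, one per failure mode described above.
#[derive(Debug, PartialEq)]
enum LexError {
    /// No token rule matches at this byte offset.
    UnexpectedChar(usize),
    /// The digits matched, but the value does not fit in an i32.
    NumOutOfRange(usize),
}

/// Lexes `src`, yielding either a token or a specific error per item.
fn lex(src: &str) -> Vec<Result<Token, LexError>> {
    let bytes = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        match bytes[i] {
            // Assumption: spaces are skipped (not part of the original lexer).
            b' ' => i += 1,
            b'+' => { out.push(Ok(Token::Add)); i += 1; }
            b'-' => { out.push(Ok(Token::Sub)); i += 1; }
            b'*' => { out.push(Ok(Token::Mul)); i += 1; }
            b'/' => { out.push(Ok(Token::Div)); i += 1; }
            // Mirrors the "0" branch of the regex 0|[1-9][0-9]*.
            b'0' => { out.push(Ok(Token::Num(0))); i += 1; }
            // Mirrors the "[1-9][0-9]*" branch.
            b'1'..=b'9' => {
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
                // Only digits reach the parser, so the sole failure is overflow.
                let parsed = src[start..i]
                    .parse::<i32>()
                    .map(Token::Num)
                    .map_err(|_| LexError::NumOutOfRange(start));
                out.push(parsed);
            }
            _ => { out.push(Err(LexError::UnexpectedChar(start))); i += 1; }
        }
    }
    out
}
```

Here `lex("x")` reports `UnexpectedChar(0)` and `lex("99999999999")` reports `NumOutOfRange(0)`, so a caller can print a different message for each case instead of seeing a single ERROR.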

Oct 01 '20 02:10 srilman

Sorry, I did not realize that there was already a whole discussion on this at #104.

Oct 01 '20 03:10 srilman
