Strange behaviour when matching 'else' / 'else if'
I'm working on a lexer for a language where I'd like to have `else` and `else if` lexed as separate tokens, but I'm running into surprising behaviour. In the following example you can see that `else` has been lexed as `Other`:
mod else_if {
    use logos::Logos;

    #[derive(Logos, Debug, PartialEq)]
    enum Token {
        #[regex(r"[ ]+", logos::skip)]
        #[error]
        Error,

        #[token("else")]
        Else,

        #[token("else if")]
        ElseIf,

        #[regex(r"[a-z]*")]
        Other,
    }

    #[test]
    fn else_x_else_if_y() {
        let mut lexer = Token::lexer("else x else if y");

        // Expected: assert_eq!(lexer.next().unwrap(), Token::Else);
        assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::ElseIf);
        assert_eq!(lexer.next().unwrap(), Token::Other);
    }
}
Removing the space from `else if` allows `else` to be parsed as `Else`:
mod else_if_2 {
    use logos::Logos;

    #[derive(Logos, Debug, PartialEq)]
    enum Token {
        #[regex(r"[ ]+", logos::skip)]
        #[error]
        Error,

        #[token("else")]
        Else,

        #[token("elseif")]
        ElseIf,

        #[regex(r"[a-z]*")]
        Other,
    }

    #[test]
    fn else_x_else_if_y() {
        let mut lexer = Token::lexer("else x elseif y");

        assert_eq!(lexer.next().unwrap(), Token::Else);
        assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::ElseIf);
        assert_eq!(lexer.next().unwrap(), Token::Other);
    }
}
Keeping the space in `else if`, but removing some of the characters from the `Else` literal (here it is shortened to `e`), causes `Else` to be unexpectedly matched against the input `else`:
mod else_if_3 {
    use logos::Logos;

    #[derive(Logos, Debug, PartialEq)]
    enum Token {
        #[regex(r"[ ]+", logos::skip)]
        #[error]
        Error,

        #[token("e")]
        Else,

        #[token("else if")]
        ElseIf,

        #[regex(r"[a-z]*")]
        Other,
    }

    #[test]
    fn else_x_else_if_y() {
        let mut lexer = Token::lexer("else x else if y");

        // Expected: assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::Else);
        assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::ElseIf);
        assert_eq!(lexer.next().unwrap(), Token::Other);
    }
}
My understanding of the token disambiguation documentation is that the first example should work as I'd expect, with `Else` and `ElseIf` being matched independently, with higher priority than `Other`. Do I have that wrong? And is the last example exposing a bug?
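To make the expected behaviour concrete, here is a minimal, std-only sketch of the rule as it is read in this report: at each position the longest match wins, and on a tie a literal beats the `[a-z]*` pattern. This is an assumption about the intended semantics, not logos's actual algorithm; the `lex` and `lowercase_run` functions, the `else_literal` parameter, and the numeric priorities are hypothetical and exist only for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Else,
    ElseIf,
    Other,
}

/// Length of the leading run of ASCII lowercase letters (stands in for `[a-z]*`).
fn lowercase_run(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_lowercase()).count()
}

/// Hypothetical reference lexer: longest match wins; on equal length,
/// the higher priority (literals = 2, pattern = 1) wins.
/// `else_literal` is the text used for `Token::Else` ("else" or "e" in the report).
fn lex(input: &str, else_literal: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    // Spaces are skipped, as with `#[regex(r"[ ]+", logos::skip)]`.
    let mut rest = input.trim_start_matches(' ');

    while !rest.is_empty() {
        // (token, matched length, priority); a length of 0 means "no match".
        let candidates = [
            (Token::Else, if rest.starts_with(else_literal) { else_literal.len() } else { 0 }, 2),
            (Token::ElseIf, if rest.starts_with("else if") { "else if".len() } else { 0 }, 2),
            (Token::Other, lowercase_run(rest), 1),
        ];

        let (token, len, _) = candidates
            .iter()
            .copied()
            .max_by_key(|&(_, len, priority)| (len, priority))
            .unwrap();

        if len == 0 {
            break; // nothing matched; a real lexer would emit an error token here
        }

        tokens.push(token);
        rest = rest[len..].trim_start_matches(' ');
    }

    tokens
}
```

Under this rule, `lex("else x else if y", "else")` yields `Else, Other, ElseIf, Other`, and `lex("else x else if y", "e")` yields `Other, Other, ElseIf, Other`, which are the sequences marked as "Expected" in the first and third examples.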
Thanks for your time and the great library!
This definitely looks like a bug. I'll have a look as soon as I can. Thanks for reporting!
I'm currently running into this. Was a solution ever found?