lexer consumes more of string than expected
I'm trying to write a lexer for a programming language.
Current state: there are two token kinds defined:

```rust
#[token("-")]
Sub,

#[regex(r"[0-9]+", |output| output.slice().parse::<i64>().ok())]
IntLiteral(i64),
```
The input string `"-1"` correctly gets lexed as `Token::Sub` and `Token::IntLiteral(1)`.
If I now add a third token kind

```rust
#[regex(r"([+-]?(([0-9]+[eE][+-]?[0-9]+)|([0-9]*\.[0-9]+[eE][+-]?[0-9]+|[0-9]*\.[0-9]+)))", |output| {
    output.slice().parse::<f64>().unwrap()
})]
FloatLiteral(f64),
```
that does not match a single `-` or a single `1`, then somehow the input `"-1"` now gets lexed as `Token::Sub` followed by `None` instead. `lexer.slice()` returns `"-1"` after the first call to `lexer.next()`, indicating that the `Sub` token somehow consumed the `1` of the input string as well.
Edit: See the comment below for the repo, including a small file to replicate the behaviour.
Yes, I realize that parsing `-1.0` as a single float is probably a bad idea and has a lot of flaws. Nevertheless, this seems like sketchy behaviour.
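For what it's worth, here is the behaviour I would expect conceptually. This is a hand-rolled maximal-munch sketch, not Logos's actual implementation: `next_token`, `lex`, and the simplified float rule `[+-]?[0-9]*\.[0-9]+` are all made up for illustration. The float rule may tentatively consume `-1` while looking for a `.`, but when it fails, the lexer should fall back to the longest rule that *did* accept, namely `-` for `Sub`:

```rust
// Toy maximal-munch lexer with three rules. The key property: a rule
// that fails partway keeps NONE of the input it tentatively consumed.
#[derive(Debug, PartialEq)]
enum Tok {
    Sub,
    Int(i64),
    Float(f64),
}

/// Longest successful match at the start of `input`, plus its byte length.
fn next_token(input: &str) -> Option<(Tok, usize)> {
    let b = input.as_bytes();

    // Rule 1: "-"
    let sub = if b.first() == Some(&b'-') { Some(1) } else { None };

    // Rule 2: [0-9]+
    let int_len = b.iter().take_while(|c| c.is_ascii_digit()).count();
    let int = (int_len > 0).then_some(int_len);

    // Rule 3 (simplified float): [+-]?[0-9]*\.[0-9]+
    let float = {
        let mut i = 0;
        if matches!(b.get(i), Some(&(b'+' | b'-'))) {
            i += 1; // tentatively consume the sign...
        }
        while b.get(i).is_some_and(|c| c.is_ascii_digit()) {
            i += 1; // ...and the leading digits
        }
        if b.get(i) == Some(&b'.') {
            i += 1;
            let start = i;
            while b.get(i).is_some_and(|c| c.is_ascii_digit()) {
                i += 1;
            }
            (i > start).then_some(i)
        } else {
            None // no '.': the partial progress above is fully rewound
        }
    };

    // Pick the longest successful rule. On "-1" that is Sub (length 1),
    // because the float rule failed and contributes nothing.
    let (kind, n) = [sub.map(|n| (0u8, n)), int.map(|n| (1, n)), float.map(|n| (2, n))]
        .into_iter()
        .flatten()
        .max_by_key(|&(_, n)| n)?;
    let tok = match kind {
        0 => Tok::Sub,
        1 => Tok::Int(input[..n].parse().unwrap()),
        _ => Tok::Float(input[..n].parse().unwrap()),
    };
    Some((tok, n))
}

fn lex(mut s: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    while let Some((tok, n)) = next_token(s) {
        out.push(tok);
        s = &s[n..];
    }
    out
}

fn main() {
    assert_eq!(lex("-1"), vec![Tok::Sub, Tok::Int(1)]);
    assert_eq!(lex("-1.5"), vec![Tok::Float(-1.5)]);
    println!("toy lexer behaves as expected");
}
```

With this, `"-1"` lexes as `Sub` then `Int(1)`, while `"-1.5"` lexes as a single `Float(-1.5)` because the float rule genuinely matched further.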
Hi @Angrymanvvv, thanks for reporting your bug!
Could you please format the code using triple backticks? See guide here.
Also, please don't attach Zip files with code; just put the code here in the issue, along with the output. Thanks!
Thanks for the reply! I've created a minimal code file that replicates the behaviour, as well as an explanation and output in the corresponding readme file. See https://github.com/Angrymanvvv/Logos-Bug-MVE
Fixed by #491
The following test now passes:
```rust
mod issue_478 {
    use logos::Logos;

    #[derive(Logos, Debug, Clone, PartialEq)]
    pub enum Token {
        #[token("-")]
        Sub,

        #[regex(r"[0-9]+", |output| {
            output.slice().parse::<i64>().ok()
        })]
        IntLiteral(i64),

        #[regex(r"([+-]?(([0-9]+[eE][+-]?[0-9]+)|([0-9]*\.[0-9]+[eE][+-]?[0-9]+|[0-9]*\.[0-9]+)))", |output| {
            output.slice().parse::<f64>().ok()
        })]
        FloatLiteral(f64),
    }

    #[test]
    fn neg_int_two_tokens() {
        let mut lexer = Token::lexer("-1");
        assert_eq!(lexer.next(), Some(Ok(Token::Sub)));
        assert_eq!(lexer.next(), Some(Ok(Token::IntLiteral(1))));
        assert_eq!(lexer.next(), None);
    }

    #[test]
    fn float_literals() {
        for (input, output) in [
            ("1.0", Token::FloatLiteral(1.0)),
            (".01", Token::FloatLiteral(0.01)),
            ("3.1e-12", Token::FloatLiteral(3.1e-12)),
            ("2E3", Token::FloatLiteral(2E3)),
            ("1.5", Token::FloatLiteral(1.5)),
            ("-1.5", Token::FloatLiteral(-1.5)),
        ] {
            let token = Token::lexer(input).next();
            assert_eq!(token, Some(Ok(output)));
        }
    }
}
```