Strange behaviour when matching 'else' / 'else if'

Open irh opened this issue 4 years ago • 2 comments

I'm working on a lexer for a language where I'd like to have else and else if lexed as separate tokens, but I'm running into surprising behaviour.

In the following example you can see that else has been lexed as Other:

mod else_if {
    use logos::Logos;

    #[derive(Logos, Debug, PartialEq)]
    enum Token {
        #[regex(r"[ ]+", logos::skip)]
        #[error]
        Error,
        #[token("else")]
        Else,
        #[token("else if")]
        ElseIf,
        #[regex(r"[a-z]*")]
        Other,
    }

    #[test]
    fn else_x_else_if_y() {
        let mut lexer = Token::lexer("else x else if y");

        // Expected: assert_eq!(lexer.next().unwrap(), Token::Else);
        assert_eq!(lexer.next().unwrap(), Token::Other);

        assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::ElseIf);
        assert_eq!(lexer.next().unwrap(), Token::Other);
    }
}

Removing the space from else if allows else to be parsed as Else:

mod else_if_2 {
    use logos::Logos;

    #[derive(Logos, Debug, PartialEq)]
    enum Token {
        #[regex(r"[ ]+", logos::skip)]
        #[error]
        Error,
        #[token("else")]
        Else,
        #[token("elseif")]
        ElseIf,
        #[regex(r"[a-z]*")]
        Other,
    }

    #[test]
    fn else_x_else_if_y() {
        let mut lexer = Token::lexer("else x elseif y");

        assert_eq!(lexer.next().unwrap(), Token::Else);
        assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::ElseIf);
        assert_eq!(lexer.next().unwrap(), Token::Other);
    }
}

Keeping the space in else if, but shortening the Else token from "else" to "e", causes Else to be matched unexpectedly for the input else:

mod else_if_3 {
    use logos::Logos;

    #[derive(Logos, Debug, PartialEq)]
    enum Token {
        #[regex(r"[ ]+", logos::skip)]
        #[error]
        Error,
        #[token("e")]
        Else,
        #[token("else if")]
        ElseIf,
        #[regex(r"[a-z]*")]
        Other,
    }

    #[test]
    fn else_x_else_if_y() {
        let mut lexer = Token::lexer("else x else if y");

        // Expected: assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::Else);

        assert_eq!(lexer.next().unwrap(), Token::Other);
        assert_eq!(lexer.next().unwrap(), Token::ElseIf);
        assert_eq!(lexer.next().unwrap(), Token::Other);
    }
}

My understanding of the token disambiguation documentation is that the first example should work as I'd expect, with Else and ElseIf being matched independently, with higher priority than Other. Do I have that wrong? And is the last example exposing a bug?
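To make the expected behaviour concrete, here is a minimal hand-written sketch that does not use Logos at all. It assumes a simple rule: take the longest match at each position, and let a literal token win a tie against the catch-all. The names Expected and lex_expected are hypothetical, the catch-all is treated as [a-z]+ rather than [a-z]* to avoid empty matches, and nothing here claims to describe how Logos works internally.

```rust
/// Hypothetical token kinds mirroring the enum in the first example.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Expected {
    Else,
    ElseIf,
    Other,
}

/// Hypothetical helper: longest match wins, and a literal beats the
/// catch-all word rule when both match the same number of bytes.
fn lex_expected(input: &str) -> Vec<Expected> {
    // Literal tokens taken from the first example.
    const LITERALS: [(&str, Expected); 2] =
        [("else if", Expected::ElseIf), ("else", Expected::Else)];

    let mut tokens = Vec::new();
    let mut rest = input;
    loop {
        // Skip spaces, like the `[ ]+` skip rule.
        rest = rest.trim_start_matches(' ');
        if rest.is_empty() {
            break;
        }

        // Longest literal that is a prefix of the remaining input.
        let literal = LITERALS
            .iter()
            .filter(|(text, _)| rest.starts_with(*text))
            .max_by_key(|(text, _)| text.len());

        // Length of the leading run of lowercase ASCII letters.
        let word_len = rest.bytes().take_while(|b| b.is_ascii_lowercase()).count();

        match literal {
            // The literal is at least as long as the word: it wins.
            Some((text, kind)) if text.len() >= word_len => {
                tokens.push(*kind);
                rest = &rest[text.len()..];
            }
            // Otherwise fall back to the catch-all word rule.
            _ if word_len > 0 => {
                tokens.push(Expected::Other);
                rest = &rest[word_len..];
            }
            // Unrecognised input: stop here (a real lexer would emit an error token).
            _ => break,
        }
    }
    tokens
}
```

Under these assumptions, lex_expected("else x else if y") yields Else, Other, ElseIf, Other, which is the sequence the first example was expected to produce, while a longer word such as "elsewhere" still comes out as a single Other.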

Thanks for your time and the great library!

irh • Jun 08 '20 14:06

This definitely looks like a bug, will have a look as soon as I can, thanks for reporting!

maciejhirsz • Jun 09 '20 09:06

I'm currently running into this, was any solution ever discovered?

Zenthial • Jun 26 '23 17:06