logos
logos copied to clipboard
Detect EOF in regular expressions
It would be nice if the \W
anchor was supported in #[regex]
strings such that certain regular expressions could succeed if EOF is encountered early. For example:
#[derive(Logos)]
enum Token {
#[end]
End,
#[error]
Error,
#[regex = r"/\*([^*]|\*+[^*/]|[^\W])*(\*+/|\W)"]
BlockComment,
}
The regex shown above would be able to match both /* hello */
and also /* hello
, where the second one is immediately followed by an #[end]
. This would allow for the lexer to process the token successfully even if the input terminates early before the */
could be received, and we could check whether the comment is properly terminated by calling lexer.slice().ends_with("*/")
at a later stage.
See TokenKind::BlockComment { terminated: bool }
in rustc_lexer
for an example of this common lexing pattern.
Might still add this to regex itself, but you will be able to handle this with callbacks next release, reference: #103.
Closing this a too old, feel free to re-open if needed.