[BUG] Pest.rs hangs indefinitely with this grammar
// Less extends https://www.w3.org/TR/css-syntax-3/
ws = { " " | "\t" | NEWLINE }
WHITESPACE = _{ ws* }
COMMENT = _{ multi_comment | line_comment }
multi_comment = @{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
line_comment = @{ "//" ~ (!"\n" ~ ANY)* }
escape = @{ "\\" ~ escape_text }
escape_text = { !("\n" | ASCII_HEX_DIGIT) ~ ANY | ASCII_HEX_DIGIT{1,6} ~ ws? }
nmstart = @{ "-"? ~ (ASCII_ALPHA | "_" | NON_ASCII | escape) }
nmchar = @{ ASCII_DIGIT | ASCII_ALPHA | "_" | "-" | escape }
ident_token = @{
    ( "--" | nmstart ) ~ nmchar*
}
function_token = @{ ident_token ~ "(" }
at_keyword_token = @{ "@" ~ ident_token }
hash_token = @{ "#" ~ (nmchar | escape)+ }
string1 = @{
    "\"" ~ (
        !("\"" | "\\" | "\n") ~ ANY
        | escape
        | "\\" ~ NEWLINE
    )* ~ "\""
}
string2 = @{
    "'" ~ (
        !("'" | "\\" | "\n") ~ ANY
        | escape
        | "\\" ~ NEWLINE
    )* ~ "'"
}
string_token = { string1 | string2 }
url_token = @{
    "url("
    ~ ws*
    ~ (
        !("\"" | "'" | "(" | ")" | "\\" | ws | NON_PRINTABLE) ~ ANY
        | escape
        | string_token
    )*
    ~ ws*
    ~ ")"
}
NON_PRINTABLE = {
    '\u{0000}'..'\u{0008}'
    | "\u{000B}"
    | '\u{000E}'..'\u{001F}'
    | "\u{007F}"
}
number_token = @{
    ("+" | "-")?
    ~ (
        ASCII_DIGIT+ ~ ("." ~ ASCII_DIGIT+)?
        | "." ~ ASCII_DIGIT+
    )
    ~ (
        ^"e"
        ~ ("+" | "-")?
        ~ ASCII_DIGIT+
    )?
}
// or percentage_token
dimension_token = @{ number_token ~ (ident_token | "%") }
CDO_token = { "<!--" }
CDC_token = { "-->" }
NON_ASCII = { '\u{0080}'..'\u{10FFFF}' }
at_rule = {
    at_keyword_token
}
qualified_rule = {
    ident_token ~ "{" ~ "}"
}
rule_list = {
    (
        qualified_rule
        | at_rule
    )*
}
root = {
    (
        CDO_token
        | CDC_token
        | qualified_rule
        | at_rule
    )*
}
Any additional match after ident_token causes qualified_rule to hang as soon as I type input on Pest.rs, and so does any rule that includes qualified_rule, such as root. Selecting ident_token on its own from the drop-down does not hang when typing. I haven't tried the Rust integration yet, as I'm still writing the grammar. Is there any reason to expect the Rust side to work when Pest.rs hangs like this?
Note: I tried to reduce this to just qualified_rule and the rules it references, but when I did that, the grammar actually worked and didn't freeze the site. So somehow there's an invisible interaction with other rules that are not referenced? 🤔
@matthew-dean I think it is because of this:
WHITESPACE = _{ ws* }
In Pest, WHITESPACE is supposed to match a single whitespace character, not a sequence of them. For example:
WHITESPACE = _{ " " | "\n" | "\t" }
I think removing the * from your WHITESPACE rule would stop the hang. When I added a * repetition to the WHITESPACE rule in the Pest playground, the page froze there too.
@ancientstraits Oh, it auto-consumes multiples of that token between other tokens? 🤔
Yes. When you define WHITESPACE, every ~ in a non-atomic rule effectively becomes ~ WHITESPACE* ~, so you're getting (ws*)*, which repeats the empty match forever. WHITESPACE must always consume at least one character.
IIRC we have a safeguard against this for normal rules, but apparently WHITESPACE isn't handled.
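To make the (ws*)* loop concrete, here is a small Rust sketch. It is a hand-written model of the repetition, not pest's implementation: ws_star, ws_one, repeat, and the max_iters budget are names invented for this illustration. The budget exists only so the example terminates where a real parser would spin.

```rust
/// Models `ws*`: consumes zero or more leading whitespace characters.
/// Returns the number of bytes consumed. It never fails and may return 0.
fn ws_star(input: &str) -> usize {
    let rest = input.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\n'));
    input.len() - rest.len()
}

/// Models a WHITESPACE rule that matches exactly one whitespace character.
/// Returns Some(1) on a match and None otherwise.
fn ws_one(input: &str) -> Option<usize> {
    match input.chars().next() {
        Some(' ') | Some('\t') | Some('\n') => Some(1),
        _ => None,
    }
}

/// Models the implicit outer `*`: applies `inner` until it fails.
/// Returns Some((bytes consumed, successful iterations)), or None when the
/// hypothetical iteration budget runs out, which stands in for "hangs".
fn repeat(
    input: &str,
    inner: impl Fn(&str) -> Option<usize>,
    max_iters: usize,
) -> Option<(usize, usize)> {
    let mut pos = 0usize;
    for iters in 0..max_iters {
        match inner(&input[pos..]) {
            // A successful match of length 0 leaves `pos` unchanged,
            // so the next iteration sees exactly the same input.
            Some(n) => pos += n,
            None => return Some((pos, iters)),
        }
    }
    None
}
```

With these definitions, repeat("  a", ws_one, 1000) stops after consuming the two spaces, because ws_one fails on "a". In contrast, repeat("  a", |s: &str| Some(ws_star(s)), 1000) uses up its whole budget and returns None, since the inner match keeps succeeding with length 0.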
@CAD97 Guess we need to add that safeguard then. I'll try to find where it should go.
The existing safeguard is at this line: https://github.com/pest-parser/pest/blob/525ba7b3a2e7bcb79d6ddea45ad9c3f978c5686f/meta/src/validator.rs#L341
I wonder why it never triggers for WHITESPACE.
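A safeguard of this kind boils down to asking whether an expression can succeed without consuming any input. The sketch below illustrates that idea on a hypothetical, heavily simplified expression type. Expr, can_match_empty, and ws are invented for this example and are not taken from pest's validator, and NEWLINE is approximated by "\n".

```rust
/// A hypothetical, simplified grammar expression (not pest's real AST).
enum Expr {
    Str(&'static str),            // string literal
    Seq(Box<Expr>, Box<Expr>),    // a ~ b
    Choice(Box<Expr>, Box<Expr>), // a | b
    Opt(Box<Expr>),               // a?
    Rep(Box<Expr>),               // a*
    RepOnce(Box<Expr>),           // a+
}

/// Returns true if `expr` can succeed without consuming any input.
fn can_match_empty(expr: &Expr) -> bool {
    match expr {
        Expr::Str(s) => s.is_empty(),
        // A sequence is empty-matching only if both parts are.
        Expr::Seq(a, b) => can_match_empty(a) && can_match_empty(b),
        // A choice is empty-matching if either branch is.
        Expr::Choice(a, b) => can_match_empty(a) || can_match_empty(b),
        // `?` and `*` always succeed, possibly on nothing.
        Expr::Opt(_) | Expr::Rep(_) => true,
        // `+` needs one match of the inner expression.
        Expr::RepOnce(inner) => can_match_empty(inner),
    }
}

/// Builds the equivalent of `" " | "\t" | "\n"`.
fn ws() -> Expr {
    Expr::Choice(
        Box::new(Expr::Str(" ")),
        Box::new(Expr::Choice(
            Box::new(Expr::Str("\t")),
            Box::new(Expr::Str("\n")),
        )),
    )
}
```

Under this model, ws() on its own is fine as a WHITESPACE body, while Expr::Rep(Box::new(ws())), the equivalent of ws*, is flagged because it can match the empty string.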
This specific example of a grammar causing the parser to hang indefinitely was fixed in #848, although there are still cases that are not covered.