pest [BUG] Pest.rs hangs indefinitely with this grammar

// Less extends https://www.w3.org/TR/css-syntax-3/

ws = { " " | "\t" | NEWLINE }
WHITESPACE = _{ ws* }
COMMENT = _{ multi_comment | line_comment }

multi_comment = @{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
line_comment = @{ "//" ~ (!"\n" ~ ANY)* }

escape = @{ "\\" ~ escape_text }
escape_text = { !("\n" | ASCII_HEX_DIGIT) ~ ANY | ASCII_HEX_DIGIT{1,6} ~ ws? }

nmstart = @{ "-"? ~ (ASCII_ALPHA | "_" | NON_ASCII | escape) }
nmchar = @{ ASCII_DIGIT | ASCII_ALPHA | "_" | "-" | escape }
ident_token = @{
    ( "--" | nmstart ) ~ nmchar*
}

function_token = @{ ident_token ~ "(" }
at_keyword_token = @{ "@" ~ ident_token }
hash_token = @{ "#" ~ (nmchar | escape)+ }

string1 = @{
    "\"" ~ (
        !("\"" | "\\" | "\n") ~ ANY
        | escape
        | "\\" ~ NEWLINE 
    )* ~ "\""
}
string2 = @{
    "'" ~ (
        !("'" | "\\" | "\n") ~ ANY
        | escape
        | "\\" ~ NEWLINE 
    )* ~ "'"
}
string_token = { string1 | string2 }

url_token = @{
    "url("
    ~ ws*
    ~ (
        !("\"" | "'" | "(" | ")" | "\\" | ws | NON_PRINTABLE) ~ ANY
        | escape
        | string_token
    )*
    ~ ws*
    ~ ")"
}

NON_PRINTABLE = {
    '\u{0000}'..'\u{0008}'
    | "\u{000B}"
    | '\u{000E}'..'\u{001F}'
    | "\u{007F}"
}

number_token = @{
    ("+" | "-")?
    ~ (
        ASCII_DIGIT+ ~ ("." ~ ASCII_DIGIT+)?
        | "." ~ ASCII_DIGIT+
    )
    ~ (
        ^"e"
        ~ ("+" | "-")?
        ~ ASCII_DIGIT+
    )?
}

// or percentage_token
dimension_token = @{ number_token ~ (ident_token | "%") }
CDO_token = { "<!--" }
CDC_token = { "-->" }

NON_ASCII = { '\u{0080}'..'\u{10FFFF}' }

at_rule = {
    at_keyword_token
}

qualified_rule = {
    ident_token ~ "{" ~ "}"
}

rule_list = {
    (
        qualified_rule
        | at_rule
    )*
}

root = {
    (
        CDO_token
        | CDC_token
        | qualified_rule
        | at_rule
    )*
}

Having anything additional to match after ident_token causes qualified_rule to hang as I type, and so does any rule that includes qualified_rule, such as root. Selecting ident_token on its own from the drop-down on Pest.rs does not hang. I haven't tried the Rust integration yet, as I'm still writing the grammar. Is there any reason to expect the Rust side to work when Pest.rs hangs like this?

Note: I tried to reduce this to just qualified_rule and the rules it references. When I did that, the grammar actually succeeded and didn't freeze the site. So somehow there's an invisible interaction with other rules that are not referenced? 🤔

Dec 12 '21 22:12 matthew-dean

@matthew-dean I think it is because of this:

WHITESPACE = _{ ws* }

In pest, WHITESPACE is supposed to match a single whitespace character, not a sequence of them. For example:

WHITESPACE = _{ " " | "\n" | "\t" }

I think removing the * from your WHITESPACE rule would stop the hanging, especially since I tried adding a sequence with * to the WHITESPACE rule in the Pest playground, and the page froze.

May 05 '22 19:05 ancientstraits

@ancientstraits Oh, it auto-consumes multiples of that token between other tokens? 🤔

May 05 '22 20:05 matthew-dean

Yes. When you define WHITESPACE, ~ effectively becomes ~ WHITESPACE* ~, so you end up with (ws*)*, which just repeats the empty match forever. WHITESPACE must always consume at least one character.
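
To make the mechanism concrete, here is a minimal sketch of why repeating a pattern that can match the empty string never finishes. It is a toy model, not pest's implementation: `ws_star`, `ws_one`, and `repeat` are hypothetical names, and the iteration cap exists only so the demo terminates.

```rust
/// Toy matcher: returns how many bytes were consumed at `pos`, or None on failure.
type Matcher = fn(&str, usize) -> Option<usize>;

/// Models `ws*`: consumes zero or more whitespace bytes and always succeeds.
fn ws_star(input: &str, pos: usize) -> Option<usize> {
    let n = input[pos..]
        .bytes()
        .take_while(|&b| matches!(b, b' ' | b'\t' | b'\n' | b'\r'))
        .count();
    Some(n)
}

/// Models a single `ws`: succeeds only if one whitespace byte is present.
fn ws_one(input: &str, pos: usize) -> Option<usize> {
    match input.as_bytes().get(pos).copied() {
        Some(b' ' | b'\t' | b'\n' | b'\r') => Some(1),
        _ => None,
    }
}

/// Models `inner*`, with an iteration cap (an assumption of this demo).
/// Returns (end position, iterations run, whether the cap was hit).
fn repeat(inner: Matcher, input: &str, mut pos: usize, cap: usize) -> (usize, usize, bool) {
    let mut iterations = 0;
    while let Some(consumed) = inner(input, pos) {
        if iterations == cap {
            return (pos, iterations, true);
        }
        pos += consumed;
        iterations += 1;
    }
    (pos, iterations, false)
}

fn main() {
    let input = "  a";
    // (ws*)*: after the first pass every match is empty, so only the cap stops it.
    println!("{:?}", repeat(ws_star, input, 0, 1000)); // (2, 1000, true)
    // ws*: each pass consumes one byte, so the loop ends at the first non-whitespace byte.
    println!("{:?}", repeat(ws_one, input, 0, 1000)); // (2, 2, false)
}
```

Both calls stop at position 2, but only the second stops on its own. Without the cap, the first loop would spin forever, which matches the freeze seen in the playground.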

IIRC we have a safeguard against this for normal rules, but apparently WHITESPACE isn't handled.
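
A safeguard of this kind can be sketched as a nullability check: flag any repetition whose body can succeed without consuming input. The following is only an illustration under assumptions. `Expr`, `is_nullable`, `has_empty_repetition`, and `implicit_skip` are hypothetical names and do not mirror pest's validator or AST.

```rust
/// Toy grammar expression (hypothetical; not pest's AST).
#[allow(dead_code)]
enum Expr {
    Str(&'static str),
    Seq(Box<Expr>, Box<Expr>),
    Choice(Box<Expr>, Box<Expr>),
    Opt(Box<Expr>),
    Rep(Box<Expr>),
}

/// True if the expression can succeed without consuming any input.
fn is_nullable(e: &Expr) -> bool {
    match e {
        Expr::Str(s) => s.is_empty(),
        Expr::Seq(a, b) => is_nullable(a) && is_nullable(b),
        Expr::Choice(a, b) => is_nullable(a) || is_nullable(b),
        Expr::Opt(_) | Expr::Rep(_) => true,
    }
}

/// True if any repetition in the tree wraps a nullable body.
fn has_empty_repetition(e: &Expr) -> bool {
    match e {
        Expr::Str(_) => false,
        Expr::Seq(a, b) | Expr::Choice(a, b) => {
            has_empty_repetition(a) || has_empty_repetition(b)
        }
        Expr::Opt(inner) => has_empty_repetition(inner),
        Expr::Rep(inner) => is_nullable(inner) || has_empty_repetition(inner),
    }
}

/// Models the implicit `WHITESPACE*` that wraps the user's WHITESPACE body.
fn implicit_skip(whitespace_body: Expr) -> Expr {
    Expr::Rep(Box::new(whitespace_body))
}

/// A single whitespace character: " " | "\t".
fn ws() -> Expr {
    Expr::Choice(Box::new(Expr::Str(" ")), Box::new(Expr::Str("\t")))
}

fn main() {
    // WHITESPACE = _{ ws* } becomes (ws*)* once the implicit repetition is applied.
    let bad = implicit_skip(Expr::Rep(Box::new(ws())));
    // WHITESPACE = _{ ws } becomes ws*.
    let good = implicit_skip(ws());
    println!("bad flagged: {}", has_empty_repetition(&bad)); // true
    println!("good flagged: {}", has_empty_repetition(&good)); // false
}
```

The point of `implicit_skip` is that the check has to see the repetition pest adds around WHITESPACE, not just the rule body as written. Taken alone, `ws*` is a valid expression.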

May 05 '22 21:05 CAD97

@CAD97 I guess we need to add that safeguard for WHITESPACE, then. I'll try to find where it would go.

May 05 '22 21:05 ancientstraits

It's at this line: https://github.com/pest-parser/pest/blob/525ba7b3a2e7bcb79d6ddea45ad9c3f978c5686f/meta/src/validator.rs#L341 I wonder why it never triggers for WHITESPACE.

May 05 '22 21:05 ancientstraits

This specific example of a grammar causing the parser to hang indefinitely was fixed in #848, although there are still cases that are not covered.

Apr 28 '23 14:04 Tartasprint