llama.cpp icon indicating copy to clipboard operation
llama.cpp copied to clipboard

grammar: handle misplaced special regex chars [*+?]

Open rick-github opened this issue 6 months ago • 5 comments

Don't try to access the previous element of a regex seq if the vector is empty. Prevents segment violation failures for regexes like ^?[0-9]$ and ^(?:a|b)$.

Fixes: #13390

rick-github avatar May 08 '25 20:05 rick-github

Don't try to access the previous element of a regex seq if the vector is empty. Prevents segment violation failures for regexes like ^?[0-9]$ and ^(?:a|b)$.

The latter should have been handled (non-capturing groups are unsupported) here, but seems to just pass it on as if it was a normal pattern: https://github.com/ggml-org/llama.cpp/blob/19494ff82f40bfde882782b0dff414e4f806d235/common/json-schema-to-grammar.cpp#L450-L457

CISC avatar May 09 '25 07:05 CISC

@rick-github I think both examples should report errors instead of quietly eating the expression characters.

The first example is a clearly invalid expression and should be reported as such, and the second is unsupported, preferably it should include the whole unsupported group in its error message.

CISC avatar May 09 '25 10:05 CISC

This PR is just for removing a DoS vector for projects using the llama.cpp codebase. Improving the parser is out of scope.

rick-github avatar May 09 '25 10:05 rick-github

I'm not talking about improving the parser, just reporting the error instead of quietly mangling the expression.

CISC avatar May 09 '25 11:05 CISC

Just adding an error to the _errors stack and returning an empty pair might do the trick?

CISC avatar May 09 '25 11:05 CISC