llama.cpp

Bug: Failed to process regex error with long repeating sequences

Open mayaeary opened this issue 1 year ago • 1 comments

What happened?

A prompt containing a long repeated sequence of characters crashes the server with the following error:

Failed to process regex: [^\r\n\p{L}\p{N}]?((?=[\p{L}])([^a-z]))*((?=[\p{L}])([^A-Z]))+|[^\r\n\p{L}\p{N}]?((?=[\p{L}])([^a-z]))+((?=[\p{L}])([^A-Z]))*|\p{N}| ?[^\s\p{L}\p{N}]+[\r\n/]*|\s*[\r\n]+|\s+(?!\S)|\s+
Regex error: regex_error(error_stack): There was insufficient memory to determine whether the regular expression could match the specified character sequence.

Sample prompt: usususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususususus

The issue reproduces on every Mistral-Nemo-12B-based model I've tried, in particular Mistral-Nemo-Instruct-2407-GGUF.
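For anyone who wants to poke at this outside the server, here is a minimal sketch of how the failure could be probed with plain std::regex. It is not the llama.cpp tokenizer code: `try_search` and `repeat` are hypothetical helpers, and the pattern is a simplified ASCII stand-in for one alternative of the regex above (std::regex does not understand `\p{L}`). Whether the long input actually triggers `error_stack` depends on the standard library; the report above is from an MSVC build.

```cpp
#include <cstddef>
#include <iostream>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper (not part of llama.cpp): runs regex_search and reports
// a std::regex_error instead of letting it propagate.
// Returns true/false for match/no match, or std::nullopt if the engine threw.
std::optional<bool> try_search(const std::string & pattern, const std::string & text) {
    try {
        const std::regex re(pattern);
        return std::regex_search(text, re);
    } catch (const std::regex_error & e) {
        if (e.code() == std::regex_constants::error_stack) {
            // This is the error code named in the log above.
            std::cerr << "regex stack limit hit: " << e.what() << '\n';
        } else {
            std::cerr << "regex error: " << e.what() << '\n';
        }
        return std::nullopt;
    }
}

// Builds a prompt like the sample one: `unit` repeated `n` times.
std::string repeat(const std::string & unit, std::size_t n) {
    std::string out;
    out.reserve(unit.size() * n);
    for (std::size_t i = 0; i < n; ++i) {
        out += unit;
    }
    return out;
}
```

Calling `try_search("((?=[a-zA-Z])([^A-Z]))+", repeat("us", 120))` approximates the sample prompt. A `std::nullopt` result means the engine gave up rather than that the input failed to match.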

Name and Version

.\llama-server.exe --version
version: 3862 (3f1ae2e3)
built with MSVC 19.35.32215.0 for x64

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

mayaeary avatar Oct 02 '24 13:10 mayaeary

As a temporary workaround, I disabled the maximum stack size for regex by adding a define at the top of src/unicode.cpp, before any includes:

#define _REGEX_MAX_STACK_COUNT 0

#include "unicode.h"
#include "unicode-data.h"

#include <algorithm>
#include <cassert>
// ...

However, it's unclear whether this workaround is safe.
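To make the ordering requirement explicit, here is a small self-contained sketch of the same idea outside llama.cpp. It assumes that MSVC's `<regex>` only supplies a default for `_REGEX_MAX_STACK_COUNT` when the macro is not already defined, which is why the define has to come before the first include that pulls in `<regex>`. Other standard libraries simply ignore the macro. `count_matches` is a hypothetical helper used only for illustration.

```cpp
// Assumption: must appear before any header that includes <regex>,
// otherwise MSVC's default limit is already in effect.
#define _REGEX_MAX_STACK_COUNT 0

#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts non-overlapping matches of `pattern` in `text`.
std::size_t count_matches(const std::string & text, const std::string & pattern) {
    const std::regex re(pattern);
    const auto first = std::sregex_iterator(text.begin(), text.end(), re);
    const auto last  = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```

With the limit disabled, the matcher is presumably bounded only by the real thread stack, so a sufficiently long input might overflow the stack instead of throwing `regex_error`. Passing the define on the compiler command line (for MSVC, `/D_REGEX_MAX_STACK_COUNT=0`) should have the same effect without touching the source, though I have not verified that against the llama.cpp build.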

mayaeary avatar Oct 02 '24 18:10 mayaeary

Hey, I also have this issue, but I'm not able to edit the source code. Is there another solution? Thanks.

psykokwak-com avatar Nov 09 '24 08:11 psykokwak-com

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Dec 25 '24 01:12 github-actions[bot]