Eval bug: trivial grammar crashes (DeepSeek R1 Distill Llama 8B)
Name and Version
latest
Operating systems
No response
Which llama.cpp modules do you know to be affected?
libllama (core library)
Command line
llama-cli -hf bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M --grammar 'root ::= "{"' -p hey -no-cnv
Problem description & steps to reproduce
With this extremely simple grammar (root ::= "{"), somehow by the time we reach the grammar sampler there's only 1 candidate (@), and it hard-crashes.
First Bad Commit
cc/ @ggerganov could this be related to any recent refactoring? (~~https://github.com/ggerganov/llama.cpp/pull/10803 maybe?~~ I'll try and bisect)
Relevant log output
hey/tmp/llama.cpp-20250131-5280-k2rjfn/src/llama-grammar.cpp:1216: GGML_ASSERT(!grammar.stacks.empty()) failed
Tried w/ --samplers "" and this time it's crashing on <|reserved_special_token_247|>, which I'm not sure should have made it this far (maybe wrong token type in the GGUF?).
Regardless, will probably replace the GGML_ASSERT(!grammar.stacks.empty()) with:
if (grammar.stacks.empty()) {
    throw std::runtime_error("Unexpected empty grammar stack after accepting piece: " + piece);
}
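For illustration, here's a tiny self-contained toy sketch of that proposal (not the real llama.cpp source — the grammar machinery and all names below are made up): advance the surviving parse stacks for each accepted character, and throw instead of asserting when none survive.

#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for llama_grammar: each "stack" is just the text still expected.
struct toy_grammar {
    std::vector<std::string> stacks;
};

static void accept_piece(toy_grammar & grammar, const std::string & piece) {
    for (char c : piece) {
        std::vector<std::string> next;
        for (const auto & st : grammar.stacks) {
            if (!st.empty() && st.front() == c) {
                next.push_back(st.substr(1)); // this stack can consume the character
            }
        }
        grammar.stacks = std::move(next);
    }
    if (grammar.stacks.empty()) {
        // the check proposed above, in place of GGML_ASSERT(!grammar.stacks.empty())
        throw std::runtime_error("Unexpected empty grammar stack after accepting piece: " + piece);
    }
}

int main() {
    toy_grammar g{{"{"}};   // roughly `root ::= "{"`
    accept_piece(g, "{");   // fine
    accept_piece(g, "hey"); // throws instead of hard-crashing the process
}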
Seems specific to DeepSeek-R1-Distill-Llama-8B-GGUF (the Qwen 7B & 32B distills don't crash with that grammar)
It does not crash on my end:
llama-cli --version
version: 4570 (6e84b0ab)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
llama-cli -hf bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF:Q4_K_M --grammar 'root ::= "{"' -p hey -no-cnv
0.01.168.912 I system_info: n_threads = 16 (n_threads_batch = 16) / 24 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | AARCH64_REPACK = 1 |
0.01.168.912 I
0.01.169.188 I sampler seed: 2240775377
0.01.169.197 I sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
0.01.169.200 I sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
0.01.169.200 I generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1
0.01.169.202 I
hey{ [end of text]
0.01.218.866 I llama_perf_sampler_print: sampling time = 0.33 ms / 4 runs ( 0.08 ms per token, 12232.42 tokens per second)
0.01.218.877 I llama_perf_context_print: load time = 453.76 ms
0.01.218.880 I llama_perf_context_print: prompt eval time = 19.98 ms / 2 tokens ( 9.99 ms per token, 100.12 tokens per second)
0.01.218.882 I llama_perf_context_print: eval time = 12.88 ms / 1 runs ( 12.88 ms per token, 77.67 tokens per second)
0.01.218.883 I llama_perf_context_print: total time = 78.57 ms / 3 tokens
0.01.220.078 I ggml_metal_free: deallocating
Oh, mine crashes w/ the following versions:
./build/bin/llama-cli --version
version: 4617 (90517ec4)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
llama-cli --version # homebrew
version: 4606 (a83f5286)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
fwiw, I'm getting this same exception when calling llama_sampler_init_grammar_lazy from LLamaSharp, consistently, with any of the DeepSeek distills. It produces the think tags as expected, hits the trigger words, and then immediately fails. However, if I add a grammar via llama_sampler_init_grammar instead, there's no problem; I just need to adjust the GBNF to account for the thinking tags. I believe LLamaSharp is built against 4620.
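For anyone comparing the two code paths against raw llama.cpp rather than LLamaSharp, here's a rough sketch. The exact llama_sampler_init_grammar_lazy parameter names/order are recalled from that era of the API and should be treated as an assumption (the API was later reworked), and the </think> trigger is just the example this thread discusses.

#include "llama.h"

// Sketch: building either an eager or a lazy grammar sampler for a GBNF grammar.
static llama_sampler * make_grammar_sampler(const llama_vocab * vocab, bool lazy) {
    const char * gbnf = "root ::= \"{\"";   // the grammar from this issue

    if (!lazy) {
        // Eager: the grammar constrains every token from the first one,
        // so the GBNF itself has to allow the <think>...</think> preamble.
        return llama_sampler_init_grammar(vocab, gbnf, "root");
    }

    // Lazy: sampling is unconstrained until a trigger word appears in the output,
    // then the grammar takes over — this is the path that hit the assert here.
    const char * trigger_words[] = { "</think>" };
    return llama_sampler_init_grammar_lazy(vocab, gbnf, "root",
                                           trigger_words, 1,
                                           /*trigger_tokens =*/ nullptr, 0);
}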
@phil-scott-78 thanks for reporting!
Note that the Qwen distills should generally get better with https://github.com/ggerganov/llama.cpp/pull/11607 (although there are no grammar-related changes there), and another possible factor is the double-BOS situation (being addressed in https://github.com/ggerganov/llama.cpp/pull/11616). I hope to circle back to this in a couple of days.
right on. For what it's worth, I tried the lazy grammar again with Mistral-Small-24B-Instruct-2501. I gave it a prompt asking it to include its thinking, to force the issue. Same thing: it output its thinking, got to the </think>, and blew up on the assert when it came time to apply the grammar. All this is on 4620 though; I'll try to reproduce with llama.cpp directly when I get a chance. I don't want to be chasing ghosts that were already resolved because that project lags a bit.
Found at least one issue: if a token contains or completes a trigger and adds text that can't be parsed by the grammar, then kaboom (came up while testing upcoming changes that add even more triggers (ref); testing possible fixes).
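To make that failure mode concrete, here is a standalone toy sketch (no llama.cpp dependency; the trigger and token values are made up for illustration) of what happens when a single token both completes the trigger and carries extra text the grammar can't parse:

#include <iostream>
#include <string>

int main() {
    const std::string trigger = "</think>";
    const std::string piece   = "</think>\n\n";  // one token: trigger plus trailing text

    // When the trigger fires mid-token, the text after it is fed to the grammar.
    const auto pos = piece.find(trigger);
    const std::string fed_to_grammar = (pos == std::string::npos)
        ? std::string{}
        : piece.substr(pos + trigger.size());

    // With `root ::= "{"`, only "{" is derivable: "\n\n" kills every parse stack,
    // which is where GGML_ASSERT(!grammar.stacks.empty()) blows up.
    const bool derivable = fed_to_grammar.empty() || fed_to_grammar[0] == '{';
    std::cout << (derivable ? "ok" : "kaboom: no surviving grammar stacks") << "\n";
}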
In any case, the issue reported here no longer reproduces for me, probably because of ~~https://github.com/ggerganov/llama.cpp/pull/11616~~ (edit) https://github.com/ggerganov/llama.cpp/pull/11607
Will close this as I can't repro the original issue; please feel free to open a new one if you still experience problems!