Misc. bug: context shift results in error
Name and Version
build/bin/./llama-server --version
version: 4384 (14b699ec) built with cc (Debian 14.2.0-11) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
When running llama-server with the following command:
./build/bin/llama-server -fa -ctk q8_0 -ctv q8_0 -m ../models/phi-4-Q6_K.gguf --host 0.0.0.0 --port 8085
The same happens with Llama 3.2 3B, so I don't think it's model-specific.
Sending a large request with chat history (filling the full context length) crashes the server with:
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
After the crash, new requests to the server are ignored.
I think it's related to the function `ggml_compute_forward_dup`: the `dst->type` and `src->type` mismatch (8 vs 0, i.e. Q8_0 vs F32), and there is no handler for the quantized (q*) types.
First Bad Commit
No response
Relevant log output
request: POST /v1/chat/completions 192.168.1.59 200
slot launch_slot_: id 0 | task 613 | processing task
slot update_slots: id 0 | task 613 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 3817
slot update_slots: id 0 | task 613 | kv cache rm [3520, end)
slot update_slots: id 0 | task 613 | prompt processing progress, n_past = 3817, n_tokens = 297, progress = 0.077810
slot update_slots: id 0 | task 613 | prompt done, n_past = 3817, n_tokens = 297
slot update_slots: id 0 | task 613 | slot context shift, n_keep = 0, n_left = 4095, n_discard = 2047
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
(the line above is printed many more times, interleaved across threads)