
Misc. bug: context shift results in error

Open gompa opened this issue 2 months ago • 4 comments

Name and Version

build/bin/./llama-server --version
version: 4384 (14b699ec) built with cc (Debian 14.2.0-11) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Problem description & steps to reproduce

When running llama-server with the following command: ./build/bin/llama-server -fa -ctk q8_0 -ctv q8_0 -m ../models/phi-4-Q6_K.gguf --host 0.0.0.0 --port 8085. The same crash happens with llama3.2-3b, so I don't think it's model-specific.

Sending a large request with chat history (close to the full context length) crashes the server with: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error. After that, new requests to the server are ignored. I think it's related to the function ggml_compute_forward_dup: dst->type and src->type mismatch (8 vs 0) and there is no q* handler for the copy.

First Bad Commit

No response

Relevant log output

request: POST /v1/chat/completions 192.168.1.59 200
slot launch_slot_: id  0 | task 613 | processing task
slot update_slots: id  0 | task 613 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 3817
slot update_slots: id  0 | task 613 | kv cache rm [3520, end)
slot update_slots: id  0 | task 613 | prompt processing progress, n_past = 3817, n_tokens = 297, progress = 0.077810
slot update_slots: id  0 | task 613 | prompt done, n_past = 3817, n_tokens = 297
slot update_slots: id  0 | task 613 | slot context shift, n_keep = 0, n_left = 4095, n_discard = 2047
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3996: fatal error
[the same fatal error line is repeated many more times, once per thread]

gompa avatar Dec 23 '24 13:12 gompa