llama.cpp
Eval bug: llama-cli, spurious token added to assistant response
Name and Version
version: 5327 (27ebfcac) built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA
Models
all
Problem description & steps to reproduce
After the user prompt is provided, the code enters this branch:
https://github.com/ggml-org/llama.cpp/blob/0cf6725e9f9a164c39f7a87214d60342f7f946d8/tools/main/main.cpp#L716
No new tokens are generated at this point.
However, the following code assumes that a new token was generated and inserts it into the assistant response:
https://github.com/ggml-org/llama.cpp/blob/0cf6725e9f9a164c39f7a87214d60342f7f946d8/tools/main/main.cpp#L824
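To illustrate the pattern, here is a minimal, self-contained sketch (not the actual main.cpp code; `embd`, `assistant_ss`, and the guard flag are simplified stand-ins) of the failure mode and of a guarded alternative:

```cpp
// Simplified model of the bug: after the prompt-consumption branch runs,
// nothing has been sampled, yet the last token in `embd` (the tail of the
// chat template) is appended to the assistant buffer as if it were generated.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    // Pretend these are the tokens of the formatted user turn; the last
    // piece is the template's assistant-header suffix.
    std::vector<std::string> embd = {"<|user|>", "Hello", "<|assistant|>"};

    std::ostringstream assistant_ss;   // accumulates the assistant response
    bool token_was_sampled = false;    // hypothetical guard flag

    // Prompt-consumption branch: tokens come from the template, none are sampled.
    // (In main.cpp this corresponds to the branch around line 716.)

    // Buggy pattern: unconditionally treat embd.back() as a generated token.
    if (!embd.empty()) {
        assistant_ss << embd.back();   // "<|assistant|>" leaks into the response
    }

    // Guarded pattern: only record tokens that were actually sampled.
    std::ostringstream fixed_ss;
    if (token_was_sampled && !embd.empty()) {
        fixed_ss << embd.back();
    }

    std::cout << "buggy assistant buffer: '" << assistant_ss.str() << "'\n";
    std::cout << "fixed assistant buffer: '" << fixed_ss.str() << "'\n";
    return 0;
}
```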
First Bad Commit
No response
Relevant log output
The easiest way is to set a breakpoint here and wait for the assistant message:
https://github.com/ggml-org/llama.cpp/blob/0cf6725e9f9a164c39f7a87214d60342f7f946d8/tools/main/main.cpp#L270
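Something along these lines should reproduce it (assuming a CMake build; the model path is a placeholder, and `-cnv` enables conversation mode):

```
gdb --args ./build/bin/llama-cli -m models/model.gguf -cnv
(gdb) break main.cpp:270
(gdb) run
# type a user message, then inspect the token at the breakpoint when the
# assistant response is assembled
```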
I noticed this char before; I always just assumed it was a spurious prompt print (since most templates end with >), but I see now that it's repeating the last processed token of the template.
Hi, is the bug fixed? If not, can I pick it up?
AFAIK no, please do.