text-generation-webui
llama.cpp sampling doesn't work in 1.15
Describe the bug
llama.cpp models always give exactly the same output (compared in WinMerge to be sure), as if they ignore all sampling options and the seed. Sometimes the first output after loading the model is slightly different, but every regenerate produces exactly the same output. I also tried with temperature=5. I can see the seed changing in the console. Even reloading the model or restarting the whole webui doesn't help.
This seems to happen only with the llama.cpp loader; I tried some exl2 models and they worked fine, with outputs differing between regenerations.
This doesn't seem model-specific, as I tried multiple GGUF models (which worked as expected in the past), such as Mistral Nemo, Mistral Small, and Qwen 2.5 32B. The same GGUFs worked before updating the webui to 1.15 (via update_wizard_windows.bat, as I usually do), so something in that update probably broke it.
This doesn't seem to be purely a UI issue, as I tried via SillyTavern over the API too and the results were the same.
I also tried installing a fresh copy of the webui (git clone and start_windows.bat), but the issue persists on the fresh install.
There was a similar issue in the past, #5451, but in that case changing top_k helped; in my case it didn't. Also, the llama-cpp-python versions mentioned in that issue are very old (as the issue itself is old). I don't know whether the source of this problem is in the webui or in llama-cpp-python.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
- Load any GGUF model with llama.cpp loader
- Generate any response and note it down
- Regenerate multiple times with a high temperature
- Observe that the regenerated outputs are always identical
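The steps above can also be reproduced over the API, which is how I tested the SillyTavern case. Below is a minimal, hypothetical sketch of that check against the webui's OpenAI-compatible endpoint (the URL and port assume the default `--api` flag; the prompt and parameter values are illustrative, not the exact ones I used):

```python
# Hypothetical repro script: request two completions with a high temperature
# and a random seed, then check whether the outputs are identical.
# Assumes the webui was started with --api on the default port 5000.
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/v1/completions"  # assumed default endpoint


def generate(prompt: str, temperature: float = 5.0, seed: int = -1) -> str:
    """Request one completion; seed=-1 asks for a random seed each time."""
    payload = json.dumps({
        "prompt": prompt,
        "max_tokens": 64,
        "temperature": temperature,
        "seed": seed,
    }).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]


def outputs_identical(a: str, b: str) -> bool:
    """True when two generations match exactly (the buggy behaviour)."""
    return a == b


if __name__ == "__main__":
    first = generate("Write one sentence about cats.")
    second = generate("Write one sentence about cats.")
    # With temperature=5 and random seeds, identical outputs suggest the
    # sampling parameters are being ignored.
    print("identical:", outputs_identical(first, second))
```

With a working loader (e.g. exl2) this prints `identical: False` almost every time; with the llama.cpp loader on 1.15 it prints `identical: True` on every run after the first.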
Screenshot
No response
Logs
Not sure what logs might be needed here
System Info
Windows 10
RTX 3090 - GPU driver 565.90
webui 1.15 - commit d1af7a41ade7bd3c3a463bfa640725edb818ebaf (newest on branch main)