text-generation-webui
GGUF models fail with llama.cpp loader, incorrectly uses llama_cpp_server_loader (ModuleNotFoundError: llama_cpp_binaries)
Describe the bug
When attempting to load any GGUF model using the explicitly selected llama.cpp loader in the Model tab, the loading process fails. The traceback indicates that the code execution is incorrectly routed through modules/llama_cpp_server.py, which attempts to import the llama_cpp_binaries package. This package is not installed (as llama-cpp-python was installed instead) and is not required for the standard llama.cpp loader, leading to a ModuleNotFoundError. The UI selection for the llama.cpp loader seems to be ignored for GGUF models.
Expected behavior: The llama.cpp loader should successfully load the GGUF model by utilizing the installed llama-cpp-python library directly. It should not attempt to execute code from modules/llama_cpp_server.py or require the llama_cpp_binaries package when the standard llama.cpp loader is selected.
Is there an existing issue for this?
- [x] I have searched the existing issues
Reproduction
- Perform a clean installation of text-generation-webui on Windows using the steps above.
- Place any GGUF model file (e.g., mistral-7b-instruct-v0.1.Q4_K_M.gguf) into the models directory.
- Start the server using python server.py.
- Open the web UI and navigate to the "Model" tab.
- Select the GGUF model from the dropdown list.
- Select llama.cpp from the "Model loader" dropdown list.
- Click the "Load" button.
- Observe the error traceback in the console.
Screenshot
Logs
Running on local URL: http://127.0.0.1:7860
17:26:06-427596 INFO Loading "mistral-7b-instruct-v0.1.Q4_K_M.gguf"
17:26:06-607221 ERROR Failed to load the model.
Traceback (most recent call last):
File "C:\text-generation-webui\modules\ui_model_menu.py", line 162, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
File "C:\text-generation-webui\modules\models.py", line 43, in load_model
output = load_func_map[loader](model_name)
File "C:\text-generation-webui\modules\models.py", line 68, in llama_cpp_server_loader
from modules.llama_cpp_server import LlamaServer
File "C:\text-generation-webui\modules\llama_cpp_server.py", line 10, in <module>
import llama_cpp_binaries
ModuleNotFoundError: No module named 'llama_cpp_binaries'
System Info
OS: Windows 11
Python Version: Python 3.10.6
WebUI Version: Latest commit from main branch as of 2025-04-26 (installed via git clone and git pull)
Installation: Clean install using venv and pip install -r requirements/full/requirements.txt, followed by pip install llama-cpp-python.
Same problem here, using the dev branch.
I did the following: after downloading the new stuff from the dev branch (using git pull origin dev), I ran update_wizard_windows.bat and chose A) Update the web UI. After updating all libraries, everything was working.
It should not attempt to execute code from modules/llama_cpp_server.py or require the llama_cpp_binaries package when the standard llama.cpp loader is selected.
Yes it should. llama-cpp-python requirement (and support) has been removed and has been replaced by llama_cpp_binaries. See https://github.com/oobabooga/text-generation-webui/commit/ae54d8faaa556bddf3e0805f4f3d04bb56c9be4b
Not sure why you're installing llama-cpp-python manually. You should update the webui and pip install -r the new requirements\full\requirements.txt inside the venv. Or just delete installer_files and let it install everything it needs from scratch.
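Roughly, on Windows that would be something like the following (a sketch, assuming a git-based install with the venv in a folder named venv; the activation step will differ if you use the one-click installer):
venv\Scripts\activate
git pull
pip install -r requirements\full\requirements.txt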
Yes it should.
llama-cpp-python requirement (and support) has been removed and has been replaced by llama_cpp_binaries. See ae54d8f
In the file llama_cpp_server.py I still see references to llama_cpp_binaries.
Not sure why you're installing llama-cpp-python manually. You should update the webui and pip install -r the new requirements\full\requirements.txt inside the venv. Or just delete installer_files and let it install everything it needs from scratch.
Honestly I already solved my problem by running the portable version. So far so good
Same problem here, using the dev branch.
I did the following: after downloading the new stuff from the dev branch (using git pull origin dev), I ran update_wizard_windows.bat and chose A) Update the web UI. After updating all libraries, everything was working.
I already found the portable version that works well. Thanks!
So I updated to 3.1 on Windows 10. I had it running, but when trying to load a model (Mistral-Nemo-Instruct-2407-Q6_K.gguf with llama.cpp) I get the "ModuleNotFoundError: No module named 'llama_cpp_binaries'" error shown in the original screenshot. I originally had this loaded as an HF model (so that I could use DRY, a big help) and it worked great before upgrading, but it seems that was removed, and now loading it as GGUF gives the error.
Not sure why you're installing llama-cpp-python manually. You should update the webui and pip install -r the new requirements\full\requirements.txt inside the venv. Or just delete installer_files and let it install everything it needs from scratch.
I followed these instructions. The pip install -r completed but didn't resolve the error. Removing the "installer_files" folder and starting again did download a lot of things (took most of the day), but now I'm getting this error:
The system cannot find the path specified.
Miniconda hook not found.
And I can't get any further. Both the start and update scripts get that error now. I do have this installed in my F:\AI\text-generation-webui\ folder. And I did save a copy of the installer_files folder, in case I need to put it back.
I've been trying to get this update working for two days, going on three now. Thanks for any help.
So I put the old installer_files folder back in place and I'm able to run it again. Re-ran all the web UI and extension updates. Loading the model still gives the error. Here is the specific one I'm getting:
01:29:34-787365 INFO Loading "Mistral-Nemo-Instruct-2407-Q6_K.gguf"
01:29:35-222345 ERROR Failed to load the model.
Traceback (most recent call last):
File "F:\AI\text-generation-webui\modules\ui_model_menu.py", line 174, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
File "F:\AI\text-generation-webui\modules\models.py", line 43, in load_model
output = load_func_map[loader](model_name)
File "F:\AI\text-generation-webui\modules\models.py", line 66, in llama_cpp_server_loader
from modules.llama_cpp_server import LlamaServer
File "F:\AI\text-generation-webui\modules\llama_cpp_server.py", line 12, in <module>
import llama_cpp_binaries
ModuleNotFoundError: No module named 'llama_cpp_binaries'
Is there some reason llama_cpp_binaries isn't being found? Something I need to manually install?
Maybe this is connected. From full\requirements.txt:
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.8.0/llama_cpp_binaries-0.8.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"
In the environment:
python --version
Python 3.10.13
I do have Python 3.13.3 on my system, but that's not 3.11 either. Do I need to update the python in the environment?
Edit: I meant to mention release notes do mention this:
Make llama-cpp-binaries wheels compatible with any Python >= 3.7 (useful for manually installing the requirements under requirements/portable/).
But that doesn't match what the full requirements.txt marker above says. However, portable\requirements.txt is different:
https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.8.0/llama_cpp_binaries-0.8.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"
Should the full requirements match? Maybe that's why others mentioned being able to get the portable one working.
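If the release-notes claim holds (wheels compatible with any Python >= 3.7), one untested workaround might be to install that wheel manually inside the venv, bypassing the python_version marker in the full requirements:
pip install https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.8.0/llama_cpp_binaries-0.8.0+cu124-py3-none-win_amd64.whl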
Well, I didn't get any responses here, but I have finally gotten it working again, after taking most of a week to do an update. In case someone else has this issue, here is what I did.
- Updated my video drivers, including updating CUDA from 12.1 to 12.9.
- Updated Python on my system from 3.10.13 to 3.13.3
- Again removed (renamed) the installer_files folder as per TheLounger's earlier suggestion.
- Ran the startup script
This time it successfully recreated the full installer_files folder, and not only did the UI run, I could now load GGUF models and run them. The output feels kind of slow compared to before, but I can play with tweaking things from here.
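For anyone following along, on Windows the installer_files reset boils down to roughly this (assuming the default start_windows.bat script name; installer_files_old is just an arbitrary backup name):
ren installer_files installer_files_old
start_windows.bat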
Had the same problem on Linux about two weeks ago when I updated from 2.x to 3.3, and TheLounger's suggestion resolved it. No additional CUDA or Python updates were needed on my side.
You should update the webui and pip install -r the new requirements\full\requirements.txt inside the venv.
Launch cmd_linux.sh, cmd_macos.sh, or cmd_windows.bat to open the venv.
I found the issue (at least a solution that worked for me on Windows). You are likely using the incorrect version of Python. The llama_cpp_binaries package requires Python 3.11; any other version will not work. I kept finding Oobabooga using a different version of Python than the one I had installed because I was not setting the correct environment path.
For an in-depth tutorial on Windows, you can read my Google Doc: TUTORIAL
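As a quick sanity check (standard Windows commands, not specific to the tutorial), you can confirm which Python the shell actually resolves before installing anything:
where python
python --version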
Yes, I had a list of things I did that resolved it too, including updating python from 3.10.13 to 3.13.3. 3.10.x does seem to be an issue for it.
Yes, I had a list of things I did that resolved it too, including updating python from 3.10.13 to 3.13.3. 3.10.x does seem to be an issue for it.
Got it. I just read your post from earlier. I am using CUDA 12.4 currently, and it seems to be working. I didn't test later versions of Python for fear that it was going to do the same thing again. Pretty much all I did was update Python, update the path, and then run a command that downloaded the required/missing files. I didn't have to reinstall or do anything else this way.
I am going to post in my tutorial that Python 3.11 and newer work 👍 Can I link to your GitHub account in the doc I created? I'd like to credit you for confirming that newer Python versions work. And I can link to your comment in this thread just in case others have to go further.
Yeah I can't say for sure if all those were needed. But they were what I did and then it worked. And sure go right ahead!
Yeah I can't say for sure if all those were needed. But they were what I did and then it worked. And sure go right ahead!
Nothing wrong with your method—better safe than sorry! I linked your account at the top of my doc and also linked directly to your comment in this thread at the bottom @Sqrlly 👍
Friends, I also encountered the same problem. I deployed the web UI on an NVIDIA AGX Orin (ARM64) and got the same "No module named 'llama_cpp_binaries'" error, which has been bothering me. The project hasn't provided the necessary wheels for this platform. I've heard it only supports Python 3.11, but mine is Python 3.10 on Linux (JetPack 6.0), and NVIDIA only provides ARM64 dependencies such as PyTorch that are compatible with Python 3.10. I don't know how to proceed: switching to Python 3.11 is quite troublesome, and llama-cpp-binaries only supports Python 3.11.