
GGUF models fail with llama.cpp loader, incorrectly uses llama_cpp_server_loader (ModuleNotFoundError: llama_cpp_binaries)

Open drzhanov opened this issue 7 months ago • 15 comments

Describe the bug

When attempting to load any GGUF model using the explicitly selected llama.cpp loader in the Model tab, the loading process fails. The traceback indicates that the code execution is incorrectly routed through modules/llama_cpp_server.py, which attempts to import the llama_cpp_binaries package. This package is not installed (as llama-cpp-python was installed instead) and is not required for the standard llama.cpp loader, leading to a ModuleNotFoundError. The UI selection for the llama.cpp loader seems to be ignored for GGUF models.

Expected behavior: The llama.cpp loader should successfully load the GGUF model by utilizing the installed llama-cpp-python library directly. It should not attempt to execute code from modules/llama_cpp_server.py or require the llama_cpp_binaries package when the standard llama.cpp loader is selected.

Is there an existing issue for this?

  • [x] I have searched the existing issues

Reproduction

  1. Perform a clean installation of text-generation-webui on Windows using the steps above (a rough command sketch follows this list).
  2. Place any GGUF model file (e.g., mistral-7b-instruct-v0.1.Q4_K_M.gguf) into the models directory.
  3. Start the server using python server.py.
  4. Open the web UI and navigate to the "Model" tab.
  5. Select the GGUF model from the dropdown list.
  6. Select llama.cpp from the "Model loader" dropdown list.
  7. Click the "Load" button.
  8. Observe the error traceback in the console.
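
For reference, a rough command-line sketch of the clean install described in steps 1-3 (repo URL, venv layout, and requirements file taken from the System Info below; this is an illustration of the reporter's setup, not the official install procedure):

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
python -m venv venv
venv\Scripts\activate
pip install -r requirements\full\requirements.txt
rem extra step taken in this report; see the discussion below
pip install llama-cpp-python
rem place the GGUF file in the models directory, then start the server
copy mistral-7b-instruct-v0.1.Q4_K_M.gguf models\
python server.py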

Screenshot

[screenshot of the error traceback]

Logs

Running on local URL:  http://127.0.0.1:7860

17:26:06-427596 INFO     Loading "mistral-7b-instruct-v0.1.Q4_K_M.gguf"
17:26:06-607221 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "C:\text-generation-webui\modules\ui_model_menu.py", line 162, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "C:\text-generation-webui\modules\models.py", line 43, in load_model
    output = load_func_map[loader](model_name)
  File "C:\text-generation-webui\modules\models.py", line 68, in llama_cpp_server_loader
    from modules.llama_cpp_server import LlamaServer
  File "C:\text-generation-webui\modules\llama_cpp_server.py", line 10, in <module>
    import llama_cpp_binaries
ModuleNotFoundError: No module named 'llama_cpp_binaries'

System Info

OS: Windows 11
Python Version: Python 3.10.6 
WebUI Version: Latest commit from main branch as of 2025-04-26 (installed via git clone and git pull)
Installation: Clean install using venv and pip install -r requirements/full/requirements.txt, followed by pip install llama-cpp-python.
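
A quick way to confirm which llama backend actually ended up in the venv (commands assumed, Windows syntax):

venv\Scripts\activate
pip list | findstr llama

On an install like the one above, this should show an entry for llama-cpp-python (the manually installed package) but none for llama_cpp_binaries, which matches the ModuleNotFoundError in the logs.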

drzhanov avatar Apr 26 '25 15:04 drzhanov

Same problem here, using the dev branch.

I did the following: after downloading the new stuff from the dev branch (using git pull origin dev), I ran update_wizard_windows.bat and chose A) Update the web UI. After updating all the libraries, everything was working.

B0rner avatar Apr 28 '25 14:04 B0rner

It should not attempt to execute code from modules/llama_cpp_server.py or require the llama_cpp_binaries package when the standard llama.cpp loader is selected.

Yes, it should. The llama-cpp-python requirement (and support) has been removed and replaced by llama_cpp_binaries. See https://github.com/oobabooga/text-generation-webui/commit/ae54d8faaa556bddf3e0805f4f3d04bb56c9be4b

Not sure why you're installing llama-cpp-python manually. You should update the webui and pip install -r the new requirements\full\requirements.txt inside the venv. Or just delete installer_files and let it install everything it needs from scratch.
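
A minimal sketch of that update path for a manual venv install like the one in this report (paths assumed from the System Info above):

cd text-generation-webui
git pull
venv\Scripts\activate
pip install -r requirements\full\requirements.txt

If you used the one-click installer instead, enter its environment with cmd_windows.bat before running the pip command, or delete installer_files and re-run the start script so it reinstalls everything.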

TheLounger avatar Apr 29 '25 01:04 TheLounger

Yes, it should. The llama-cpp-python requirement (and support) has been removed and replaced by llama_cpp_binaries. See ae54d8f

In the file llama_cpp_server.py I still see references to llama_cpp_binaries.

Not sure why you're installing llama-cpp-python manually. You should update the webui and pip install -r the new requirements\full\requirements.txt inside the venv. Or just delete installer_files and let it install everything it needs from scratch.

Honestly, I already solved my problem by running the portable version. So far, so good.

drzhanov avatar Apr 29 '25 06:04 drzhanov

Same problem here, using the dev branch.

I did the following: after downloading the new stuff from the dev branch (using git pull origin dev), I ran update_wizard_windows.bat and chose A) Update the web UI. After updating all the libraries, everything was working.

I already found the portable version, which works well. Thanks!

drzhanov avatar Apr 29 '25 06:04 drzhanov

So I updated to 3.1 on Windows 10. I had it running, but when I try to load a model (Mistral-Nemo-Instruct-2407-Q6_K.gguf with llama.cpp) I get the "ModuleNotFoundError: No module named 'llama_cpp_binaries'" error shown in the original screenshot. I originally had this loaded with the HF loader (so that I could use DRY, a big help) and it worked great before upgrading, but it seems that was removed, and now loading it as GGUF gives the error.

Not sure why you're installing llama-cpp-python manually. You should update the webui and pip install -r the new requirements\full\requirements.txt inside the venv. Or just delete installer_files and let it install everything it needs from scratch.

I followed these instructions. The pip install -r completed but didn't resolve the error. Removing the "installer_files" folder and starting again did download a lot of things (it took most of the day), but now I'm getting this error:

The system cannot find the path specified. Miniconda hook not found.

And I can't get further. Both the start and update script get that error now. I do have this installed in my F:\AI\text-generation-webui\ folder. And I did save a copy of the installer_files folder, if I should need to put that back.

I've been trying to get this update working for two days, going on three now. Thanks for any help.

Sqrlly avatar Apr 30 '25 11:04 Sqrlly

So I put the old installer_files folder back in place and I'm able to run it again. I re-ran all the web UI and extension updates, but loading the model still gives the error. Here is the specific one I'm getting:

01:29:34-787365 INFO     Loading "Mistral-Nemo-Instruct-2407-Q6_K.gguf"
01:29:35-222345 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "F:\AI\text-generation-webui\modules\ui_model_menu.py", line 174, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "F:\AI\text-generation-webui\modules\models.py", line 43, in load_model
    output = load_func_map[loader](model_name)
  File "F:\AI\text-generation-webui\modules\models.py", line 66, in llama_cpp_server_loader
    from modules.llama_cpp_server import LlamaServer
  File "F:\AI\text-generation-webui\modules\llama_cpp_server.py", line 12, in <module>
    import llama_cpp_binaries
ModuleNotFoundError: No module named 'llama_cpp_binaries'

Is there some reason llama_cpp_binaries isn't being found? Something I need to manually install?

Sqrlly avatar May 01 '25 05:05 Sqrlly

Maybe this is connected. From full\requirements.txt:

https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.8.0/llama_cpp_binaries-0.8.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

In the environment:

python --version
Python 3.10.13

I do have Python 3.13.3 on my system, but that's not 3.11 either. Do I need to update the python in the environment?

Edit: I meant to mention that the release notes say this:

Make llama-cpp-binaries wheels compatible with any Python >= 3.7 (useful for manually installing the requirements under requirements/portable/).

But that doesn't match what full\requirements.txt says above, and portable\requirements.txt is different:

https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.8.0/llama_cpp_binaries-0.8.0+cu124-py3-none-win_amd64.whl; platform_system == "Windows"

Should the full requirements match? Maybe that's why others mentioned being able to get the portable one working.
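
One way to check whether that marker can ever match your environment, and to install the wheel by hand if it cannot (the manual install is my own workaround idea, not an officially documented step):

python -c "import sys; print(sys.version_info[:2])"
pip install https://github.com/oobabooga/llama-cpp-binaries/releases/download/v0.8.0/llama_cpp_binaries-0.8.0+cu124-py3-none-win_amd64.whl

With Python 3.10.13 the first command prints (3, 10), so the python_version == "3.11" marker makes pip skip the wheel during pip install -r; installing the wheel URL directly bypasses the marker, and the release notes quoted above suggest the wheel itself works on any Python >= 3.7.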

Sqrlly avatar May 01 '25 11:05 Sqrlly

Well, I didn't get any responses here, but I have finally gotten it working again; it only took most of a week to do an update. In case someone else has this issue, here is what I did:

  • Updated my video drivers, including updating CUDA from 12.1 to 12.9.
  • Updated Python on my system from 3.10.13 to 3.13.3
  • Again removed (renamed) the installer_files folder as per TheLounger's earlier suggestion.
  • Ran the startup script

This time it successfully recreated the full installer_files folder, and not only does it run, but I can now load GGUF models and run them. The output feels a bit slow compared to before, but I can play with tweaking things from here.
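
In case it helps anyone retracing this, the "remove installer_files and re-run the startup script" step might look roughly like the following (start_windows.bat is assumed to be the startup script of the standard Windows package; renaming keeps a backup instead of deleting):

cd F:\AI\text-generation-webui
ren installer_files installer_files.bak
start_windows.bat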

Sqrlly avatar May 03 '25 07:05 Sqrlly

Had the same problem on Linux about two weeks ago when I updated from 2.x to 3.3, and TheLounger's suggestion resolved it. No additional CUDA or Python updates were needed on my side.

You should update the webui and pip install -r the new requirements\full\requirements.txt inside the venv.

Launch cmd_linux.sh, cmd_macos.sh, or cmd_windows.bat to open the venv.
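
A minimal sketch of that sequence on Linux (paths assumed to match the default repo layout; cmd_linux.sh drops you into the project's environment, and the pip command is run from that shell):

cd text-generation-webui
git pull
./cmd_linux.sh
pip install -r requirements/full/requirements.txt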

weirdnumbat avatar May 31 '25 22:05 weirdnumbat

I found the issue (at least a solution that worked for me on Windows). You are likely using the wrong version of Python. The llama_cpp_binaries package requires Python 3.11; any other version will not work. I kept finding that Oobabooga was using a different version of Python, even though I had 3.11 installed, because I had not set the correct environment path.

For an in-depth tutorial on Windows, see my Google Doc: TUTORIAL
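
A quick way to check which interpreter the webui environment is actually using, and whether the binaries package made it in (commands assumed; run them from inside the project environment, e.g. via cmd_windows.bat):

python --version
pip show llama_cpp_binaries

If pip show reports that the package is not found, the requirements marker never matched your interpreter and the import error above is exactly what you would expect.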

GuldenWolf avatar Sep 12 '25 09:09 GuldenWolf

Yes, I had a list of things I did that resolved it too, including updating Python from 3.10.13 to 3.13.3. 3.10.x does seem to be an issue for it.

Sqrlly avatar Sep 12 '25 17:09 Sqrlly

Yes, I had a list of things I did that resolved it too, including updating Python from 3.10.13 to 3.13.3. 3.10.x does seem to be an issue for it.

Got it. I just read your post from earlier. I am using CUDA 12.4 currently, and it seems to be working. I didn't test later versions of Python for fear that it would do the same thing again. Pretty much all I did was update Python, update the path, and then run a command that downloaded the required/missing files. I didn't have to reinstall or do anything else this way.

I am going to note in my tutorial that Python 3.11 and newer work 👍 Can I link to your GitHub account in the doc I created? I'd like to credit you for confirming that newer Python versions work, and I can also link to your comment in this thread in case others have to dig further.

GuldenWolf avatar Sep 13 '25 06:09 GuldenWolf

Yeah, I can't say for sure whether all of those were needed, but that's what I did and then it worked. And sure, go right ahead!

Sqrlly avatar Sep 14 '25 10:09 Sqrlly

Yeah, I can't say for sure whether all of those were needed, but that's what I did and then it worked. And sure, go right ahead!

Nothing wrong with your method—better safe than sorry! I linked your account at the top of my doc and also linked directly to your comment in this thread at the bottom @Sqrlly 👍

GuldenWolf avatar Sep 14 '25 11:09 GuldenWolf

Friends, I also encountered the same problem. I deployed the WebUI on an NVIDIA AGX Orin (ARM64) and also hit the "No module named 'llama_cpp_binaries'" error, and it has been bothering me. The project hasn't published the necessary wheels for this platform. I've heard that it only supports Python 3.11, but mine is Python 3.10 on Linux (JetPack 6.0), and the officially provided ARM64 builds of dependencies such as PyTorch are only compatible with Python 3.10. I don't know how to proceed: switching to Python 3.11 is quite troublesome, and llama-cpp-binaries only supports Python 3.11.
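
For anyone debugging the same combination, a quick check of what pip will try to match wheels against (a plain diagnostic, nothing gets installed):

python -c "import sys, platform; print(platform.machine(), sys.version)"

If no llama_cpp_binaries wheel is published for that architecture and Python version, pip skips or rejects the requirement, which ends in the same ModuleNotFoundError at load time.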

CaoKai1713 avatar Sep 16 '25 03:09 CaoKai1713