text-generation-webui
ModuleNotFoundError: No module named 'llama_inference_offload'
Describe the bug
Every time I try to select a model, this happens.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
I downloaded the zip.
Screenshot
ModuleNotFoundError: No module named 'llama_inference_offload'
Logs
Starting the web UI...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Loading binary D:\AI\Project Hyacint\Text AI\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.dll...
D:\AI\Project Hyacint\Text AI\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
The following models are available:
1. facebook_opt-6.7b
2. gpt4-x-alpaca-13b-native-4bit-128g
Which one do you want to load? 1-2
1
Loading facebook_opt-6.7b...
Traceback (most recent call last):
File "D:\AI\Project Hyacint\Text AI\oobabooga-windows\text-generation-webui\server.py", line 302, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "D:\AI\Project Hyacint\Text AI\oobabooga-windows\text-generation-webui\modules\models.py", line 100, in load_model
from modules.GPTQ_loader import load_quantized
File "D:\AI\Project Hyacint\Text AI\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 14, in <module>
import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
Press any key to continue . . .
System Info
I use a laptop with an Intel Core i7 CPU, running Windows 10.
Same on an AMD CPU.
Same here on an M1 Pro Mac.
EDIT: Just so I know whether I'm attempting something possible: is there currently any way to use gpt4-x-alpaca-13b-native-4bit-128g with the WebUI on M-series Macs? Has anyone managed it?
Same here with an AMD CPU + NVIDIA GPU.
That error message indicates you don't have GPTQ installed. See https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode for info.
It likely won't work for anyone not using an NVIDIA GPU right now. CPU models might be a better option for non-NVIDIA users for the time being.
Same error on a Windows WSL/Ubuntu setup.
The error persists after installing the module with:
python -m pip install llama_cpp_python-0.1.26-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
This wheel was extracted from the artifacts.
python -m pip list | grep llama
llama-cpp-python 0.1.26
So llama_inference_offload is still not available...
The dependency was added by PR 460.
@UrielCh llama-cpp-python is for CPU inference; this error message comes from GPTQ, which is for GPU inference. You're likely trying to load a GPU model by mistake instead of a CPU model (you can recognize a CPU model by the ggml- prefix its files usually have).
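For what it's worth, llama_inference_offload isn't a pip package at all: it's a file the webui expects to find inside a clone of GPTQ-for-LLaMa under repositories/. A quick hedged check (the exact path is my assumption based on how modules/GPTQ_loader.py imports it, so verify against your copy):
ls repositories/GPTQ-for-LLaMa/llama_inference_offload.py
If that file isn't there, no pip install will help; the repo needs to be cloned as described below.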
In the project root:
$ mkdir -p repositories
$ cd repositories
$ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
Try starting server.py again.
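For anyone on an NVIDIA GPU, a hedged sketch of the full sequence (the setup_cuda.py step is how the GPTQ-for-LLaMa repo built its 4-bit CUDA kernel at the time; the wiki page linked above and that repo's README are the authoritative instructions):
cd text-generation-webui
mkdir -p repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install   # compiles the CUDA kernel; needs the CUDA toolkit and an NVIDIA GPU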
I think I tried to start the process on my GPU:
python server.py --auto-devices --chat --wbits 4 --groupsize 128
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/uriel/anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/uriel/anaconda3/envs/textgen did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/home/uriel/anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/lib/wsl/lib: did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/home/uriel/anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/uriel/anaconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
The following models are available:
Strange setup messages:
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
(so the CUDA lib is missing...)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
(but now it's present...)
1. facebook_opt-6.7b
2. gpt4-x-alpaca-13b-native-4bit-128g
3. vicuna-13b-GPTQ-4bit-128g
Which one do you want to load? 1-3
2
Loading gpt4-x-alpaca-13b-native-4bit-128g...
Could not find the quantized model in .pt or .safetensors format, exiting...
Still one error, but I need to go for now...
I double-checked all my model files, replacing all the LFS pointer references with the real files.
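If the model folder was cloned from Hugging Face with git, a hedged way to fetch the real weights behind the LFS pointers (assumes git-lfs is installed and that the folder really is a git clone; the folder name is just this thread's example model):
cd models/gpt4-x-alpaca-13b-native-4bit-128g
git lfs install
git lfs pull   # downloads the real .pt/.safetensors files the pointers refer to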
Even Colab is giving me this error:
Which one do you want to load? 1-2
1
Loading facebook_opt-1.3b...
Traceback (most recent call last):
File "/content/text-generation-webui/server.py", line 302, in <module>
It likely won't work for anyone not using an NVIDIA GPU right now. CPU models might be a better option for non-NVIDIA users for the time being.
Is the gpt4-x-alpaca-13b-native-4bit-128g model available for CPU?
In the project root:
$ mkdir -p repositories
$ cd repositories
$ git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
Try starting server.py again.
This worked, but now I'm seeing an issue that's very common here, and I still don't know how to fix it: the out-of-memory problem where 8 GB VRAM GPUs can't run some models. I'm trying to run gpt4-x-alpaca-13b-4bit-128g on an RTX 3050 8 GB with an AMD Ryzen 5 5600G.
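One thing that may help (an assumption on my part, so check python server.py --help to confirm your build has the flag): the webui has a --pre_layer option for GPTQ models that keeps only the first N layers on the GPU and runs the rest on the CPU, trading speed for VRAM:
python server.py --chat --wbits 4 --groupsize 128 --pre_layer 20   # lower the number if you still run out of VRAM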
When it asks which model you want, which of that list is CPU-compatible? It seems we still need a specially built PC to run this.
There are a lot of CPU-compatible models out there; there's a download list of popular CPU models at https://rentry.org/nur779 (disclaimer: I have no idea who maintains that).
You can recognize that a model is CPU-compatible if its files have a ggml- prefix.
There are a lot of CPU-compatible models out there; there's a download list of popular CPU models at https://rentry.org/nur779 (disclaimer: I have no idea who maintains that). You can recognize that a model is CPU-compatible if its files have a ggml- prefix.
So I found this model for gpt4-x-alpaca: https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g/tree/main/gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g . It has the ggml- prefix; does that mean I can use it on CPU?
Yes, it does; that's a CPU model.
I can't fix the problem with oobabooga, but for those of you who are trying to use it on a CPU, I have good news: there's an alternative, and it's very simple. It's called Koboldcpp. It's like llama.cpp but with the Kobold WebUI, so you can have all the features oobabooga has to offer, if you don't mind learning how to use the Kobold WebUI.
+ Installation:
- Go to https://github.com/LostRuins/koboldcpp (you can read the description if you want).
- Scroll down to Usage and you will see the blue Download link; click on it.
- Read the description of how to use it, then download koboldcpp.exe.
- Drag and drop the model onto it, or browse for the ggml model manually; this works for every CPU model. Wait until it finishes loading the model, then copy http://localhost:5001/ and paste it into your browser.
- You can find out more about koboldcpp and how to use it here: https://www.reddit.com/r/LocalLLaMA/comments/12cfnqk/koboldcpp_combining_all_the_various_ggmlcpp_cpu/
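As an alternative to drag-and-drop, koboldcpp can also take the model path on the command line; a hedged example (the filename here is hypothetical, and flag names may differ between versions, so check koboldcpp.exe --help):
koboldcpp.exe ggml-gpt4-x-alpaca-13b-q4_1.bin --port 5001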
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.