text-generation-webui
ModuleNotFoundError: No module named 'llama_inference_offload'
Howdy! Thank you for this wonderful code. I can't get it working, though; I get the error below. I'm running Windows 10 with a Ryzen 9 5900X and chose the CPU option during install. I downloaded facebook_opt-6.7b through your interface and manually installed Vicuna; I get a similar error with either model. I added these arguments to start-webui.bat: --wbits 4 --groupsize. Thank you!
Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
File "C:\oobabooga-windows\text-generation-webui\server.py", line 290, in
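For anyone comparing flags: a minimal sketch of the relevant line in start-webui.bat, assuming the call python server.py invocation the one-click installer generates. Note that --groupsize takes a value; 128 would match the "128g" in the model name, though the original post omitted it:

    call python server.py --chat --wbits 4 --groupsize 128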
Same here. Running on Ubuntu 20.04 with CUDA enabled (conda install cudatoolkit).
Pretty sure you didn't fully set up GPTQ; "llama_inference_offload" is part of it...
Did you install the dependencies from the requirements.txt file?
https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode
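For concreteness, the linked wiki described roughly these steps for 4-bit mode at the time, run from the text-generation-webui directory (repo URL and setup command taken from that page; exact commit pins and extra pip installs may differ):

    mkdir repositories
    cd repositories
    git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
    cd GPTQ-for-LLaMa
    python setup_cuda.py install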
I used the install.bat file to install the framework. That is all. Do I need to install the dependencies in the requirements file? It says it is a one-click installer, so it didn't occur to me to look elsewhere. Thank you!
I just used the install.bat. Can I install/re-install GPTQ separately? Thank you!
llama_inference_offload isn't part of the requirements. It is a Python script in the GPTQ folder. As long as that folder is in
\text-generation-webui\repositories
then you should be fine.
EDIT: I just saw that you chose the CPU option. I'm pretty sure that GPTQ-quantized models require a GPU to run. Currently, the install script does not download GPTQ if you choose the CPU option.
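For reference, a sketch of the layout the loader expects, with llama_inference_offload.py at the root of the cloned GPTQ folder (the GPTQ-for-LLaMa folder name is an assumption based on the repo the installer clones):

    text-generation-webui\
        repositories\
            GPTQ-for-LLaMa\
                llama_inference_offload.py
                ...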
Ah! I don't have a repositories folder. Perhaps that's because it did not download GPTQ, as you said. I can choose the GPU option, though I have an AMD GPU. Would it work? I seem to remember the options were Nvidia or CPU.
Currently, the installer only supports CPU and NVIDIA GPU. AMD support requires ROCm, which does not support Windows.
GPTQ-quantized models
Are any of the downloadable models not GPTQ-quantized? Since this is a multi-modal GUI, I imagine there are models out there that I could run. Thank you for your help!
Look for models that aren't 4-bit or 8-bit. After downloading a model, you should be able to run it in 8-bit mode using --load-in-8bit. That doesn't require GPTQ and can run on CPU, though it will be slower than running without 8-bit.
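As a minimal sketch of that invocation, assuming the model was downloaded under the name facebook_opt-6.7b from the original post:

    python server.py --model facebook_opt-6.7b --load-in-8bit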
https://github.com/oobabooga/text-generation-webui/issues/879
@ALL this is the go-to method. Also check docker/Dockerfile; we copied the GPTQ repo under the ./repositories folder.
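In shell terms, that Dockerfile step amounts to something like the following; the repo URL is an assumption here, so check docker/Dockerfile for the exact source and revision:

    git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa repositories/GPTQ-for-LLaMa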
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.