
ModuleNotFoundError: No module named 'llama_inference_offload'

Open DrewPear309 opened this issue 1 year ago • 12 comments

Howdy! Thank you for this wonderful code. I can't get it working, though. I get the error at the bottom. I'm running Windows 10 with a Ryzen 9 5900X. I chose the CPU option upon install. I downloaded facebook_opt-6.7b through your interface and manually installed Vicuna. I get a similar error choosing either model. I added these arguments to start-webui.bat: --wbits 4 --groupsize. Thank you!

Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
  File "C:\oobabooga-windows\text-generation-webui\server.py", line 290, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\oobabooga-windows\text-generation-webui\modules\models.py", line 100, in load_model
    from modules.GPTQ_loader import load_quantized
  File "C:\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 14, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
Press any key to continue . . .
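
For what it's worth, the command I expect start-webui.bat to end up running (the 128 group size is only my guess from the model name) would be something like:

python server.py --model vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128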

DrewPear309 avatar Apr 07 '23 03:04 DrewPear309

Same here. Running on Ubuntu 20.04 with CUDA enabled (conda install cudatoolkit).

rafaeldelrey avatar Apr 07 '23 03:04 rafaeldelrey

Pretty sure you didn't fully set up GPTQ; "llama_inference_offload" is part of it...

practical-dreamer avatar Apr 07 '23 03:04 practical-dreamer

did you install the dependencies from the requirements.txt file?

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode
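
From inside the text-generation-webui folder, with the environment the installer created active, that is roughly:

pip install -r requirements.txt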

practical-dreamer avatar Apr 07 '23 03:04 practical-dreamer

did you install the dependencies from the requirements.txt file?

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode

I used the install.bat file to install the framework. That is all. Do I need to install the dependencies in the requirements file? It says it is a one-click installer, so it didn't occur to me to look elsewhere. Thank you!

DrewPear309 avatar Apr 07 '23 03:04 DrewPear309

Pretty sure you didn't fully set up GPTQ; "llama_inference_offload" is part of it...

I just used install.bat. Can I install/re-install GPTQ separately? Thank you!

DrewPear309 avatar Apr 07 '23 03:04 DrewPear309

llama_inference_offload isn't part of the requirements. It is a Python script in the GPTQ folder. As long as that folder is in \text-generation-webui\repositories, you should be fine.

EDIT: I just saw that you chose the CPU option. I'm pretty sure that GPTQ-quantized models require a GPU to run. Currently, the install script does not download GPTQ if you choose the CPU option.
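
If you do want to set GPTQ up by hand, the steps look roughly like this (the fork and branch here are what I remember from the wiki, so double-check against it):

cd text-generation-webui
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install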

jllllll avatar Apr 07 '23 04:04 jllllll

llama_inference_offload isn't part of the requirements. It is a Python script in the GPTQ folder. As long as that folder is in \text-generation-webui\repositories, you should be fine.

EDIT: I just saw that you chose the CPU option. I'm pretty sure that GPTQ-quantized models require a GPU to run. Currently, the install script does not download GPTQ if you choose the CPU option.

Ah! I don't have a repositories folder, perhaps because it did not download GPTQ, as you said. I can choose the GPU option, though I have an AMD GPU. Would it work? I seem to remember the options were Nvidia or CPU.

DrewPear309 avatar Apr 07 '23 04:04 DrewPear309

llama_inference_offload isn't part of the requirements. It is a Python script in the GPTQ folder. As long as that folder is in \text-generation-webui\repositories, you should be fine. EDIT: I just saw that you chose the CPU option. I'm pretty sure that GPTQ-quantized models require a GPU to run. Currently, the install script does not download GPTQ if you choose the CPU option.

Ah! I don't have a repositories folder, perhaps because it did not download GPTQ, as you said. I can choose the GPU option, though I have an AMD GPU. Would it work? I seem to remember the options were Nvidia or CPU.

Currently, the installer only supports CPU and NVIDIA GPU. AMD support requires ROCm, which does not support Windows.

jllllll avatar Apr 07 '23 04:04 jllllll

GPTQ-quantized models

Are any of the downloadable models not GPTQ-quantized? Since this is a multi-modal GUI, I imagine there are some models I could find that I can run. Thank you for your help!

DrewPear309 avatar Apr 07 '23 04:04 DrewPear309

GPTQ-quantized models

Are any of the downloadable models not GPTQ-quantized? Since this is a multi-modal GUI, I imagine there are some models I could find that I can run. Thank you for your help!

Look for models that aren't 4-bit or 8-bit. After downloading a model, you should be able to run it in 8-bit mode using --load-in-8bit. That doesn't require GPTQ and can run on CPU, though it will be slower than running without 8-bit.
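
For example, with the OPT model you already downloaded (the flag is the only part that matters here):

python server.py --model facebook_opt-6.7b --load-in-8bit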

jllllll avatar Apr 07 '23 04:04 jllllll

https://github.com/oobabooga/text-generation-webui/issues/879

da3dsoul avatar Apr 07 '23 14:04 da3dsoul

llama_inference_offload isn't part of the requirements. It is a python script in the GPTQ folder. As long as that folder is in \text-generation-webui\repositories then you should be fine.

EDIT: I just saw that you chose the cpu option. I'm pretty sure that GPTQ-quantized models require a gpu to run. Current;y, the install script does not download GPTQ if you choose the cpu option.

@ALL this is the go-to method. Also check docker/Dockerfile; we copied the GPTQ repo under the ./repositories folder.
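
The relevant Dockerfile step is roughly of this shape (the URL and paths here are illustrative, not the exact contents of the file):

RUN git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa /app/repositories/GPTQ-for-LLaMa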

yhyu13 avatar Apr 29 '23 08:04 yhyu13

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Oct 24 '23 23:10 github-actions[bot]