
ModuleNotFoundError: No module named 'llama_inference_offload'

Open DrewPear309 opened this issue 1 year ago • 12 comments

Howdy! Thank you for this wonderful code. I can't get it working, though. I get the error at the bottom. I'm running Windows 10 with a Ryzen 9 5900X. I chose the CPU option upon install. I downloaded facebook_opt-6.7b through your interface and manually installed Vicuna. I get a similar error choosing either model. I added these arguments to start-webui.bat: --wbits 4 --groupsize. Thank you!

Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
  File "C:\oobabooga-windows\text-generation-webui\server.py", line 290, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\oobabooga-windows\text-generation-webui\modules\models.py", line 100, in load_model
    from modules.GPTQ_loader import load_quantized
  File "C:\oobabooga-windows\text-generation-webui\modules\GPTQ_loader.py", line 14, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
Press any key to continue . . .
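
For what it's worth, the command I expect start-webui.bat to end up running (the 128 group size is only my guess from the model name) would be something like:

python server.py --model vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128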

DrewPear309 avatar Apr 07 '23 03:04 DrewPear309

Same here. Running on Ubuntu 20.04 with CUDA enabled (conda install cudatoolkit).

rafaeldelrey avatar Apr 07 '23 03:04 rafaeldelrey

Pretty sure you didn't fully set up GPTQ; "llama_inference_offload" is part of it...

practical-dreamer avatar Apr 07 '23 03:04 practical-dreamer

did you install the dependencies from the requirements.txt file?

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode
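
From inside the text-generation-webui folder, with the environment the installer created active, that is roughly:

pip install -r requirements.txt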

practical-dreamer avatar Apr 07 '23 03:04 practical-dreamer

did you install the dependencies from the requirements.txt file?

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode

I used the install.bat file to install the framework. That is all. Do I need to install the dependencies in the requirements file? It says it is a one-click installer, so it didn't occur to me to look elsewhere. Thank you!

DrewPear309 avatar Apr 07 '23 03:04 DrewPear309

Pretty sure you didn't fully set up GPTQ; "llama_inference_offload" is part of it...

I just used install.bat. Can I install/re-install GPTQ separately? Thank you!

DrewPear309 avatar Apr 07 '23 03:04 DrewPear309

llama_inference_offload isn't part of the requirements. It is a Python script in the GPTQ folder. As long as that folder is in \text-generation-webui\repositories, you should be fine.

EDIT: I just saw that you chose the CPU option. I'm pretty sure that GPTQ-quantized models require a GPU to run. Currently, the install script does not download GPTQ if you choose the CPU option.
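
If you do want to set GPTQ up by hand, the steps look roughly like this (the fork and branch here are what I remember from the wiki, so double-check against it):

cd text-generation-webui
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install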

jllllll avatar Apr 07 '23 04:04 jllllll

llama_inference_offload isn't part of the requirements. It is a Python script in the GPTQ folder. As long as that folder is in \text-generation-webui\repositories, you should be fine.

EDIT: I just saw that you chose the CPU option. I'm pretty sure that GPTQ-quantized models require a GPU to run. Currently, the install script does not download GPTQ if you choose the CPU option.

Ah! I don't have a repositories folder, perhaps because it did not download GPTQ, as you said. I can choose the GPU option, though I have an AMD GPU. Would it work? I seem to remember the options were Nvidia or CPU.

DrewPear309 avatar Apr 07 '23 04:04 DrewPear309

llama_inference_offload isn't part of the requirements. It is a Python script in the GPTQ folder. As long as that folder is in \text-generation-webui\repositories, you should be fine. EDIT: I just saw that you chose the CPU option. I'm pretty sure that GPTQ-quantized models require a GPU to run. Currently, the install script does not download GPTQ if you choose the CPU option.

Ah! I don't have a repositories folder, perhaps because it did not download GPTQ, as you said. I can choose the GPU option, though I have an AMD GPU. Would it work? I seem to remember the options were Nvidia or CPU.

Currently, the installer only supports CPU and NVIDIA GPU. AMD support requires ROCm, which does not support Windows.

jllllll avatar Apr 07 '23 04:04 jllllll

GPTQ-quantized models

Are any of the downloadable models not GPTQ-quantized? Since this is a multi-modal GUI, I imagine there are some models I could find that I can run. Thank you for your help!

DrewPear309 avatar Apr 07 '23 04:04 DrewPear309

GPTQ-quantized models

Are any of the downloadable models not GPTQ-quantized? Since this is a multi-modal GUI, I imagine there are some models I could find that I can run. Thank you for your help!

Look for models that aren't 4-bit or 8-bit. After downloading a model, you should be able to run it in 8-bit mode using --load-in-8bit. That doesn't require GPTQ and can run on CPU, though it will be slower than running without 8-bit.
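
For example, with the OPT model you already downloaded (the flag is the only part that matters here):

python server.py --model facebook_opt-6.7b --load-in-8bit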

jllllll avatar Apr 07 '23 04:04 jllllll

https://github.com/oobabooga/text-generation-webui/issues/879

da3dsoul avatar Apr 07 '23 14:04 da3dsoul

llama_inference_offload isn't part of the requirements. It is a python script in the GPTQ folder. As long as that folder is in \text-generation-webui\repositories then you should be fine.

EDIT: I just saw that you chose the cpu option. I'm pretty sure that GPTQ-quantized models require a gpu to run. Current;y, the install script does not download GPTQ if you choose the cpu option.

@ALL this is the go-to method. Also check docker/Dockerfile; we copied the GPTQ repo under the ./repositories folder.
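
The relevant Dockerfile step is roughly of this shape (the URL and paths here are illustrative, not the exact contents of the file):

RUN git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa /app/repositories/GPTQ-for-LLaMa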

yhyu13 avatar Apr 29 '23 08:04 yhyu13

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Oct 24 '23 23:10 github-actions[bot]