text-generation-webui
CPU installation won't work (NameError: name 'quant_cuda' is not defined)
Describe the bug
When I ask anything of any model, I get NameError: name 'quant_cuda' is not defined.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Ask the model anything on a CPU installation.
Screenshot
No response
Logs
Starting the web UI...
Loading the extension "gallery"... Ok.
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
No model is loaded! Select one in the Model tab.
Loading gpt4-x-alpaca-13b-native-4bit-128g...
CUDA extension not installed.
Found the following quantized model: models\gpt4-x-alpaca-13b-native-4bit-128g\gpt-x-alpaca-13b-native-4bit-128g-cuda.pt
Loading model ...
Done.
Loaded the model in 40.36 seconds.
Traceback (most recent call last):
File "D:\Vicuna\oobabooga-windows\text-generation-webui\modules\callbacks.py", line 66, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "D:\Vicuna\oobabooga-windows\text-generation-webui\modules\text_generation.py", line 251, in generate_with_callback
shared.model.generate(**kwargs)
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
return self.sample(
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
outputs = self(
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
outputs = self.model(
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "D:\Vicuna\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Vicuna\oobabooga-windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 426, in forward
quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
NameError: name 'quant_cuda' is not defined
Output generated in 0.64 seconds (0.00 tokens/s, 0 tokens, context 33, seed 213405660)
System Info
i7
I think you can't use a 4-bit model on CPU.
It's because GPTQ_load uses quant from the GPTQ CUDA branch.
quant requires quant_cuda.cpp.
The GPTQ triton branch doesn't use quant_cuda, but I don't think the oobabooga text UI is set up for that.
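For reference, here is a minimal sketch (assumed structure, not the exact GPTQ-for-LLaMa source) of how the CUDA branch's quant.py can end up raising this NameError: the extension import is guarded, so the module still loads on a CPU-only install, but the name quant_cuda is never bound.

```python
# Minimal sketch, assuming the import-guard pattern described above.
try:
    import quant_cuda  # compiled CUDA extension; absent on a CPU-only install
except ImportError:
    print('CUDA extension not installed.')  # the message seen in the log above

class QuantLinear:
    def forward(self, x):
        y = ...  # output buffer (details omitted)
        # If the import above failed, this is the first use of the missing name,
        # so generation dies with: NameError: name 'quant_cuda' is not defined
        quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
        return y
```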
I have the exact same problem and I'm on a good GPU. I don't think it is because you're on CPU.
You've got to set up quant_cuda, zach.
It says it cannot find triton 2.0 when I try to install the requirements.
I started from scratch and ended up in the same spot. I did see an error during the install, and the CUDA version became 0.0.0. I know it needs to be changed to 11.8, but I have no idea how.
@Steelman14aUA the model you are using indicates it's for CUDA and does not support CPU; use the non-CUDA model.
The newest version of text-generation-webui supports GPTQ triton.
I too have been trying to get CPU generation working, without success. I tried cloning the triton repo from oobabooga, but it seems to have been refactored and is now missing dependencies (specifically the modelutils.py file). I tried using the last commit from the repo that still has this file, and after passing in the --no-warmup_autotune flag I can get it running without error messages. Now, though, when I try generating output, I get nothing returned.
Is there a known "good" commit from the repo that we should be using, or am I missing something else?
I think it's better to convert that model to a llama.cpp GGML model and use it on CPU that way with the wrapper.
Yes, you're right. Using a ggml model seems to have worked. Thanks!
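For anyone landing here later, a rough sketch of that CPU route using the llama-cpp-python wrapper on a converted GGML file (the model path below is just a hypothetical placeholder):

```python
# Rough sketch: CPU inference through the llama-cpp-python wrapper.
from llama_cpp import Llama

llm = Llama(model_path="models/gpt4-x-alpaca-13b-ggml-q4_0.bin")  # hypothetical path
result = llm("Tell me about alpacas.", max_tokens=64)
print(result["choices"][0]["text"])
```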
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
I am still facing this issue and getting the same error. Here is the command I am running:
$ python server.py --listen --wbits 4 --model MetaIX_GPT4-X-Alpaca-30B-4bit --gptq-for-llama --pre_layer 30 60
Please let me know if I missed something.