text-generation-webui
Message "CUDA extension not installed" but CUDA 12 is installed on Windows
Describe the bug
I am trying to use the multimodal model wojtab_llava-13b-v0-4bit-128g on Windows using CUDA.
(Further developments of this issue are in my comments below.)
Is there an existing issue for this?
- [X] I have searched the existing issues
- https://github.com/oobabooga/text-generation-webui/issues/1289: this one is similar but the solutions are not working
Reproduction
I used the installer start_windows.bat.
Then I restarted with the option --multimodal-pipeline llava-13b.
I downloaded the model wojtab/llava-13b-v0-4bit-128g.
Then I loaded it with GPTQ-for-LLaMa using:
wbits=4
groupsize=128
model_type=llama
After clicking the Load button, it loads as if everything is OK, but the AI does not give any responses.
I noticed that in the console output, the message "CUDA extension not installed" appears twice.
Also, sending requests to the AI results in the error: NameError: name 'quant_cuda' is not defined.
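For context, the "CUDA extension not installed" message most likely comes from a guarded import in GPTQ-for-LLaMa's quant.py, roughly along the lines below (a sketch, not the exact upstream code). If that import fails, only the warning is printed, and any later call such as quant_cuda.vecquant4matmul(...) then raises the NameError above.
# Sketch of the assumed guarded import in quant.py (not copied verbatim):
try:
    import quant_cuda_faster as quant_cuda
except ImportError:
    print('CUDA extension not installed.')
# quant_cuda stays undefined after a failed import, so the matmul call fails later.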
Screenshot
No response
Logs
23:12:42-780091 INFO Starting Text generation web UI
23:12:42-783600 INFO Loading the extension "multimodal"
23:12:42-787386 INFO Loading the extension "gallery"
23:12:42-999049 INFO LLaVA - Loading CLIP from openai/clip-vit-large-patch14 as torch.float32 on cuda:0...
23:12:44-963047 INFO LLaVA - Loading projector from liuhaotian/LLaVA-13b-delta-v0 as torch.float32 on cuda:0...
23:12:45-158792 INFO LLaVA supporting models loaded, took 2.16 seconds
23:12:45-160295 INFO Multimodal: loaded pipeline llava-13b from pipelines/llava (LLaVA_v0_13B_Pipeline)
Running on local URL: http://127.0.0.1:7860
23:13:10-235869 INFO Loading "wojtab_llava-13b-v0-4bit-128g"
CUDA extension not installed.
CUDA extension not installed.
23:13:10-264514 INFO Found the following quantized model: models\wojtab_llava-13b-v0-4bit-128g\llava-13b-v0-4bit-128g.safetensors
C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\modeling_utils.py:4193: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
23:13:21-655309 INFO LOADER: "GPTQ-for-LLaMa"
23:13:21-658307 INFO TRUNCATION LENGTH: 2048
23:13:21-659305 INFO INSTRUCTION TEMPLATE: "LLaVA"
23:13:21-660305 INFO Loaded the model in 11.42 seconds.
Traceback (most recent call last):
File "C:\Tools\text-generation-webui-oobabooga\modules\callbacks.py", line 61, in gentask
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 397, in generate_with_callback
shared.model.generate(**kwargs)
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\generation\utils.py", line 1592, in generate
return self.sample(
^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\generation\utils.py", line 2696, in sample
outputs = self(
^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1176, in forward
outputs = self.model(
^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1019, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 740, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 639, in forward
query_states = self.q_proj(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gptq_for_llama\gptq_old\quant.py", line 426, in forward
quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
^^^^^^^^^^
NameError: name 'quant_cuda' is not defined
Output generated in 0.59 seconds (0.00 tokens/s, 0 tokens, context 71, seed 226787340)
System Info
GPU: RTX 3060, 6GB VRAM
CPU: i7 11th gen, 64GB RAM
OS: Windows 11
CUDA: 12
What I have found so far is an import that is failing in the quant.py file:
import quant_cuda_faster as quant_cuda
I then created a test file containing the import to debug.
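A minimal sketch of such a test file (the file name and the print are illustrative; the bare import is what triggers the error shown next):
# test_quant_cuda.py (hypothetical name) - run from cmd_windows.bat so the same
# Python environment and site-packages are used as by the web UI.
import quant_cuda_faster as quant_cuda
print("quant_cuda loaded from:", quant_cuda.__file__)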
First, I got the error:
Exception has occurred: ImportError
DLL load failed while importing quant_cuda_faster: The specified module could not be found.
Then I found out that, to load a module DLL on Windows, its dependencies need to be in a trusted DLL directory. I proceeded to add env\Lib\site-packages\torch\lib to that list, since I found out that quant_cuda_faster depends on torch.
import os
# Python on Windows only resolves extension-module DLL dependencies from trusted
# directories, so torch's DLL folder is added before importing the extension.
os.add_dll_directory(r"C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\lib")
import quant_cuda_faster as quant_cuda
The error now changes to:
Exception has occurred: ImportError
DLL load failed while importing quant_cuda_faster: The specified procedure could not be found.
I have no idea how to fix this, since the error does not give any indication of which procedure might be missing. I can only suppose that it is a torch version conflict, where quant_cuda_faster expects a version of torch that is not the same as the one installed.
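A quick check of the torch build actually installed in the environment (run from cmd_windows.bat; this only shows the installed side of a possible mismatch, since the extension does not report which torch it was built against):
import torch
# Report the torch version and the CUDA toolkit it was built with; an extension
# compiled against a different torch ABI can fail with exactly this kind of
# "specified procedure could not be found" error.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())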
I have the same issue here
I executed python -m torch.utils.collect_env in the context of cmd_windows.bat, with the result below. It shows that CUDA is installed.
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Home Single Language
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-Builds project) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.11.8 | packaged by Anaconda, Inc. | (main, Feb 26 2024, 21:34:05) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 551.76
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2304
DeviceID=CPU0
Family=198
L2CacheSize=10240
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2304
Name=11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
ProcessorType=3
Revision=
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.2.1+cu121
[pip3] torchaudio==2.2.1+cu121
[pip3] torchvision==0.17.1+cu121
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.2.1+cu121 pypi_0 pypi
[conda] torchaudio 2.2.1+cu121 pypi_0 pypi
[conda] torchvision 0.17.1+cu121 pypi_0 pypi
Hello, I was able to resolve this by using "ExLlamav2_HF" as the loader instead of "GPTQ-for-LLaMa". Make sure to click Save Settings so it uses that loader the next time it launches.
@HamedEmine Yeah, I could also load this model using ExLlamav2_HF. Thanks for pointing me to this solution. It can answer text questions, but unfortunately it raised an error when I tried to send it a picture. My intention is to use the multimodal extension; I was following the instructions on the multimodal extension page, which is why I was trying to use the wojtab_llava-13b-v0-4bit-128g model.
This is the error when I try to input images (AttributeError: 'Exllamav2HF' object has no attribute 'model'):
Traceback (most recent call last):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\queueing.py", line 501, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\route_utils.py", line 258, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1684, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\blocks.py", line 1262, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 574, in async_iteration
return await iterator.__anext__()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 567, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 550, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\gradio\utils.py", line 733, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 414, in generate_chat_reply_wrapper
for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 382, in generate_chat_reply
for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\chat.py", line 325, in chatbot_wrapper
for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 33, in generate_reply
for result in _generate_reply(*args, **kwargs):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 85, in _generate_reply
for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\text_generation.py", line 324, in generate_reply_HF
question, input_ids, inputs_embeds = apply_extensions('tokenizer', state, question, input_ids, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\extensions.py", line 231, in apply_extensions
return EXTENSION_MAP[typ](*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\modules\extensions.py", line 134, in _apply_tokenizer_extensions
prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\script.py", line 90, in tokenizer_modifier
prompt, input_ids, input_embeds, total_embedded = multimodal_embedder.forward(prompt, state, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\multimodal_embedder.py", line 172, in forward
prompt_parts = self._embed(prompt_parts)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\multimodal_embedder.py", line 154, in _embed
parts[i].embedding = self.pipeline.embed_tokens(part.input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\extensions\multimodal\pipelines\instructblip-pipeline\instructblip_pipeline.py", line 42, in embed_tokens
return shared.model.model.embed_tokens(input_ids).to(shared.model.device, dtype=shared.model.dtype)
^^^^^^^^^^^^^^^^^^
File "C:\Sync\nb-avell\Tools\text-generation-webui-oobabooga\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1688, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Exllamav2HF' object has no attribute 'model'
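Judging from the last frames, the multimodal pipeline's embed_tokens assumes a transformers-style wrapper that exposes .model.embed_tokens, which the Exllamav2HF wrapper apparently does not provide. A hypothetical defensive version of that call (illustrative only, not the extension's actual code) would look like:
# Hypothetical guard around the failing call in the pipeline's embed_tokens;
# Exllamav2HF has no `.model` attribute, hence the AttributeError above.
def embed_tokens(shared, input_ids):
    inner = getattr(shared.model, "model", None)
    if inner is None or not hasattr(inner, "embed_tokens"):
        raise RuntimeError("This loader does not expose model.embed_tokens; "
                           "the multimodal pipeline needs an HF-style model wrapper.")
    return inner.embed_tokens(input_ids).to(shared.model.device, dtype=shared.model.dtype)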
I was able to load the model and use the multimodal extension using the ExLlamav2 loader.
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.