CUDA error on ZLUDA: CUBLAS_STATUS_NOT_SUPPORTED when calling 'cublasSgemm()'
Expected Behavior
Normally, when I run CUDA workloads through ZLUDA, the prompt should execute. I am using an AMD Radeon Vega 8 Graphics iGPU with an AMD Ryzen 5 3500U CPU. Everything should work normally... if it weren't for...
Actual Behavior
...this.
(The console output and full traceback are reproduced verbatim in the Debug Logs section below.)
This is a CUDA error indicating that cuBLAS reports the operation as not supported. Why is this happening?
I am using Python 3.10.11 with PyTorch 2.0.0+cu118 and ZLUDA. And yes, I did try the --disable-all-custom-nodes flag, to no avail.
Steps to Reproduce
I strongly suspect this issue is specific to my setup, but here is how it happens: select a model, enter the prompts, tweak a few settings, and click 'Queue Prompt'. After a few seconds, the error occurs.
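The traceback below bottoms out in a plain torch.nn.functional.linear call, which maps to cublasSgemm for FP32 inputs, so a minimal sketch like the following (tensor sizes are made-up placeholders, not from the original workflow) should hit the same GEMM path outside ComfyUI:

import torch

# Minimal check of the FP32 GEMM path that fails in the traceback
# (torch.nn.functional.linear -> cublasSgemm). Sizes are arbitrary.
x = torch.randn(77, 768, device="cuda", dtype=torch.float32)
w = torch.randn(768, 768, device="cuda", dtype=torch.float32)
b = torch.zeros(768, device="cuda", dtype=torch.float32)
y = torch.nn.functional.linear(x, w, b)
torch.cuda.synchronize()  # flush the queue so any CUDA error surfaces here
print(y.shape)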
Debug Logs
FETCH DATA from: C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
got prompt
model_type EPS
Using split attention in VAE
Using split attention in VAE
loaded straight to GPU
Requested to load BaseModel
Loading 1 new model
Requested to load SD1ClipModel
Loading 1 new model
!!! Exception during processing!!! CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Traceback (most recent call last):
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "C:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 58, in encode
output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 115, in encode_from_tokens
o = self.cond_stage_model.encode_token_weights(tokens)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 567, in encode_token_weights
out = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 41, in encode_token_weights
o = self.encode(to_encode)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 228, in encode
return self(tokens)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 200, in forward
outputs = self.transformer(tokens, attention_mask_model, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 134, in forward
x = self.text_model(*args, **kwargs)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 109, in forward
x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 68, in forward
x = l(x, mask, optimized_attention)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 49, in forward
x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 16, in forward
q = self.q_proj(x)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 50, in forward
return self.forward_comfy_cast_weights(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 46, in forward_comfy_cast_weights
return torch.nn.functional.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Other
No response
Which version of CUDA do you have?
import torch
print(torch.version.cuda)
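A slightly fuller diagnostic (a sketch; the device-name line assumes a GPU is visible) prints both the build and the runtime view:

import torch

print(torch.__version__)          # PyTorch build, e.g. 2.0.0+cu118
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # whether a device is visible at runtime
if torch.cuda.is_available():
    # Under ZLUDA this typically reports the AMD GPU
    print(torch.cuda.get_device_name(0))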
Okay, never mind; you said you are using CUDA 11.8. Did you change anything about your setup recently?
Try reinstalling PyTorch. If you want to stay on PyTorch 2.0.0, try this:
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
Otherwise, try upgrading your PyTorch to the latest stable release:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Try setting the environment variable DISABLE_ADDMM_CUDA_LT=1.
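For anyone unsure where to put it: on the portable Windows build you can add `set DISABLE_ADDMM_CUDA_LT=1` to the launch .bat, or set it from Python before the first CUDA matmul runs. A minimal sketch:

import os

# Set before the first CUDA matmul is dispatched; PyTorch consults this
# flag when choosing the addmm backend (it disables the cuBLASLt route).
os.environ["DISABLE_ADDMM_CUDA_LT"] = "1"

import torch

a = torch.randn(8, 8, device="cuda")
print(a @ a)  # exercises the GEMM path with cuBLASLt disabled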
Over the past couple of days I have tried "segment anything" nodes from different authors, but without exception they fail with a "torch.cuda.OutOfMemoryError: Allocation on device" error in slightly more complex workflows; used on their own, these nodes often work fine. I don't know whether the problem I am hitting is related to this error. Allocation on device
File "D:\ComfyUI-aki-v1.3\execution.py", line 152, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) File "D:\ComfyUI-aki-v1.3\execution.py", line 82, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) File "D:\ComfyUI-aki-v1.3\execution.py", line 75, in map_node_over_list results.append(getattr(obj, func)(**slice_dict(input_data_all, i))) File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 317, in main boxes = groundingdino_predict( File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 182, in groundingdino_predict boxes_filt = get_grounding_output( File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 170, in get_grounding_output outputs = model(image[None], captions=[caption]) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\groundingdino.py", line 303, in forward hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer( File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 258, in forward memory, memory_text = self.encoder( File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 576, in forward output = checkpoint.checkpoint( File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch_compile.py", line 24, in inner return torch._dynamo.disable(fn, recursive)(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch_dynamo\eval_frame.py", line 451, in _fn return fn(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch_dynamo\external_utils.py", line 36, in inner return fn(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\utils\checkpoint.py", line 487, in checkpoint return CheckpointFunction.apply(function, preserve, *args) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\autograd\function.py", line 598, in apply return super().apply(*args, **kwargs) # type: ignore[misc] File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\utils\checkpoint.py", line 262, in forward outputs = run_function(*args) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File 
"D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 785, in forward src2 = self.self_attn( File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\ms_deform_attn.py", line 271, in forward output = multi_scale_deformable_attn_pytorch( File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\ms_deform_attn.py", line 70, in multi_scale_deformable_attn_pytorch (torch.stack(sampling_value_list, dim=-2).flatten(-2) * attention_weights)
I have the same error. I tried every version of PyTorch and also set the env variable: DISABLE_ADDMM_CUDA_LT=1
Still the same error.
Also got this error - AMD Radeon RX 6700XT
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SD1ClipModel
loaded completely 9.5367431640625e+25 235.84423828125 True
!!! Exception during processing !!! CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Traceback (most recent call last):
File "L:\ComfyUI-Zluda\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "L:\ComfyUI-Zluda\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "L:\ComfyUI-Zluda\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "L:\ComfyUI-Zluda\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "L:\ComfyUI-Zluda\nodes.py", line 67, in encode
return (clip.encode_from_tokens_scheduled(tokens), )
File "L:\ComfyUI-Zluda\comfy\sd.py", line 148, in encode_from_tokens_scheduled
pooled_dict = self.encode_from_tokens(tokens, return_pooled=return_pooled, return_dict=True)
File "L:\ComfyUI-Zluda\comfy\sd.py", line 210, in encode_from_tokens
o = self.cond_stage_model.encode_token_weights(tokens)
File "L:\ComfyUI-Zluda\comfy\sd1_clip.py", line 635, in encode_token_weights
out = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
File "L:\ComfyUI-Zluda\comfy\sd1_clip.py", line 45, in encode_token_weights
o = self.encode(to_encode)
File "L:\ComfyUI-Zluda\comfy\sd1_clip.py", line 252, in encode
return self(tokens)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\sd1_clip.py", line 224, in forward
outputs = self.transformer(tokens, attention_mask_model, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state, dtype=torch.float32)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 137, in forward
x = self.text_model(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 113, in forward
x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 70, in forward
x = l(x, mask, optimized_attention)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 51, in forward
x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 17, in forward
q = self.q_proj(x)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\ops.py", line 68, in forward
return self.forward_comfy_cast_weights(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\ops.py", line 64, in forward_comfy_cast_weights
return torch.nn.functional.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
RX 6600, same error. DISABLE_ADDMM_CUDA_LT=1 doesn't work either. Tried multiple versions of torch, same problem.
7800X3D, 7800XT, 64 GB RAM, Windows 10, same problem. I tried many, many versions with basically the same results; the version doesn't seem to be the source of the issue. Sometimes it works for a bit, but then it stops. At least I'm not alone!
Combinations tried (from → till): torch 2.3 → 2.8, cu118 → cu126, Python 3.10 → 3.13
Did anyone find a solution? I'm getting the same error!
Anyone figure this out yet? Same error, followed the steps in this thread.
Repatch your libraries (patchzluda or patchzluda2) if you have updated your Torch installation or suspect it has been updated.
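For context (my understanding of why repatching helps, not an official statement): the ZLUDA patch replaces the cuBLAS/cuSPARSE/NVRTC DLLs inside torch\lib with ZLUDA shims, and a Torch reinstall or upgrade restores NVIDIA's originals, which then fail on AMD hardware with exactly these CUBLAS_STATUS errors. A quick sketch to see whether the shims are still in place (DLL names assumed from a typical ComfyUI-Zluda cu118 setup; the shims are only a few hundred KiB, while NVIDIA's originals are tens of MiB):

import os
import torch

# Hypothetical check: list the DLLs the ZLUDA patch normally replaces.
lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
for name in ("cublas64_11.dll", "cusparse64_11.dll", "nvrtc64_112_0.dll"):
    path = os.path.join(lib_dir, name)
    if os.path.exists(path):
        print(f"{name}: {os.path.getsize(path) // 1024} KiB")
    else:
        print(f"{name}: missing")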