CUDA error on ZLUDA: CUBLAS_STATUS_NOT_SUPPORTED when calling 'cublasSgemm()'
Expected Behavior
Normally, when I run CUDA workloads through ZLUDA, the prompt should execute. I am using an AMD Radeon Vega 8 Graphics iGPU with an AMD Ryzen 5 3500U CPU. Everything should work normally... if it weren't for...
Actual Behavior
...this.
(The console output and full traceback are reproduced verbatim in the Debug Logs section below.)
This is a CUDA error indicating that cuBLAS reports the operation as not supported. Why is this happening?
I am using Python 3.10.11 with PyTorch 2.0.0+cu118 and ZLUDA. And yes, I did try the --disable-all-custom-nodes flag, to no avail.
Steps to Reproduce
I strongly suspect this issue is specific to my setup, but here is how it happens: select a model, enter the prompts, tweak a few settings, and click 'Queue Prompt'. After a few seconds, the error occurs.
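The traceback below bottoms out in a plain torch.nn.functional.linear call, which maps to cublasSgemm for FP32 inputs, so a minimal sketch like the following (tensor sizes are made-up placeholders, not from the original workflow) should hit the same GEMM path outside ComfyUI:

import torch

# Minimal check of the FP32 GEMM path that fails in the traceback
# (torch.nn.functional.linear -> cublasSgemm). Sizes are arbitrary.
x = torch.randn(77, 768, device="cuda", dtype=torch.float32)
w = torch.randn(768, 768, device="cuda", dtype=torch.float32)
b = torch.zeros(768, device="cuda", dtype=torch.float32)
y = torch.nn.functional.linear(x, w, b)
torch.cuda.synchronize()  # flush the queue so any CUDA error surfaces here
print(y.shape)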
Debug Logs
FETCH DATA from: C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
got prompt
model_type EPS
Using split attention in VAE
Using split attention in VAE
loaded straight to GPU
Requested to load BaseModel
Loading 1 new model
Requested to load SD1ClipModel
Loading 1 new model
!!! Exception during processing!!! CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Traceback (most recent call last):
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "C:\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "C:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 58, in encode
output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 115, in encode_from_tokens
o = self.cond_stage_model.encode_token_weights(tokens)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 567, in encode_token_weights
out = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 41, in encode_token_weights
o = self.encode(to_encode)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 228, in encode
return self(tokens)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\sd1_clip.py", line 200, in forward
outputs = self.transformer(tokens, attention_mask_model, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 134, in forward
x = self.text_model(*args, **kwargs)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 109, in forward
x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 68, in forward
x = l(x, mask, optimized_attention)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 49, in forward
x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 16, in forward
q = self.q_proj(x)
File "C:\Users\taren\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 50, in forward
return self.forward_comfy_cast_weights(*args, **kwargs)
File "C:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 46, in forward_comfy_cast_weights
return torch.nn.functional.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Other
No response
Which version of CUDA do you have?
import torch
print(torch.version.cuda)
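A slightly fuller diagnostic (a sketch; the device-name line assumes a GPU is visible) prints both the build and the runtime view:

import torch

print(torch.__version__)          # PyTorch build, e.g. 2.0.0+cu118
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # whether a device is visible at runtime
if torch.cuda.is_available():
    # Under ZLUDA this typically reports the AMD GPU
    print(torch.cuda.get_device_name(0))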
Okay, never mind; you said you are using CUDA 11.8. Did you change anything about your setup recently?
Try reinstalling PyTorch. If you want to stay on PyTorch 2.0.0, try this:
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
Otherwise, try upgrading your PyTorch to the latest stable release:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Try setting the environment variable DISABLE_ADDMM_CUDA_LT=1.
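For anyone unsure where to put it: on the portable Windows build you can add `set DISABLE_ADDMM_CUDA_LT=1` to the launch .bat, or set it from Python before the first CUDA matmul runs. A minimal sketch:

import os

# Set before the first CUDA matmul is dispatched; PyTorch consults this
# flag when choosing the addmm backend (it disables the cuBLASLt route).
os.environ["DISABLE_ADDMM_CUDA_LT"] = "1"

import torch

a = torch.randn(8, 8, device="cuda")
print(a @ a)  # exercises the GEMM path with cuBLASLt disabled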
Over the past couple of days I have tried "segment anything" nodes from different authors, but without exception they fail with a "torch.cuda.OutOfMemoryError: Allocation on device" error in slightly more complex workflows; used on their own, these nodes often work fine. I don't know whether the problem I am hitting is related to this error. Allocation on device
File "D:\ComfyUI-aki-v1.3\execution.py", line 152, in recursive_execute output_data, output_ui = get_output_data(obj, input_data_all) File "D:\ComfyUI-aki-v1.3\execution.py", line 82, in get_output_data return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True) File "D:\ComfyUI-aki-v1.3\execution.py", line 75, in map_node_over_list results.append(getattr(obj, func)(**slice_dict(input_data_all, i))) File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 317, in main boxes = groundingdino_predict( File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 182, in groundingdino_predict boxes_filt = get_grounding_output( File "D:\ComfyUI-aki-v1.3\custom_nodes\comfyui_segment_anything\node.py", line 170, in get_grounding_output outputs = model(image[None], captions=[caption]) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\groundingdino.py", line 303, in forward hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer( File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 258, in forward memory, memory_text = self.encoder( File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 576, in forward output = checkpoint.checkpoint( File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch_compile.py", line 24, in inner return torch._dynamo.disable(fn, recursive)(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch_dynamo\eval_frame.py", line 451, in _fn return fn(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch_dynamo\external_utils.py", line 36, in inner return fn(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\utils\checkpoint.py", line 487, in checkpoint return CheckpointFunction.apply(function, preserve, *args) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\autograd\function.py", line 598, in apply return super().apply(*args, **kwargs) # type: ignore[misc] File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\utils\checkpoint.py", line 262, in forward outputs = run_function(*args) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File 
"D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\transformer.py", line 785, in forward src2 = self.self_attn( File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\ms_deform_attn.py", line 271, in forward output = multi_scale_deformable_attn_pytorch( File "D:\ComfyUI-aki-v1.3\custom_nodes\ComfyUI_LayerStyle\py\local_groundingdino\models\GroundingDINO\ms_deform_attn.py", line 70, in multi_scale_deformable_attn_pytorch (torch.stack(sampling_value_list, dim=-2).flatten(-2) * attention_weights)
I have the same error. I tried every version of PyTorch and also set the env variable: DISABLE_ADDMM_CUDA_LT=1
Still the same error.
Also got this error - AMD Radeon RX 6700XT
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SD1ClipModel
loaded completely 9.5367431640625e+25 235.84423828125 True
!!! Exception during processing !!! CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Traceback (most recent call last):
File "L:\ComfyUI-Zluda\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "L:\ComfyUI-Zluda\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "L:\ComfyUI-Zluda\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "L:\ComfyUI-Zluda\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "L:\ComfyUI-Zluda\nodes.py", line 67, in encode
return (clip.encode_from_tokens_scheduled(tokens), )
File "L:\ComfyUI-Zluda\comfy\sd.py", line 148, in encode_from_tokens_scheduled
pooled_dict = self.encode_from_tokens(tokens, return_pooled=return_pooled, return_dict=True)
File "L:\ComfyUI-Zluda\comfy\sd.py", line 210, in encode_from_tokens
o = self.cond_stage_model.encode_token_weights(tokens)
File "L:\ComfyUI-Zluda\comfy\sd1_clip.py", line 635, in encode_token_weights
out = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
File "L:\ComfyUI-Zluda\comfy\sd1_clip.py", line 45, in encode_token_weights
o = self.encode(to_encode)
File "L:\ComfyUI-Zluda\comfy\sd1_clip.py", line 252, in encode
return self(tokens)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\sd1_clip.py", line 224, in forward
outputs = self.transformer(tokens, attention_mask_model, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state, dtype=torch.float32)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 137, in forward
x = self.text_model(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 113, in forward
x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 70, in forward
x = l(x, mask, optimized_attention)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 51, in forward
x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\clip_model.py", line 17, in forward
q = self.q_proj(x)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "L:\ComfyUI-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\ops.py", line 68, in forward
return self.forward_comfy_cast_weights(*args, **kwargs)
File "L:\ComfyUI-Zluda\comfy\ops.py", line 64, in forward_comfy_cast_weights
return torch.nn.functional.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
RX 6600, same error. DISABLE_ADDMM_CUDA_LT=1 doesn't work either. Tried multiple versions of torch, same problem.
7800X3D, 7800XT, 64 GB RAM, Windows 10, same problem. I tried many, many versions with basically the same results; the version doesn't seem to be the source of the issue. Sometimes it works for a bit, but then it stops. At least I'm not alone!
Combinations tried (from → till): torch 2.3 → 2.8, cu118 → cu126, Python 3.10 → 3.13
Did anyone find a solution? I'm getting the same error!
Anyone figure this out yet? Same error, followed the steps in this thread.
Repatch your libraries (patchzluda or patchzluda2) if you have updated your Torch installation or suspect it has been updated.
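For context (my understanding of why repatching helps, not an official statement): the ZLUDA patch replaces the cuBLAS/cuSPARSE/NVRTC DLLs inside torch\lib with ZLUDA shims, and a Torch reinstall or upgrade restores NVIDIA's originals, which then fail on AMD hardware with exactly these CUBLAS_STATUS errors. A quick sketch to see whether the shims are still in place (DLL names assumed from a typical ComfyUI-Zluda cu118 setup; the shims are only a few hundred KiB, while NVIDIA's originals are tens of MiB):

import os
import torch

# Hypothetical check: list the DLLs the ZLUDA patch normally replaces.
lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
for name in ("cublas64_11.dll", "cusparse64_11.dll", "nvrtc64_112_0.dll"):
    path = os.path.join(lib_dir, name)
    if os.path.exists(path):
        print(f"{name}: {os.path.getsize(path) // 1024} KiB")
    else:
        print(f"{name}: missing")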