WanVideoImageClipEncode Unsupported head_dim: 384
I'm getting this error.
Not seen this one before, what's the full error?
!!! Exception during processing !!! Unsupported head_dim: 384
Traceback (most recent call last):
File "E:\AIML\ComfyUI\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AIML\ComfyUI\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AIML\ComfyUI\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "E:\AIML\ComfyUI\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes.py", line 848, in process
y = vae.encode([concatenated], device)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 772, in encode
hidden_state = self.single_encode(video, device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 751, in single_encode
x = self.model.encode(video, self.scale)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 534, in encode
out = self.encoder(x[:, :, :1, :, :],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 357, in forward
x = layer(x)
^^^^^^^^
File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 262, in forward
x = F.scaled_dot_product_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\sageattention-2.1.1-py3.11-win-amd64.egg\sageattention\core.py", line 130, in sageattn
return sageattn_qk_int8_pv_fp16_triton(q, k, v, tensor_layout=tensor_layout, is_causal=is_causal, sm_scale=sm_scale, return_lse=return_lse)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\_dynamo\eval_frame.py", line 838, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\sageattention-2.1.1-py3.11-win-amd64.egg\sageattention\core.py", line 242, in sageattn_qk_int8_pv_fp16_triton
raise ValueError(f"Unsupported head_dim: {head_dim_og}")
ValueError: Unsupported head_dim: 384
The same just happened to me. Using sageattention 2.1.1
That node is not supposed to even use sageattention; nothing in my code would make it do so. Something else must be overriding the attention globally. I've seen this happen before with other clip models and the ComfyUI-TRELLIS nodes, at least.
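A quick way to check whether something has replaced torch's SDPA globally (just a diagnostic sketch run in the same Python environment ComfyUI uses, not part of the wrapper code) is to inspect the function after the custom nodes have loaded:

```python
# Diagnostic sketch: print what F.scaled_dot_product_attention currently points at.
import torch.nn.functional as F

print(F.scaled_dot_product_attention)
# Unpatched: a torch built-in, e.g. <built-in function scaled_dot_product_attention>
# Patched:   something like <function sageattn at 0x...> from sageattention.core
```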
My workflow was working before (with no changes), but now gives this error after a restart. I will investigate.
As a temporary workaround, I managed to make it work by replacing the AttentionBlock class in wan_video_vae.py with this:
```python
class AttentionBlock(nn.Module):

    def __init__(self, dim, num_heads=6):
        super().__init__()
        self.dim = dim
        self.num_heads = num_heads
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.head_dim = dim // num_heads  # e.g., 384 / 6 = 64

        # layers
        self.norm = RMS_norm(dim)
        self.to_qkv = nn.Conv2d(dim, dim * 3, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

        nn.init.zeros_(self.proj.weight)

    def forward(self, x):
        identity = x
        b, c, t, h, w = x.size()
        x = rearrange(x, 'b c t h w -> (b t) c h w')
        x = self.norm(x)

        # compute query, key, value
        qkv = self.to_qkv(x)  # [b * t, dim * 3, h, w]
        qkv = qkv.reshape(b * t, self.num_heads, self.dim * 3 // self.num_heads, h * w)  # [b * t, num_heads, (dim * 3) / num_heads, h * w]
        qkv = qkv.permute(0, 1, 3, 2)  # [b * t, num_heads, h * w, (dim * 3) / num_heads]
        q, k, v = qkv.chunk(3, dim=-1)  # each: [b * t, num_heads, h * w, head_dim]

        # apply attention
        x = F.scaled_dot_product_attention(q, k, v)  # [b * t, num_heads, h * w, head_dim]
        x = x.permute(0, 2, 1, 3).reshape(b * t, self.dim, h, w)  # [b * t, dim, h, w]

        # output
        x = self.proj(x)
        x = rearrange(x, '(b t) c h w -> b c t h w', t=t)
        return x + identity
```
I don't think that's a good idea; something else is changing `F.scaled_dot_product_attention` to sageattention in your environment. The VAE should not be run on sageattention even if you force it to work.
I agree with you. It's just a kludge to make it work for now, as I don't know how to fix the problem.
> I don't think that's a good idea; something else is changing `F.scaled_dot_product_attention` to sageattention in your environment. The VAE should not be run on sageattention even if you force it to work.
I'm going with this instead for now until we work out what has happened.
```python
# apply attention with native PyTorch backend
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=False):
    x = F.scaled_dot_product_attention(q, k, v)
```
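Side note: on newer PyTorch releases `torch.backends.cuda.sdp_kernel` is deprecated in favour of `torch.nn.attention.sdpa_kernel`; a rough drop-in for the same spot (still only selecting among PyTorch's own SDPA backends) would be:

```python
# Rough equivalent on newer PyTorch; SDPBackend.MATH forces the plain math kernel.
from torch.nn.attention import sdpa_kernel, SDPBackend

with sdpa_kernel(SDPBackend.MATH):
    x = F.scaled_dot_product_attention(q, k, v)
```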
> That node is not supposed to even use sageattention; nothing in my code would make it do so. Something else must be overriding the attention globally. I've seen this happen before with other clip models and the ComfyUI-TRELLIS nodes, at least.
Thank you, I was running comfy with `python main.py --use-sage-attention`. I just ran it normally with `python main.py`. These are now my settings:
Total VRAM 24575 MB, total RAM 32692 MB
pytorch version: 2.7.0.dev20250307+cu124
xformers version: 0.0.29.post2
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : cudaMallocAsync
Using xformers attention
ComfyUI version: 0.3.18
sageattention version: 2.1.1
I have WanVideo TeaCache running with these settings.
These are now my timings:
Model type: i2v, num_heads: 40, num_layers: 40
model_type FLOW
Using accelerate to load and assign model weights to device...
Seq len: 9180
Swapping 10 transformer blocks
Initializing block swap: 100%|████████████████████████████████████████████████████████████████████| 40/40 [00:33<00:00, 1.20it/s]
----------------------
Block swap memory summary:
Transformer blocks on cpu: 3852.61MB
Transformer blocks on cuda:0: 11557.82MB
Total memory used by transformer blocks: 15410.43MB
----------------------
Sampling 21 frames at 720x544 with 30 steps
0%| | 0/30 [00:00<?, ?it/s]ptxas info : 11 bytes gmem, 8 bytes cmem[4]
ptxas info : Compiling entry function 'triton_red_fused__to_copy_add_mul_native_layer_norm_1' for 'sm_86' <rest of compile logs..>
10%|█████████▍ | 3/30 [01:13<09:08, 20.30s/it]TeaCache: Initializing TeaCache variables
TeaCache: Initializing TeaCache variables
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [04:00<00:00, 8.03s/it]
TeaCache skipped: 13 cond steps, 13 uncond steps
Allocated memory: memory=0.060 GB
Max allocated memory: max_memory=13.910 GB
Max reserved memory: max_reserved=14.219 GB
VAE decoding: 100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.38s/it]
torch.Size([3, 21, 544, 720])
tensor(-1.) tensor(1.)
Prompt executed in 340.98 seconds
Is this how it's intended to be run?
This issue eventually went away for me when I deleted ComfyUI-WanVideoWrapper and reinstalled it from scratch.
I tried a fresh git clone but am still experiencing this error when the CLIP vision models load. Maybe it's a ComfyUI global attention thing, as mentioned above? This does work if I uninstall sageattention ...
# ComfyUI Error Report
Error Details
- Node ID: 51
- Node Type: CLIPVisionEncode
- Exception Type: AssertionError
- Exception Message: headdim should be in [64, 96, 128].
Stack Trace
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1029, in encode
output = clip_vision.encode_image(image, crop=crop_image)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_vision.py", line 70, in encode_image
out = self.model(pixel_values=pixel_values, intermediate_output=-2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 238, in forward
x = self.vision_model(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 206, in forward
x, i = self.encoder(x, mask=None, intermediate_output=intermediate_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 70, in forward
x = l(x, mask, optimized_attention)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 51, in forward
x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 21, in forward
out = optimized_attention(q, k, v, self.heads, mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py", line 448, in attention_pytorch
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\sageattention\core.py", line 82, in sageattn
assert headdim in [64, 96, 128], "headdim should be in [64, 96, 128]."
^^^^^^^^^^^^^^^^^^^^^^^^
System Information
- ComfyUI Version: 0.3.27
- Arguments: ComfyUI\main.py --windows-standalone-build
- OS: nt
- Python Version: 3.12.9 (tags/v3.12.9:fdb8142, Feb 4 2025, 15:27:58) [MSC v.1942 64 bit (AMD64)]
- Embedded Python: true
- PyTorch Version: 2.6.0+cu126
Devices
- Name: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
- Type: cuda
- VRAM Total: 17170956288
- VRAM Free: 14458708992
- Torch VRAM Total: 1308622848
- Torch VRAM Free: 8283136
Logs
2025-04-11T11:50:52.815600 - pytorch version: 2.6.0+cu126
2025-04-11T11:50:54.045672 - xformers version: 0.0.29.post3
2025-04-11T11:50:54.045672 - Set vram state to: NORMAL_VRAM
2025-04-11T11:50:54.045672 - Device: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
2025-04-11T11:50:54.225618 - Using xformers attention
2025-04-11T11:50:54.962827 - ComfyUI version: 0.3.27
2025-04-11T11:50:54.978827 - ComfyUI frontend version: 1.14.6
Sadly there is no KJ node to patch sageattention on CLIP vision models ... ?
# ComfyUI Error Report
Error Details
- Node ID: 50
- Node Type: WanVideoClipVisionEncode
- Exception Type: AssertionError
- Exception Message: headdim should be in [64, 96, 128].
Stack Trace
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes.py", line 1494, in process
clip_embeds = clip_vision.visual(pixel_values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\clip.py", line 461, in visual
out = self.model.visual(image, interpolation=interpolation, use_31_block=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\clip.py", line 269, in forward
x = self.transformer[:-1](x)
^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\container.py", line 250, in forward
input = module(input)
^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\clip.py", line 152, in forward
x = x + self.attn(self.norm1(x))
^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\clip.py", line 86, in forward
x = attention(q, k, v, dropout_p=p, causal=self.causal, attention_mode="sdpa")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\attention.py", line 191, in attention
out = torch.nn.functional.scaled_dot_product_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\sageattention\core.py", line 82, in sageattn
assert headdim in [64, 96, 128], "headdim should be in [64, 96, 128]."
^^^^^^^^^^^^^^^^^^^^^^^^
System Information
- ComfyUI Version: 0.3.27
- Arguments: ComfyUI\main.py --windows-standalone-build
- OS: nt
- Python Version: 3.12.9 (tags/v3.12.9:fdb8142, Feb 4 2025, 15:27:58) [MSC v.1942 64 bit (AMD64)]
- Embedded Python: true
- PyTorch Version: 2.6.0+cu126
Devices
- Name: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
- Type: cuda
If I install sage-attention 1.x then I get the error "headdim should be in [64, 96, 128]." I decided to update to sage-attention v2 and now get the error ValueError: Unsupported head_dim: 384.
So is there a Wan diffusion model that has a smaller head dim that works with sage attention?
> If I install sage-attention 1.x then I get the error "headdim should be in [64, 96, 128]." I decided to update to sage-attention v2 and now get the error ValueError: Unsupported head_dim: 384.
> So is there a Wan diffusion model that has a smaller head dim that works with sage attention?
That error is not from the Wan model. Most likely you have some other custom node (the TRELLIS nodes are known to do this) overwriting torch attention globally: it detects that sage is installed and replaces sdpa with it, causing issues with unsupported models like the clip model.
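For reference, the kind of global patch being described usually looks something like this (an illustrative sketch, not code from any particular node):

```python
# Illustrative only: a custom node detecting sageattention and rerouting every
# scaled_dot_product_attention call in the process through it.
import torch.nn.functional as F
from sageattention import sageattn

F.scaled_dot_product_attention = sageattn  # CLIP/VAE attention now hits sage's head_dim limits
```

Once a node does that at import time, it affects every model loaded afterwards, which is why the error shows up in nodes that never asked for sageattention.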
> If I install sage-attention 1.x then I get the error "headdim should be in [64, 96, 128]." I decided to update to sage-attention v2 and now get the error ValueError: Unsupported head_dim: 384.
> So is there a Wan diffusion model that has a smaller head dim that works with sage attention?
I had installed Hi3dGen as well, which was causing the same type of errors with overridden global attention. Removing Hi3dGen from custom_nodes fixed all of my errors. I will also look at removing TRELLIS et al. and moving those to a separate ComfyUI install folder.
Cheers!
Removing Hi3dGen from custom_nodes solved the issues I described above. Check any custom nodes not installed via the Manager and make sure the latest updates are pulled; everything works as expected now. Thanks!
Just sharing that the ComfyUI-IF_Trellis node also causes the same problem.
To add to this, the "ComfyUI-3D-Pack" custom nodes also do the same thing and conflict with Sage Attention; errors appear once the workflow reaches the CLIP Text Encoder, etc.