WanVideoImageClipEncode Unsupported head_dim: 384

Open festerz opened this issue 9 months ago • 18 comments

I get this error thing.

festerz avatar Mar 04 '25 17:03 festerz

Not seen this one before, what's the full error?

kijai avatar Mar 04 '25 20:03 kijai

!!! Exception during processing !!! Unsupported head_dim: 384
Traceback (most recent call last):
  File "E:\AIML\ComfyUI\execution.py", line 327, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AIML\ComfyUI\execution.py", line 202, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AIML\ComfyUI\execution.py", line 174, in _map_node_over_list
    process_inputs(input_dict, i)
  File "E:\AIML\ComfyUI\execution.py", line 163, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes.py", line 848, in process
    y = vae.encode([concatenated], device)[0]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 772, in encode
    hidden_state = self.single_encode(video, device)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 751, in single_encode
    x = self.model.encode(video, self.scale)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 534, in encode
    out = self.encoder(x[:, :, :1, :, :],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 357, in forward
    x = layer(x)
        ^^^^^^^^
  File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\AIML\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\wan_video_vae.py", line 262, in forward
    x = F.scaled_dot_product_attention(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\sageattention-2.1.1-py3.11-win-amd64.egg\sageattention\core.py", line 130, in sageattn
    return sageattn_qk_int8_pv_fp16_triton(q, k, v, tensor_layout=tensor_layout, is_causal=is_causal, sm_scale=sm_scale, return_lse=return_lse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\torch\_dynamo\eval_frame.py", line 838, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Geocine\miniconda3\envs\comfy\Lib\site-packages\sageattention-2.1.1-py3.11-win-amd64.egg\sageattention\core.py", line 242, in sageattn_qk_int8_pv_fp16_triton
    raise ValueError(f"Unsupported head_dim: {head_dim_og}")
ValueError: Unsupported head_dim: 384

geocine avatar Mar 08 '25 09:03 geocine

The same just happened to me. Using sageattention 2.1.1

pmonck avatar Mar 08 '25 23:03 pmonck

That node is not supposed to even use sageattention, nothing in my code would make it do so. Something else must be overriding the attention globally, I've seen this happen before with other clip models and the ComfyUI-TRELLIS nodes at least.
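
A minimal sketch of what such a global override can look like (illustrative only, not from my code or any specific node), which is why an unrelated extension can make the VAE or a clip model hit sageattention's head_dim checks:

```python
# Illustration only: the kind of global override another custom node can apply.
# After this runs, every F.scaled_dot_product_attention call in the process is
# routed through sageattention, which rejects head dims outside its supported
# set and raises errors like "Unsupported head_dim: 384".
import torch.nn.functional as F
from sageattention import sageattn  # assumes sageattention is installed

F.scaled_dot_product_attention = sageattn
```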

kijai avatar Mar 08 '25 23:03 kijai

My workflow was working before (with no changes), but now gives this error after a restart. I will investigate.

pmonck avatar Mar 08 '25 23:03 pmonck

As a temporary workaround, I managed to make it work by replacing the AttentionBlock class in wan_video_vae.py with this:

```python
class AttentionBlock(nn.Module):

    def __init__(self, dim, num_heads=6):
        super().__init__()
        self.dim = dim
        self.num_heads = num_heads
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.head_dim = dim // num_heads  # e.g., 384 / 6 = 64

        # layers
        self.norm = RMS_norm(dim)
        self.to_qkv = nn.Conv2d(dim, dim * 3, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

        nn.init.zeros_(self.proj.weight)

    def forward(self, x):
        identity = x
        b, c, t, h, w = x.size()
        x = rearrange(x, 'b c t h w -> (b t) c h w')
        x = self.norm(x)
        # compute query, key, value
        qkv = self.to_qkv(x)  # [b * t, dim * 3, h, w]
        qkv = qkv.reshape(b * t, self.num_heads, self.dim * 3 // self.num_heads, h * w)  # [b * t, num_heads, (dim * 3) / num_heads, h * w]
        qkv = qkv.permute(0, 1, 3, 2)  # [b * t, num_heads, h * w, (dim * 3) / num_heads]
        q, k, v = qkv.chunk(3, dim=-1)  # each: [b * t, num_heads, h * w, head_dim]

        # apply attention
        x = F.scaled_dot_product_attention(q, k, v)  # [b * t, num_heads, h * w, head_dim]
        x = x.permute(0, 2, 1, 3).reshape(b * t, self.dim, h, w)  # [b * t, dim, h, w]

        # output
        x = self.proj(x)
        x = rearrange(x, '(b t) c h w -> b c t h w', t=t)
        return x + identity
```

pmonck avatar Mar 09 '25 00:03 pmonck

As a temporary workaround, I managed to make it work by replacing the AttentionBlock class in wan_video_vae.py with this: [...]

I don't think that's a good idea; something else is changing F.scaled_dot_product_attention to sageattention in your environment. The VAE should not be run with sageattention even if you force it to work.

kijai avatar Mar 09 '25 00:03 kijai

I agree with you. It's just a kludge to make it work for now, as I don't know how to fix the problem.

pmonck avatar Mar 09 '25 00:03 pmonck

I don't think that's a good idea; something else is changing F.scaled_dot_product_attention to sageattention in your environment. The VAE should not be run with sageattention even if you force it to work.

I'm going with this instead for now until we work out what has happened.

```python
# apply attention with native PyTorch backend
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=False):
    x = F.scaled_dot_product_attention(q, k, v)
```

pmonck avatar Mar 09 '25 01:03 pmonck

That node is not supposed to even use sageattention, nothing in my code would make it do so. Something else must be overriding the attention globally, I've seen this happen before with other clip models and the ComfyUI-TRELLIS nodes at least.

Thank you. I was running comfy with python main.py --use-sage-attention. I just ran it normally with python main.py. These are now my settings:

Total VRAM 24575 MB, total RAM 32692 MB
pytorch version: 2.7.0.dev20250307+cu124
xformers version: 0.0.29.post2
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : cudaMallocAsync
Using xformers attention
ComfyUI version: 0.3.18
sageattention version: 2.1.1

Image

I have WanVideo TeaCache running with these settings.

Image

These are my timings now:

Model type: i2v, num_heads: 40, num_layers: 40
model_type FLOW
Using accelerate to load and assign model weights to device...
Seq len: 9180
Swapping 10 transformer blocks
Initializing block swap: 100%|████████████████████████████████████████████████████████████████████| 40/40 [00:33<00:00,  1.20it/s]
----------------------
Block swap memory summary:
Transformer blocks on cpu: 3852.61MB
Transformer blocks on cuda:0: 11557.82MB
Total memory used by transformer blocks: 15410.43MB
----------------------
Sampling 21 frames at 720x544 with 30 steps
  0%|                                                                                                      | 0/30 [00:00<?, ?it/s]ptxas info    : 11 bytes gmem, 8 bytes cmem[4]
ptxas info    : Compiling entry function 'triton_red_fused__to_copy_add_mul_native_layer_norm_1' for 'sm_86' <rest of compile logs..>
 10%|█████████▍                                                                                    | 3/30 [01:13<09:08, 20.30s/it]TeaCache: Initializing TeaCache variables
TeaCache: Initializing TeaCache variables
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [04:00<00:00,  8.03s/it]
TeaCache skipped: 13 cond steps, 13 uncond steps
Allocated memory: memory=0.060 GB
Max allocated memory: max_memory=13.910 GB
Max reserved memory: max_reserved=14.219 GB
VAE decoding: 100%|█████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.38s/it]
torch.Size([3, 21, 544, 720])
tensor(-1.) tensor(1.)
Prompt executed in 340.98 seconds

Is this how it's intended to be run?

geocine avatar Mar 09 '25 07:03 geocine

This issue eventually went away for me when I deleted ComfyUI-WanVideoWrapper and reinstalled it from scratch.

pmonck avatar Mar 15 '25 13:03 pmonck

I tried a fresh git clone but am still experiencing this error when loading CLIP vision models. Maybe it's a ComfyUI global attention override, as mentioned above? It does work if I uninstall sageattention...

# ComfyUI Error Report

Error Details

  • Node ID: 51
  • Node Type: CLIPVisionEncode
  • Exception Type: AssertionError
  • Exception Message: headdim should be in [64, 96, 128].

Stack Trace

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in _map_node_over_list
    process_inputs(input_dict, i)

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1029, in encode
    output = clip_vision.encode_image(image, crop=crop_image)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_vision.py", line 70, in encode_image
    out = self.model(pixel_values=pixel_values, intermediate_output=-2)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 238, in forward
    x = self.vision_model(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 206, in forward
    x, i = self.encoder(x, mask=None, intermediate_output=intermediate_output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 70, in forward
    x = l(x, mask, optimized_attention)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 51, in forward
    x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\clip_model.py", line 21, in forward
    out = optimized_attention(q, k, v, self.heads, mask)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py", line 448, in attention_pytorch
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\sageattention\core.py", line 82, in sageattn
    assert headdim in [64, 96, 128], "headdim should be in [64, 96, 128]."
           ^^^^^^^^^^^^^^^^^^^^^^^^

System Information

  • ComfyUI Version: 0.3.27
  • Arguments: ComfyUI\main.py --windows-standalone-build
  • OS: nt
  • Python Version: 3.12.9 (tags/v3.12.9:fdb8142, Feb 4 2025, 15:27:58) [MSC v.1942 64 bit (AMD64)]
  • Embedded Python: true
  • PyTorch Version: 2.6.0+cu126

Devices

  • Name: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
    • Type: cuda
    • VRAM Total: 17170956288
    • VRAM Free: 14458708992
    • Torch VRAM Total: 1308622848
    • Torch VRAM Free: 8283136

Logs

2025-04-11T11:50:52.815600 - pytorch version: 2.6.0+cu126
2025-04-11T11:50:54.045672 - xformers version: 0.0.29.post3
2025-04-11T11:50:54.045672 - Set vram state to: NORMAL_VRAM
2025-04-11T11:50:54.045672 - Device: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
2025-04-11T11:50:54.225618 - Using xformers attention
2025-04-11T11:50:54.962827 - ComfyUI version: 0.3.27
2025-04-11T11:50:54.978827 - ComfyUI frontend version: 1.14.6

Sadly there is no KJ node to patch sageattention for CLIP vision models...?

appsmalthouse avatar Apr 11 '25 16:04 appsmalthouse

# ComfyUI Error Report

Error Details

  • Node ID: 50
  • Node Type: WanVideoClipVisionEncode
  • Exception Type: AssertionError
  • Exception Message: headdim should be in [64, 96, 128].

Stack Trace

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in _map_node_over_list
    process_inputs(input_dict, i)

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes.py", line 1494, in process
    clip_embeds = clip_vision.visual(pixel_values)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\clip.py", line 461, in visual
    out = self.model.visual(image, interpolation=interpolation, use_31_block=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\clip.py", line 269, in forward
    x = self.transformer[:-1](x)
        ^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\container.py", line 250, in forward
    input = module(input)
            ^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\clip.py", line 152, in forward
    x = x + self.attn(self.norm1(x))
            ^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\clip.py", line 86, in forward
    x = attention(q, k, v, dropout_p=p, causal=self.causal, attention_mode="sdpa")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\attention.py", line 191, in attention
    out = torch.nn.functional.scaled_dot_product_attention(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "E:\StableDiffusion\ComfyOG\ComfyUI_windows_portable_nvidia0_3_26\ComfyUI_windows_portable\python_embeded\Lib\site-packages\sageattention\core.py", line 82, in sageattn
    assert headdim in [64, 96, 128], "headdim should be in [64, 96, 128]."
           ^^^^^^^^^^^^^^^^^^^^^^^^

System Information

  • ComfyUI Version: 0.3.27
  • Arguments: ComfyUI\main.py --windows-standalone-build
  • OS: nt
  • Python Version: 3.12.9 (tags/v3.12.9:fdb8142, Feb 4 2025, 15:27:58) [MSC v.1942 64 bit (AMD64)]
  • Embedded Python: true
  • PyTorch Version: 2.6.0+cu126

Devices

  • Name: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
    • Type: cuda

appsmalthouse avatar Apr 11 '25 16:04 appsmalthouse

If I install sage-attention 1.x then I get the error "headdim should be in [64, 96, 128]." I decided to update to sage-attention v2 and now get the error ValueError: Unsupported head_dim: 384.

So is there a wan diffusion model that has a smaller head dim that works with sage attention?

saixiong avatar Apr 14 '25 05:04 saixiong

If I install sage-attention 1.x then I get the error "headdim should be in [64, 96, 128]." I decided to update to sage-attention v2 and now get the error ValueError: Unsupported head_dim: 384.

So is there a wan diffusion model that has a smaller head dim that works with sage attention?

That error is not from the Wan model. Most likely you have some other custom node (TRELLIS nodes are known to do this) overwriting torch attention globally: it detects that sage is installed and replaces sdpa with it, causing issues with unsupported models like the clip model.
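
A quick way to check whether something has replaced SDPA globally (just a diagnostic sketch, not part of the wrapper) is to print the function and see what it resolves to:

```python
import torch.nn.functional as F

# On a stock install this prints torch's built-in SDPA; if a custom node has
# monkey-patched it, you will see a plain Python function from another package
# (e.g. sageattention.core).
print(F.scaled_dot_product_attention)
print(getattr(F.scaled_dot_product_attention, "__module__", None))
```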

kijai avatar Apr 14 '25 07:04 kijai

If I install sage-attention 1.x then I get the error "headdim should be in [64, 96, 128]." I decided to update to sage-attention v2 and now get the error ValueError: Unsupported head_dim: 384.

So is there a wan diffusion model that has a smaller head dim that works with sage attention?

I had installed Hi3dGen as well, which was causing the same type of errors with overridden global attention. Removing Hi3dGen from custom_nodes fixed all of my errors. I will also look to remove TRELLIS et al. and move those to a separate ComfyUI install folder.

Cheers!

appsmalthouse avatar Apr 14 '25 13:04 appsmalthouse

I tried a fresh git clone but am still experiencing this error when loading CLIP vision models. Maybe it's a ComfyUI global attention override, as mentioned above? It does work if I uninstall sageattention... [...]

Removing Hi3dGen from custom_nodes solved my issues above. Check all custom nodes not installed via the Manager and make sure the latest updates are pulled; everything works as expected now. Thanks!

appsmalthouse avatar Apr 14 '25 14:04 appsmalthouse

Just sharing: the ComfyUI-IF_Trellis node also causes the same problem.

aidec avatar Nov 03 '25 13:11 aidec

To add to this: the "ComfyUI-3D-Pack" custom nodes also cause the same conflict with Sage Attention, and the errors appear once the workflow reaches the CLIP Text Encoder, etc.

Mr-Hazem-Eng-Artist avatar Dec 15 '25 09:12 Mr-Hazem-Eng-Artist