ComfyUI-CogVideoXWrapper

Sizes of tensors must match except in dimension 2. Expected size 60 but got size 12 for tensor number 1 in the list.

Open phr00t opened this issue 1 year ago • 8 comments

!!! Exception during processing !!! Sizes of tensors must match except in dimension 2. Expected size 60 but got size 12 for tensor number 1 in the list.
Traceback (most recent call last):
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 317, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 192, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\nodes.py", line 364, in decode
    frames = vae.decode(latents).sample
             ^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 1153, in decode
    decoded = self._decode(z).sample
              ^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 1112, in _decode
    return self.tiled_decode(z, return_dict=return_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 1229, in tiled_decode
    tile = self.decoder(tile)
           ^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 851, in forward
    hidden_states = self.conv_in(sample)
                    ^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 134, in forward
    inputs = self.fake_context_parallel_forward(inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl_cogvideox.py", line 126, in fake_context_parallel_forward
    inputs = torch.cat(cached_inputs + [inputs], dim=2)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 60 but got size 12 for tensor number 1 in the list.

Prompt executed in 40.46 seconds


Looks like this error is related to trying to use the vae_tiling feature.
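For context, the failing call in the traceback is `torch.cat(cached_inputs + [inputs], dim=2)`: concatenation along one dimension requires every other dimension to match. One plausible reading is that cached activations from a full-size tile are being concatenated with a smaller edge tile. A minimal NumPy analogue of that mismatch (the shapes below are illustrative, not the actual latent sizes):

```python
import numpy as np

# NumPy analogue of torch.cat(cached_inputs + [inputs], dim=2).
# Latent layout assumed: (batch, channels, frames, height, width).
# Concatenating along axis 2 (frames) still requires every OTHER axis
# to match -- here the heights (60 vs 12) disagree, as in the traceback.
cached = np.zeros((1, 16, 2, 60, 90))   # cached conv state from a full tile
current = np.zeros((1, 16, 8, 12, 90))  # incoming tile with a smaller edge

try:
    np.concatenate([cached, current], axis=2)
except ValueError as exc:
    print("shape mismatch:", exc)
```

Any setting that changes how the latent is split into tiles (or skips resetting the conv cache between tiles) could produce this kind of error.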

phr00t avatar Aug 31 '24 00:08 phr00t

I'm hitting the same problem. Did you resolve it?

Bellzs avatar Sep 02 '24 14:09 Bellzs

Yup, same issue here. With the new ComfyUI error popup, the following message gets displayed (I'm also using enable_vae_tiling):

CogVideoDecode: Sizes of tensors must match except in dimension 2. Expected size 60 but got size 12 for tensor number 1 in the list.

Gerkinfeltser avatar Sep 03 '24 20:09 Gerkinfeltser

Same for me when I try to use VAE tiling.

WingeD123 avatar Sep 18 '24 23:09 WingeD123

The tiled VAE doesn't seem to support every available resolution. It should work with the CogVideoX default, but probably not with all of the resolutions CogVideoX-Fun allows. I don't know the exact rule yet.
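The "exact rule" may come down to whether the latent splits cleanly into tiles. A hypothetical sketch: the CogVideoX VAE compresses spatially by 8x, so 720x480 yields a 90x60 latent; if tiling then walks that latent with a fixed tile size and overlap, the last tile along an axis can come out smaller than the rest. The tile size (60) and overlap (12) below are illustrative assumptions chosen to echo the traceback, not the wrapper's actual values.

```python
def tile_sizes(latent_dim: int, tile: int = 60, overlap: int = 12) -> list[int]:
    """Sizes of the tiles produced along one latent axis, assuming a
    hypothetical scheme: fixed tile size, fixed overlap, last tile truncated."""
    stride = tile - overlap
    sizes = []
    pos = 0
    while pos < latent_dim:
        sizes.append(min(tile, latent_dim - pos))
        pos += stride
    return sizes

# 720x480 with 8x spatial compression gives a 90x60 latent.
print(tile_sizes(90))  # width axis  -> [60, 42]
print(tile_sizes(60))  # height axis -> [60, 12]: a short 12-wide edge tile,
                       # matching the 60-vs-12 mismatch in the traceback
```

Under this (assumed) scheme, only resolutions whose latent axes split into equal tiles would avoid a short edge tile, which would explain why some resolutions work and others don't.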

kijai avatar Sep 18 '24 23:09 kijai

> The tiled VAE doesn't seem to support every available resolution. It should work with the CogVideoX default, but probably not with all of the resolutions CogVideoX-Fun allows. I don't know the exact rule yet.

Thanks for all your work. I found this issue occurs because I chose "fastmode" in fp8_transformer and enabled VAE tiling. After changing it to "enable" in fp8_transformer, it works fine with VAE tiling. I have always been using the default resolution (720x480). My torch is 2.4.1 and I use "--fast" in run.bat.

WingeD123 avatar Sep 18 '24 23:09 WingeD123

> The tiled VAE doesn't seem to support every available resolution. It should work with the CogVideoX default, but probably not with all of the resolutions CogVideoX-Fun allows. I don't know the exact rule yet.

> Thanks for all your work. I found this issue occurs because I chose "fastmode" in fp8_transformer and enabled VAE tiling. After changing it to "enable" in fp8_transformer, it works fine with VAE tiling. I have always been using the default resolution (720x480). My torch is 2.4.1 and I use "--fast" in run.bat.

Not to get too derailed here, but plain fp8 always runs far faster for me; I never got "fastmode" to actually be fast, even though I have an RTX 4080. Do I need to pass "--fast" in my ComfyUI arguments or something?

phr00t avatar Sep 19 '24 00:09 phr00t

> The tiled VAE doesn't seem to support every available resolution. It should work with the CogVideoX default, but probably not with all of the resolutions CogVideoX-Fun allows. I don't know the exact rule yet.

> Thanks for all your work. I found this issue occurs because I chose "fastmode" in fp8_transformer and enabled VAE tiling. After changing it to "enable" in fp8_transformer, it works fine with VAE tiling. I have always been using the default resolution (720x480). My torch is 2.4.1 and I use "--fast" in run.bat.

> Not to get too derailed here, but plain fp8 always runs far faster for me; I never got "fastmode" to actually be fast, even though I have an RTX 4080. Do I need to pass "--fast" in my ComfyUI arguments or something?

Sorry, I don't know much about the tech side; I can only describe my case. When I choose "fastmode" in fp8_transformer in the Load CogVideo node and enable VAE tiling, I get the exact same issue; after changing it to "enable", it works.

WingeD123 avatar Sep 19 '24 00:09 WingeD123

> The tiled VAE doesn't seem to support every available resolution. It should work with the CogVideoX default, but probably not with all of the resolutions CogVideoX-Fun allows. I don't know the exact rule yet.

> Thanks for all your work. I found this issue occurs because I chose "fastmode" in fp8_transformer and enabled VAE tiling. After changing it to "enable" in fp8_transformer, it works fine with VAE tiling. I have always been using the default resolution (720x480). My torch is 2.4.1 and I use "--fast" in run.bat.

> Not to get too derailed here, but plain fp8 always runs far faster for me; I never got "fastmode" to actually be fast, even though I have an RTX 4080. Do I need to pass "--fast" in my ComfyUI arguments or something?

> Sorry, I don't know much about the tech side; I can only describe my case. When I choose "fastmode" in fp8_transformer in the Load CogVideo node and enable VAE tiling, I get the exact same issue; after changing it to "enable", it works.

Same for me. After switching to the "enable" option, it works. So strange.

JimWang151 avatar Sep 23 '24 16:09 JimWang151