ComfyUI-CogVideoXWrapper

The size of tensor a (45) must match the size of tensor b (44) at non-singleton dimension 4

gladmustang opened this issue 1 year ago • 6 comments


I ran the sample text-to-video workflow and got this error. I also tried reducing the prompt length, but it did not help.

Any idea what is causing this?

gladmustang avatar Oct 24 '24 16:10 gladmustang

Full error?

kijai avatar Oct 24 '24 16:10 kijai

Below is the full error:

Temporal tiling and context schedule disabled
  0%|          | 0/50 [00:07<?, ?it/s]
!!! Exception during processing !!! The size of tensor a (45) must match the size of tensor b (44) at non-singleton dimension 4
Traceback (most recent call last):
  File "E:\tools\ComfyUI_windows_portable\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "E:\tools\ComfyUI_windows_portable\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "E:\tools\ComfyUI_windows_portable\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "E:\tools\ComfyUI_windows_portable\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "E:\tools\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\nodes.py", line 1311, in process
    latents = pipeline["pipe"](
  File "E:\tools\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "E:\tools\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\pipeline_cogvideox.py", line 891, in __call__
    latents = self.scheduler.step(noise_pred, t, latents.to(self.vae.dtype), **extra_step_kwargs, return_dict=False)[0]
  File "E:\tools\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\schedulers\scheduling_dpmsolver_multistep.py", line 972, in step
    model_output = self.convert_model_output(model_output, sample=sample)
  File "E:\tools\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\schedulers\scheduling_dpmsolver_multistep.py", line 569, in convert_model_output
    x0_pred = alpha_t * sample - sigma_t * model_output
RuntimeError: The size of tensor a (45) must match the size of tensor b (44) at non-singleton dimension 4

gladmustang avatar Oct 24 '24 16:10 gladmustang
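
For context, this RuntimeError is PyTorch's standard broadcasting complaint: an elementwise operation requires every non-singleton dimension of the two tensors to match, so the scheduler's `x0_pred = alpha_t * sample - sigma_t * model_output` line fails as soon as `sample` and `model_output` disagree in one dimension (45 vs 44 here). A minimal sketch that reproduces the same message, with purely illustrative shapes:

```python
import torch

# Two 5-D latents that agree everywhere except the last dimension,
# mimicking a latent grid whose width is off by one.
sample = torch.randn(1, 13, 16, 60, 45)
model_output = torch.randn(1, 13, 16, 60, 44)

try:
    x0_pred = 0.5 * sample - 0.5 * model_output  # elementwise op triggers the broadcast check
except RuntimeError as e:
    # "The size of tensor a (45) must match the size of tensor b (44) at non-singleton dimension 4"
    print(e)
```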

I have a similar error here.

ComfyUI Error Report

Error Details

  • Node Type: CogVideoSampler
  • Exception Type: RuntimeError
  • Exception Message: The size of tensor a (14) must match the size of tensor b (13) at non-singleton dimension 1

Stack Trace

  File "/workspace/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

  File "/workspace/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)

  File "/workspace/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)

  File "/workspace/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))

  File "/workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 1319, in process
    latents = pipeline["pipe"](

  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)

  File "/workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 483, in __call__
    latents, timesteps, noise = self.prepare_latents(

  File "/workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 237, in prepare_latents
    latents = self.scheduler.add_noise(latents, noise, latent_timestep)

  File "/opt/conda/lib/python3.10/site-packages/diffusers/schedulers/scheduling_dpm_cogvideox.py", line 465, in add_noise
    noisy_samples = sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise

System Information

  • ComfyUI Version: v0.2.4
  • Arguments: main.py --listen --enable-cors-header --disable-cuda-malloc
  • OS: posix
  • Python Version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
  • Embedded Python: false
  • PyTorch Version: 2.3.1

Devices

  • Name: cuda:0 NVIDIA RTX A6000 : native
    • Type: cuda
    • VRAM Total: 51041271808
    • VRAM Free: 18605571584
    • Torch VRAM Total: 13174308864
    • Torch VRAM Free: 69828096

Logs

2024-10-28 09:46:56,741 - root - INFO - Total VRAM 48677 MB, total RAM 58338 MB
2024-10-28 09:46:56,741 - root - INFO - pytorch version: 2.3.1
2024-10-28 09:46:57,898 - root - INFO - xformers version: 0.0.27
2024-10-28 09:46:57,899 - root - INFO - Set vram state to: NORMAL_VRAM
2024-10-28 09:46:57,899 - root - INFO - Device: cuda:0 NVIDIA RTX A6000 : native
2024-10-28 09:46:58,232 - root - INFO - Using xformers cross attention
2024-10-28 09:46:58,694 - root - INFO - [Prompt Server] web root: /workspace/ComfyUI/web
2024-10-28 09:47:41,350 - ComfyUI-CogVideoXWrapper.custom_cogvideox_transformer_3d - INFO - sageattn not found, using sdpa
2024-10-28 09:47:41,385 - ComfyUI-CogVideoXWrapper.cogvideox_fun.transformer_3d - INFO - sageattn not found, using sdpa
2024-10-28 09:47:41,387 - ComfyUI-CogVideoXWrapper.cogvideox_fun.fun_pab_transformer_3d - INFO - sageattn not found, using sdpa
2024-10-28 09:47:41,539 - numexpr.utils - INFO - Note: NumExpr detected 28 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
2024-10-28 09:47:41,539 - numexpr.utils - INFO - NumExpr defaulting to 16 threads.
2024-10-28 09:47:41,708 - root - INFO - Total VRAM 48677 MB, total RAM 58338 MB
2024-10-28 09:47:41,708 - root - INFO - pytorch version: 2.3.1
2024-10-28 09:47:41,708 - root - INFO - xformers version: 0.0.27
2024-10-28 09:47:41,708 - root - INFO - Set vram state to: NORMAL_VRAM
2024-10-28 09:47:41,708 - root - INFO - Device: cuda:0 NVIDIA RTX A6000 : native
2024-10-28 09:47:41,826 - root - INFO - 
Import times for custom nodes:
2024-10-28 09:47:41,826 - root - INFO -    0.0 seconds: /workspace/ComfyUI/custom_nodes/ComfyUI-RunComfy-Helper
2024-10-28 09:47:41,828 - root - INFO -    0.0 seconds: /workspace/ComfyUI/custom_nodes/ComfyUI-KJNodes
2024-10-28 09:47:41,828 - root - INFO -    0.0 seconds: /workspace/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
2024-10-28 09:47:41,828 - root - INFO -    0.0 seconds: /workspace/ComfyUI/custom_nodes/ComfyUI-Manager
2024-10-28 09:47:41,828 - root - INFO -    0.3 seconds: /workspace/ComfyUI/custom_nodes/ComfyUI-Crystools
2024-10-28 09:47:41,828 - root - INFO -    0.6 seconds: /workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
2024-10-28 09:47:41,828 - root - INFO -   41.7 seconds: /workspace/ComfyUI/custom_nodes/ComfyUI-Allor
2024-10-28 09:47:41,828 - root - INFO - 
2024-10-28 09:47:41,839 - root - INFO - Starting server

2024-10-28 09:47:41,839 - root - INFO - To see the GUI go to: http://0.0.0.0:8188
2024-10-28 09:47:41,839 - root - INFO - To see the GUI go to: http://[::]:8188
2024-10-28 09:48:12,229 - root - INFO - got prompt
2024-10-28 09:48:56,761 - root - INFO - video_flow shape: torch.Size([1, 16, 13, 60, 90])
2024-10-28 09:48:57,189 - ComfyUI-CogVideoXWrapper.nodes - INFO - Encoded latents shape: torch.Size([1, 1, 16, 60, 90])
2024-10-28 09:48:58,950 - root - INFO - Requested to load SD3ClipModel_
2024-10-28 09:48:58,951 - root - INFO - Loading 1 new model
2024-10-28 09:48:58,977 - root - INFO - loaded completely 0.0 4541.693359375 True
2024-10-28 09:49:17,649 - root - ERROR - !!! Exception during processing !!! The size of tensor a (14) must match the size of tensor b (13) at non-singleton dimension 1
2024-10-28 09:49:17,654 - root - ERROR - Traceback (most recent call last):
  File "/workspace/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/workspace/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/workspace/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/workspace/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 1319, in process
    latents = pipeline["pipe"](
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 483, in __call__
    latents, timesteps, noise = self.prepare_latents(
  File "/workspace/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 237, in prepare_latents
    latents = self.scheduler.add_noise(latents, noise, latent_timestep)
  File "/opt/conda/lib/python3.10/site-packages/diffusers/schedulers/scheduling_dpm_cogvideox.py", line 465, in add_noise
    noisy_samples = sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise
RuntimeError: The size of tensor a (14) must match the size of tensor b (13) at non-singleton dimension 1

2024-10-28 09:49:17,654 - root - INFO - Prompt executed in 64.97 seconds

Attached Workflow

Please make sure that workflow does not contain any sensitive information such as API keys or passwords.

{"last_node_id":78,"last_link_id":181,"nodes":[{"id":65,"type":"CreateShapeImageOnPath","pos":{"0":1052,"1":935},"size":{"0":313.4619445800781,"1":286},"flags":{},"order":10,"mode":0,"inputs":[{"name":"coordinates","type":"STRING","link":145,"widget":{"name":"coordinates"}},{"name":"size_multiplier","type":"FLOAT","link":null,"widget":{"name":"size_multiplier"},"shape":7},{"name":"frame_width","type":"INT","link":149,"widget":{"name":"frame_width"}},{"name":"frame_height","type":"INT","link":150,"widget":{"name":"frame_height"}}],"outputs":[{"name":"image","type":"IMAGE","links":[142,153],"slot_index":0},{"name":"mask","type":"MASK","links":[154],"slot_index":1}],"properties":{"Node name for S&R":"CreateShapeImageOnPath"},"widgets_values":["circle","",512,512,12,12,"red","black",0,1,[1],1]},{"id":56,"type":"CogVideoDecode","pos":{"0":1596,"1":150},"size":{"0":300.396484375,"1":198},"flags":{},"order":17,"mode":0,"inputs":[{"name":"pipeline","type":"COGVIDEOPIPE","link":128},{"name":"samples","type":"LATENT","link":127}],"outputs":[{"name":"images","type":"IMAGE","links":[155],"slot_index":0,"shape":3}],"properties":{"Node name for S&R":"CogVideoDecode"},"widgets_values":[false,240,360,0.2,0.2,true]},{"id":68,"type":"ImageCompositeMasked","pos":{"0":1674,"1":641},"size":{"0":315,"1":146},"flags":{},"order":18,"mode":0,"inputs":[{"name":"destination","type":"IMAGE","link":155},{"name":"source","type":"IMAGE","link":153},{"name":"mask","type":"MASK","link":154,"shape":7}],"outputs":[{"name":"IMAGE","type":"IMAGE","links":[156],"slot_index":0}],"properties":{"Node name for S&R":"ImageCompositeMasked"},"widgets_values":[0,0,false]},{"id":75,"type":"DownloadAndLoadToraModel","pos":{"0":253,"1":146},"size":{"0":315,"1":58},"flags":{},"order":0,"mode":0,"inputs":[],"outputs":[{"name":"tora_model","type":"TORAMODEL","links":[175]}],"properties":{"Node name for S&R":"DownloadAndLoadToraModel"},"widgets_values":["kijai/CogVideoX-5b-Tora"]},{"id":44,"type":"VHS_VideoCombine","pos":{"0":2210,"1":151},"size":[1131.619140625,1065.0794270833335],"flags":{},"order":19,"mode":0,"inputs":[{"name":"images","type":"IMAGE","link":156},{"name":"audio","type":"AUDIO","link":null,"shape":7},{"name":"meta_batch","type":"VHS_BatchManager","link":null,"shape":7},{"name":"vae","type":"VAE","link":null,"shape":7}],"outputs":[{"name":"Filenames","type":"VHS_FILENAMES","links":null,"shape":3}],"properties":{"Node name for S&R":"VHS_VideoCombine"},"widgets_values":{"frame_rate":16,"loop_count":0,"filename_prefix":"1142","format":"video/h264-mp4","pix_fmt":"yuv420p","crf":19,"save_metadata":true,"pingpong":false,"save_output":true,"videopreview":{"hidden":false,"paused":false,"params":{"filename":"1142_00015.mp4","subfolder":"","type":"output","format":"video/h264-mp4","frame_rate":16},"muted":false}}},{"id":76,"type":"Note","pos":{"0":1,"1":-68},"size":{"0":338.9610290527344,"1":58},"flags":{},"order":1,"mode":0,"inputs":[],"outputs":[],"title":"Original 
Repo","properties":{},"widgets_values":["https://github.com/kijai/ComfyUI-CogVideoXWrapper"],"color":"#223","bgcolor":"#335"},{"id":66,"type":"VHS_VideoCombine","pos":{"0":1442,"1":1052},"size":[605.3909912109375,714.2606608072917],"flags":{},"order":13,"mode":0,"inputs":[{"name":"images","type":"IMAGE","link":142},{"name":"audio","type":"AUDIO","link":null,"shape":7},{"name":"meta_batch","type":"VHS_BatchManager","link":null,"shape":7},{"name":"vae","type":"VAE","link":null,"shape":7}],"outputs":[{"name":"Filenames","type":"VHS_FILENAMES","links":null,"shape":3}],"properties":{"Node name for S&R":"VHS_VideoCombine"},"widgets_values":{"frame_rate":8,"loop_count":0,"filename_prefix":"1142","format":"video/h264-mp4","pix_fmt":"yuv420p","crf":19,"save_metadata":true,"pingpong":false,"save_output":true,"videopreview":{"hidden":false,"paused":false,"params":{"filename":"1142_00021.mp4","subfolder":"","type":"output","format":"video/h264-mp4","frame_rate":8},"muted":false}}},{"id":20,"type":"CLIPLoader","pos":{"0":-26,"1":400},"size":{"0":451.30548095703125,"1":82},"flags":{},"order":2,"mode":0,"inputs":[],"outputs":[{"name":"CLIP","type":"CLIP","links":[54,56],"slot_index":0,"shape":3}],"properties":{"Node name for S&R":"CLIPLoader"},"widgets_values":["t5/google_t5-v1_1-xxl_encoderonly-fp8_e4m3fn.safetensors","sd3"]},{"id":31,"type":"CogVideoTextEncode","pos":{"0":497,"1":520},"size":{"0":463.01251220703125,"1":124},"flags":{},"order":7,"mode":0,"inputs":[{"name":"clip","type":"CLIP","link":56}],"outputs":[{"name":"conditioning","type":"CONDITIONING","links":[123],"slot_index":0,"shape":3}],"properties":{"Node name for S&R":"CogVideoTextEncode"},"widgets_values":["The video is not of a high quality, it has a low resolution. Watermark present in each frame. Strange motion trajectory. 
",1,true]},{"id":74,"type":"ToraEncodeTrajectory","pos":{"0":1047,"1":678},"size":{"0":335.1993408203125,"1":206},"flags":{},"order":11,"mode":0,"inputs":[{"name":"pipeline","type":"COGVIDEOPIPE","link":174},{"name":"tora_model","type":"TORAMODEL","link":175},{"name":"coordinates","type":"STRING","link":176,"widget":{"name":"coordinates"}},{"name":"num_frames","type":"INT","link":170,"widget":{"name":"num_frames"}},{"name":"width","type":"INT","link":171,"widget":{"name":"width"}},{"name":"height","type":"INT","link":172,"widget":{"name":"height"}}],"outputs":[{"name":"tora_trajectory","type":"TORAFEATURES","links":[173]},{"name":"video_flow_images","type":"IMAGE","links":null}],"properties":{"Node name for S&R":"ToraEncodeTrajectory"},"widgets_values":["",720,480,49,0.8,0,0.5]},{"id":57,"type":"CogVideoSampler","pos":{"0":1138,"1":150},"size":{"0":399.8780822753906,"1":390},"flags":{},"order":16,"mode":0,"inputs":[{"name":"pipeline","type":"COGVIDEOPIPE","link":121},{"name":"positive","type":"CONDITIONING","link":122},{"name":"negative","type":"CONDITIONING","link":123},{"name":"samples","type":"LATENT","link":177,"shape":7},{"name":"image_cond_latents","type":"LATENT","link":null,"shape":7},{"name":"context_options","type":"COGCONTEXT","link":null,"shape":7},{"name":"controlnet","type":"COGVIDECONTROLNET","link":null,"shape":7},{"name":"tora_trajectory","type":"TORAFEATURES","link":173,"shape":7},{"name":"num_frames","type":"INT","link":157,"widget":{"name":"num_frames"}},{"name":"height","type":"INT","link":151,"widget":{"name":"height"}},{"name":"width","type":"INT","link":152,"widget":{"name":"width"}}],"outputs":[{"name":"cogvideo_pipe","type":"COGVIDEOPIPE","links":[128],"slot_index":0,"shape":3},{"name":"samples","type":"LATENT","links":[127],"shape":3}],"properties":{"Node name for S&R":"CogVideoSampler"},"widgets_values":[480,720,49,32,6,65334758276105,"fixed","CogVideoXDPMScheduler",1]},{"id":30,"type":"CogVideoTextEncode","pos":{"0":493,"1":303},"size":{"0":471.90142822265625,"1":168.08047485351562},"flags":{},"order":6,"mode":0,"inputs":[{"name":"clip","type":"CLIP","link":54}],"outputs":[{"name":"conditioning","type":"CONDITIONING","links":[122],"slot_index":0,"shape":3}],"properties":{"Node name for S&R":"CogVideoTextEncode"},"widgets_values":["face closeup a blonde beautiful 18 years old girl smiling, laughing, looking in the camera, sensual, background is nature",1,true]},{"id":67,"type":"GetMaskSizeAndCount","pos":{"0":763,"1":772},"size":{"0":264.5999755859375,"1":86},"flags":{"collapsed":true},"order":8,"mode":0,"inputs":[{"name":"mask","type":"MASK","link":146}],"outputs":[{"name":"mask","type":"MASK","links":null},{"name":"720 width","type":"INT","links":[149,152,168,171],"slot_index":1},{"name":"480 height","type":"INT","links":[150,151,169,172],"slot_index":2},{"name":"49 count","type":"INT","links":[157,170],"slot_index":3}],"properties":{"Node name for S&R":"GetMaskSizeAndCount"},"widgets_values":[]},{"id":71,"type":"CogVideoImageEncode","pos":{"0":225,"1":710},"size":{"0":315,"1":122},"flags":{},"order":15,"mode":0,"inputs":[{"name":"pipeline","type":"COGVIDEOPIPE","link":164},{"name":"image","type":"IMAGE","link":179},{"name":"mask","type":"MASK","link":null,"shape":7}],"outputs":[{"name":"samples","type":"LATENT","links":[177],"slot_index":0}],"properties":{"Node name for 
S&R":"CogVideoImageEncode"},"widgets_values":[16,false]},{"id":73,"type":"ImageResizeKJ","pos":{"0":-418,"1":487},"size":{"0":315,"1":266},"flags":{},"order":12,"mode":0,"inputs":[{"name":"image","type":"IMAGE","link":181},{"name":"get_image_size","type":"IMAGE","link":null,"shape":7},{"name":"width_input","type":"INT","link":168,"widget":{"name":"width_input"},"shape":7},{"name":"height_input","type":"INT","link":169,"widget":{"name":"height_input"},"shape":7}],"outputs":[{"name":"IMAGE","type":"IMAGE","links":[178],"slot_index":0},{"name":"width","type":"INT","links":null},{"name":"height","type":"INT","links":null}],"properties":{"Node name for S&R":"ImageResizeKJ"},"widgets_values":[512,512,"nearest-exact",false,2,0,0,"disabled"]},{"id":77,"type":"GetImageSizeAndCount","pos":{"0":-67,"1":871},"size":{"0":277.20001220703125,"1":86},"flags":{},"order":14,"mode":0,"inputs":[{"name":"image","type":"IMAGE","link":178}],"outputs":[{"name":"image","type":"IMAGE","links":[179],"slot_index":0},{"name":"720 width","type":"INT","links":null},{"name":"480 height","type":"INT","links":null},{"name":"1 count","type":"INT","links":null}],"properties":{"Node name for S&R":"GetImageSizeAndCount"},"widgets_values":[]},{"id":1,"type":"DownloadAndLoadCogVideoModel","pos":{"0":634,"1":5},"size":{"0":337.8885192871094,"1":194},"flags":{},"order":3,"mode":0,"inputs":[{"name":"pab_config","type":"PAB_CONFIG","link":null,"shape":7},{"name":"block_edit","type":"TRANSFORMERBLOCKS","link":null,"shape":7},{"name":"lora","type":"COGLORA","link":null,"shape":7}],"outputs":[{"name":"cogvideo_pipe","type":"COGVIDEOPIPE","links":[121,164,174],"slot_index":0,"shape":3}],"properties":{"Node name for S&R":"DownloadAndLoadCogVideoModel"},"widgets_values":["THUDM/CogVideoX-5b-I2V","bf16","disabled","disabled",false]},{"id":60,"type":"SplineEditor","pos":{"0":-8,"1":1066},"size":{"0":765,"1":910},"flags":{},"order":4,"mode":0,"inputs":[{"name":"bg_image","type":"IMAGE","link":null,"shape":7}],"outputs":[{"name":"mask","type":"MASK","links":[146],"slot_index":0},{"name":"coord_str","type":"STRING","links":[145,176],"slot_index":1},{"name":"float","type":"FLOAT","links":null},{"name":"count","type":"INT","links":null},{"name":"normalized_str","type":"STRING","links":null}],"properties":{"Node name for 
S&R":"SplineEditor","points":"SplineEditor"},"widgets_values":["[{\"x\":253.19308790383164,\"y\":125.46957175056345},{\"x\":60.85649887302778,\"y\":92.41172051089404},{\"x\":57.09992486851989,\"y\":397.44552967693454},{\"x\":368.181818181818,\"y\":357.2727272727271},{\"x\":353.63636363636346,\"y\":185.45454545454535}]","[{\"x\":253.19308471679688,\"y\":125.46957397460938},{\"x\":237.5996551513672,\"y\":122.7894515991211},{\"x\":222.00624084472656,\"y\":120.10932922363281},{\"x\":206.39208984375,\"y\":117.55521392822266},{\"x\":190.6898651123047,\"y\":115.62690734863281},{\"x\":174.89862060546875,\"y\":114.69926452636719},{\"x\":159.0881805419922,\"y\":115.11483001708984},{\"x\":143.4235382080078,\"y\":117.26683044433594},{\"x\":128.22389221191406,\"y\":121.60110473632812},{\"x\":114.02021789550781,\"y\":128.51690673828125},{\"x\":101.50662231445312,\"y\":138.15098571777344},{\"x\":91.28062438964844,\"y\":150.1878204345703},{\"x\":83.45392608642578,\"y\":163.9186248779297},{\"x\":77.62779998779297,\"y\":178.6182403564453},{\"x\":73.4644775390625,\"y\":193.8758087158203},{\"x\":70.75473022460938,\"y\":209.45883178710938},{\"x\":69.38188171386719,\"y\":225.21670532226562},{\"x\":69.29756927490234,\"y\":241.03419494628906},{\"x\":70.50818634033203,\"y\":256.8052978515625},{\"x\":73.06712341308594,\"y\":272.413818359375},{\"x\":77.07130432128906,\"y\":287.71417236328125},{\"x\":82.65557861328125,\"y\":302.5091552734375},{\"x\":89.97900390625,\"y\":316.52178955078125},{\"x\":99.18762969970703,\"y\":329.36944580078125},{\"x\":110.34053039550781,\"y\":340.5647277832031},{\"x\":123.25296020507812,\"y\":349.676513671875},{\"x\":137.4200897216797,\"y\":356.69110107421875},{\"x\":152.3861083984375,\"y\":361.7956237792969},{\"x\":167.82684326171875,\"y\":365.216796875},{\"x\":183.5236053466797,\"y\":367.16241455078125},{\"x\":199.32876586914062,\"y\":367.7986755371094},{\"x\":215.13784790039062,\"y\":367.24932861328125},{\"x\":230.87010192871094,\"y\":365.5951843261719},{\"x\":246.4546661376953,\"y\":362.8813171386719},{\"x\":261.8180236816406,\"y\":359.1127014160156},{\"x\":276.8713684082031,\"y\":354.2528381347656},{\"x\":291.49053955078125,\"y\":348.21417236328125},{\"x\":305.47979736328125,\"y\":340.8374938964844},{\"x\":318.4987487792969,\"y\":331.8673400878906},{\"x\":330.064453125,\"y\":321.0940856933594},{\"x\":339.751953125,\"y\":308.6070861816406},{\"x\":347.27691650390625,\"y\":294.7084655761719},{\"x\":352.58074951171875,\"y\":279.8168029785156},{\"x\":355.8205261230469,\"y\":264.340576171875},{\"x\":357.2802429199219,\"y\":248.5929412841797},{\"x\":357.30206298828125,\"y\":232.7751007080078},{\"x\":356.30377197265625,\"y\":216.98614501953125},{\"x\":354.97100830078125,\"y\":201.22023010253906},{\"x\":353.6363525390625,\"y\":185.4545440673828}]",720,480,49,"path","basis",0.5,1,"list",0,1,null,null,null]},{"id":78,"type":"AlphaChanelRemove","pos":{"0":-666,"1":355},"size":{"0":214.20001220703125,"1":26},"flags":{},"order":9,"mode":0,"inputs":[{"name":"images","type":"IMAGE","link":180}],"outputs":[{"name":"IMAGE","type":"IMAGE","links":[181],"slot_index":0}],"properties":{"Node name for S&R":"AlphaChanelRemove"}},{"id":72,"type":"LoadImage","pos":{"0":-1001,"1":452},"size":{"0":315,"1":314},"flags":{},"order":5,"mode":0,"inputs":[],"outputs":[{"name":"IMAGE","type":"IMAGE","links":[180],"slot_index":0},{"name":"MASK","type":"MASK","links":null}],"properties":{"Node name for 
S&R":"LoadImage"},"widgets_values":["bailing_00053_.png","image"]}],"links":[[54,20,0,30,0,"CLIP"],[56,20,0,31,0,"CLIP"],[121,1,0,57,0,"COGVIDEOPIPE"],[122,30,0,57,1,"CONDITIONING"],[123,31,0,57,2,"CONDITIONING"],[127,57,1,56,1,"LATENT"],[128,57,0,56,0,"COGVIDEOPIPE"],[142,65,0,66,0,"IMAGE"],[145,60,1,65,0,"STRING"],[146,60,0,67,0,"MASK"],[149,67,1,65,2,"INT"],[150,67,2,65,3,"INT"],[151,67,2,57,9,"INT"],[152,67,1,57,10,"INT"],[153,65,0,68,1,"IMAGE"],[154,65,1,68,2,"MASK"],[155,56,0,68,0,"IMAGE"],[156,68,0,44,0,"IMAGE"],[157,67,3,57,8,"INT"],[164,1,0,71,0,"COGVIDEOPIPE"],[168,67,1,73,2,"INT"],[169,67,2,73,3,"INT"],[170,67,3,74,3,"INT"],[171,67,1,74,4,"INT"],[172,67,2,74,5,"INT"],[173,74,0,57,7,"TORAFEATURES"],[174,1,0,74,0,"COGVIDEOPIPE"],[175,75,0,74,1,"TORAMODEL"],[176,60,1,74,2,"STRING"],[177,71,0,57,3,"LATENT"],[178,73,0,77,0,"IMAGE"],[179,77,0,71,1,"IMAGE"],[180,72,0,78,0,"IMAGE"],[181,78,0,73,0,"IMAGE"]],"groups":[],"config":{},"extra":{"ds":{"scale":0.7513148009015777,"offset":[1006.8047535543986,133.66150201085824]}},"version":0.4}

Additional Context


I used vertical images of this type here:

[attached image: bailing_00053_.png]

[attached workflow: Vertical CogX_Tora.json]

I think vertical images, or any resolution other than 720*480, are not working.

It also bugs the sampler: after the error, images that previously worked start giving the same error as well.

jerrydavos avatar Oct 28 '24 14:10 jerrydavos

The current I2V models do not support anything but 49 x 720x480; only the "Fun" version does, and that in turn does not support Tora. I know it's messy right now with all the different models and their variations coming out every other week.

kijai avatar Oct 28 '24 14:10 kijai
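
As a rough sanity check (a sketch assuming the usual CogVideoX VAE compression of 4x in time and 8x in space, which is consistent with the "Encoded latents shape: torch.Size([1, 1, 16, 60, 90])" line in the log above), a 49-frame 720x480 request should produce 13 latent frames, so a latent arriving with any other frame count will trip the add_noise broadcast at dimension 1:

```python
# Hypothetical helper (not part of the wrapper): expected CogVideoX latent shape
# for a request, assuming 4x temporal / 8x spatial compression and 16 channels.
def expected_latent_shape(num_frames: int, height: int, width: int,
                          channels: int = 16, batch: int = 1):
    latent_frames = (num_frames - 1) // 4 + 1   # 49 frames -> 13 latent frames
    return (batch, latent_frames, channels, height // 8, width // 8)

print(expected_latent_shape(49, 480, 720))  # (1, 13, 16, 60, 90)
```

The "14 vs 13" in the error above points to the sampler receiving latents whose frame count does not line up with this expected value.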

I'm using the current I2V and 49 x 720 x 480 and I still get this error 🤔

got prompt
Merging rank 256 LoRA weights from Q:\ComfyUI\models\CogVideo\loras\orbit_left_lora_weights.safetensors with strength 1.0
Encoded latents shape: torch.Size([1, 1, 16, 60, 90])
!!! Exception during processing !!! The size of tensor a (14) must match the size of tensor b (13) at non-singleton dimension 1
Traceback (most recent call last):
  File "Q:\ComfyUI\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "Q:\ComfyUI\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "Q:\ComfyUI\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "Q:\ComfyUI\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "Q:\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\nodes.py", line 866, in process
    latents = pipeline["pipe"](
  File "Q:\ComfyUI\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "Q:\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\pipeline_cogvideox.py", line 485, in __call__
    latents, timesteps, noise = self.prepare_latents(
  File "Q:\ComfyUI\custom_nodes\ComfyUI-CogVideoXWrapper\pipeline_cogvideox.py", line 239, in prepare_latents
    latents = self.scheduler.add_noise(latents, noise, latent_timestep)
  File "Q:\ComfyUI\venv\lib\site-packages\diffusers\schedulers\scheduling_dpm_cogvideox.py", line 465, in add_noise
    noisy_samples = sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise
RuntimeError: The size of tensor a (14) must match the size of tensor b (13) at non-singleton dimension 1


Kallamamran avatar Nov 11 '24 15:11 Kallamamran

> I'm using the current I2V and 49 x 720 x 480 and I still get this error 🤔

The condition image, the one you use for I2V models, goes into the other input (image_cond_latents), not samples. Samples is for doing vid2vid.

kijai avatar Nov 11 '24 15:11 kijai
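
If you want to catch this wiring mistake before the scheduler does, a hypothetical pre-flight check along these lines (the function name and the frames-at-dimension-1 layout are assumptions of this sketch, not wrapper API) would flag a single-frame image latent that was plugged into samples by accident:

```python
import torch

def check_samples_latent(samples: torch.Tensor, num_frames: int) -> None:
    """Hypothetical vid2vid sanity check: `samples` must already hold the full
    latent frame count; a one-frame image latent belongs in image_cond_latents."""
    expected_frames = (num_frames - 1) // 4 + 1   # 49 -> 13 under 4x temporal compression
    if samples.shape[1] != expected_frames:
        raise ValueError(
            f"`samples` has {samples.shape[1]} latent frame(s), but {expected_frames} "
            f"are expected for {num_frames} video frames - this latent probably "
            "belongs in the image_cond_latents input instead."
        )

# The encoded image latent from the log ([1, 1, 16, 60, 90]) fails this check:
check_samples_latent(torch.zeros(1, 1, 16, 60, 90), num_frames=49)  # raises ValueError
```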