DiffSynth-Studio: running Wan2.1-VACE-14B fails when vace_video is set

Running Wan2.1-VACE-14B fails when vace_video_mask is provided. Error log:

Downloading Model from https://www.modelscope.cn to directory: DiffSynth-Studio-main/models/Wan-AI/Wan2.1-T2V-1.3B
2025-06-27 09:18:01,016 - modelscope - INFO - Target directory already exists, skipping creation.
VAE encoding: 100%|██████████| 9/9 [00:08<00:00,  1.05it/s]
VAE encoding: 100%|██████████| 9/9 [00:08<00:00,  1.11it/s]
  0%|          | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "DiffSynth-Studio-main/examples/wanvideo/model_inference/Wan2.1-VACE-14B.py", line 61, in <module>
    video = pipe(
        prompt=prompt,
    ...<6 lines>...
        seed=1, tiled=True
    )
  File "conda/envs/wan2.1/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "DiffSynth-Studio-main/diffsynth/pipelines/wan_video_new.py", line 556, in __call__
    noise_pred_posi = self.model_fn(**models, **inputs_shared, **inputs_posi, timestep=timestep)
  File "DiffSynth-Studio-main/diffsynth/pipelines/wan_video_new.py", line 1128, in model_fn_wan_video
    vace_hints = vace(x, vace_context, context, t_mod, freqs)
  File "conda/envs/wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "conda/envs/wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "DiffSynth-Studio-main/diffsynth/models/wan_video_vace.py", line 61, in forward
    torch.cat([u, u.new_zeros(1, x.shape[1] - u.size(1), u.size(2))],
                  ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Trying to create tensor with negative dimension -53040: [1, -53040, 5120]

The same error is raised after I changed the key in self.inputs_shared in class WanVideoPipeline from vace_video_mask to vace_mask, because I found that vace_mask, not vace_video_mask, is in the input_params of WanVideoUnit_VACE.

Help, please.

Jun 27 '25 10:06 LiuXiaolong19920720

Running Wan2.1-VACE-14B also failed for me when vace_video_mask was set.

But it works with the following height and width settings:

control_video = VideoData("output_0010_ff.mp4", height=480, width=832)

vace_video_mask = VideoData("output_0010_mask_ff.mp4", height=480, width=832)
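
One plausible reading of the traceback: the VACE context `u` is padded up to the token count of the latent `x`, so the pad length `x.shape[1] - u.size(1)` goes negative whenever the control video produces more tokens than the latent being generated. The sketch below only does that arithmetic. The strides (4x temporal and 8x spatial VAE compression, 2x2 spatial patches) and the helper name `wan_token_count` are assumptions for illustration, not values read from DiffSynth-Studio, and the sizes are examples rather than the ones from the failing run, so the result is not expected to match -53040.

```python
def wan_token_count(height, width, num_frames,
                    vae_stride=(4, 8, 8), patch_size=(1, 2, 2)):
    """Estimate transformer tokens for a video (strides are assumptions)."""
    t_stride, h_stride, w_stride = vae_stride
    p_t, p_h, p_w = patch_size
    # Assumed latent layout: first frame kept, then one latent per t_stride frames.
    latent_frames = (num_frames - 1) // t_stride + 1
    latent_h = height // h_stride
    latent_w = width // w_stride
    return (latent_frames // p_t) * (latent_h // p_h) * (latent_w // p_w)


# Latent generated at 480x832, control video loaded at 720x1280 (example sizes).
latent_tokens = wan_token_count(480, 832, 81)
context_tokens = wan_token_count(720, 1280, 81)

# Mirrors the pad length in the failing torch.cat call: negative means
# the context is longer than the latent, which new_zeros cannot represent.
pad_length = latent_tokens - context_tokens
print(latent_tokens, context_tokens, pad_length)
```

Under these assumptions, loading the control video and mask at the same height and width as the pipeline output makes the two token counts equal, so the pad length is zero.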

Aug 14 '25 07:08 SheldongChen

For 720P inference, height and width should be set as follows:

control_video = VideoData("output_0010_ff.mp4", height=720, width=1280)

vace_video_mask = VideoData("output_0010_mask_ff.mp4", height=720, width=1280)

pipe(
    prompt="xxxxx",
    negative_prompt="xxxxxx",
    vace_video=control_video,
    vace_video_mask=vace_video_mask,
    seed=1, tiled=True,
    height=720, width=1280,
)
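
Since the fix is to keep the control video, the mask, and the pipeline call at the same resolution, a small guard before calling `pipe` can catch a mismatch early. This is a minimal sketch: `check_frame_sizes` is a hypothetical helper, and it assumes each frame exposes a PIL-style `.size` of `(width, height)`. Whether a `VideoData` object can be passed to it directly is also an assumption, so converting to a list of frames first may be needed.

```python
def check_frame_sizes(frames, height, width, name="video"):
    """Raise ValueError if any frame's size differs from (width, height).

    Assumes each frame has a PIL-style .size attribute: (width, height).
    """
    for index, frame in enumerate(frames):
        if tuple(frame.size) != (width, height):
            raise ValueError(
                f"{name} frame {index} is {frame.size[0]}x{frame.size[1]}, "
                f"expected {width}x{height}"
            )
    return True
```

Calling it once for the control video and once for the mask, with the same `height` and `width` that are passed to `pipe`, keeps all three in sync.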

Aug 14 '25 09:08 SheldongChen

I ran into this issue too. I wrote a preprocessing script that resizes the video to VACE's resolution and samples the appropriate number of frames. For reference: https://github.com/cs-mshah/DiffSynth-Studio/blob/vlr-proj/examples/wanvideo/preprocess_data.py
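
The frame-sampling part of such a preprocessing step could look roughly like the sketch below. It is not the linked script. Both helper names are hypothetical, and the rule that the frame count should have the form 4k + 1 is an assumption about the model's temporal stride.

```python
def nearest_valid_num_frames(available, multiple=4):
    """Largest n <= available with n % multiple == 1 (assumed 4k + 1 rule)."""
    if available < 1:
        raise ValueError("need at least one frame")
    return (available - 1) // multiple * multiple + 1


def sample_frame_indices(total_frames, num_frames):
    """Pick num_frames indices spread evenly over range(total_frames)."""
    if not 1 <= num_frames <= total_frames:
        raise ValueError("num_frames must be between 1 and total_frames")
    if num_frames == 1:
        return [0]
    # Keep the first and last frame, space the rest evenly in between.
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]
```

The selected frames would then be resized to the same height and width that are passed to the pipeline, for both the control video and the mask.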

Nov 15 '25 21:11 cs-mshah