Running Wan2.1-VACE-14B fails when vace_video_mask is set
Running Wan2.1-VACE-14B fails when vace_video_mask is passed. Error log:
Downloading Model from https://www.modelscope.cn to directory: DiffSynth-Studio-main/models/Wan-AI/Wan2.1-T2V-1.3B
2025-06-27 09:18:01,016 - modelscope - INFO - Target directory already exists, skipping creation.
VAE encoding: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:08<00:00, 1.05it/s]
VAE encoding: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:08<00:00, 1.11it/s]
0%| | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
File "DiffSynth-Studio-main/examples/wanvideo/model_inference/Wan2.1-VACE-14B.py", line 61, in <module>
video = pipe(
prompt=prompt,
...<6 lines>...
seed=1, tiled=True
)
File "conda/envs/wan2.1/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "DiffSynth-Studio-main/diffsynth/pipelines/wan_video_new.py", line 556, in __call__
noise_pred_posi = self.model_fn(**models, **inputs_shared, **inputs_posi, timestep=timestep)
File "DiffSynth-Studio-main/diffsynth/pipelines/wan_video_new.py", line 1128, in model_fn_wan_video
vace_hints = vace(x, vace_context, context, t_mod, freqs)
File "conda/envs/wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "conda/envs/wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "DiffSynth-Studio-main/diffsynth/models/wan_video_vace.py", line 61, in forward
torch.cat([u, u.new_zeros(1, x.shape[1] - u.size(1), u.size(2))],
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Trying to create tensor with negative dimension -53040: [1, -53040, 5120]
The same error is raised after I changed the key in self.inputs_shared in class WanVideoPipeline from vace_video_mask to vace_mask. I tried that because vace_mask, not vace_video_mask, is what appears in the input_params of WanVideoUnit_VACE.
Help, please.
I also hit this failure when running Wan2.1-VACE-14B with vace_video_mask set.
It works with the following height and width settings:
control_video = VideoData("output_0010_ff.mp4", height=480, width=832)
vace_video_mask = VideoData("output_0010_mask_ff.mp4", height=480, width=832)
For 720P inference, set height and width as follows, both when loading the videos and in the pipe call:
control_video = VideoData("output_0010_ff.mp4", height=720, width=1280)
vace_video_mask = VideoData("output_0010_mask_ff.mp4", height=720, width=1280)
pipe(
    prompt="xxxxx",
    negative_prompt="xxxxxx",
    vace_video=control_video,
    vace_video_mask=vace_video_mask,
    seed=1, tiled=True,
    height=720, width=1280,
)
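The negative dimension in the traceback is consistent with the VACE context carrying more tokens than the latent sequence x, which can happen when the control video and mask are loaded at a larger size than the height and width the pipeline runs at. Below is a minimal sketch of that arithmetic. It assumes the VAE downsamples by 4 in time and 8 in space and that the transformer uses a (1, 2, 2) patch size; those factors and the helper name token_count are assumptions for illustration, not taken from the DiffSynth-Studio source. The sizes are a hypothetical mismatch and do not reproduce the exact -53040 from the log, since the source video's size and frame count are not given.

```python
def token_count(num_frames, height, width,
                vae_stride=(4, 8, 8), patch_size=(1, 2, 2)):
    """Approximate transformer token count for a video (assumed stride/patch factors)."""
    # Temporal compression keeps the first frame and then one latent per `stride` frames.
    latent_frames = (num_frames - 1) // vae_stride[0] + 1
    latent_h = height // vae_stride[1]
    latent_w = width // vae_stride[2]
    return ((latent_frames // patch_size[0])
            * (latent_h // patch_size[1])
            * (latent_w // patch_size[2]))


# Hypothetical mismatch: pipeline runs at 480x832, control video loaded at 720x1280.
x_tokens = token_count(81, 480, 832)
u_tokens = token_count(81, 720, 1280)

# This difference plays the role of x.shape[1] - u.size(1) in the traceback.
padding = x_tokens - u_tokens
print(x_tokens, u_tokens, padding)
```

When the two sizes match, the difference is zero and no padding tensor with a negative length is requested.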
I ran into this issue too. I worked around it with a preprocessing script that resizes the video to VACE's resolution and samples the appropriate number of frames. For reference: https://github.com/cs-mshah/DiffSynth-Studio/blob/vlr-proj/examples/wanvideo/preprocess_data.py
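For the frame-sampling half of that preprocessing, here is a rough sketch of the idea. It is not a reproduction of the linked script. It assumes the model expects a frame count of the form 4k + 1 (such as 81), and both helper names are hypothetical.

```python
def valid_num_frames(available, temporal_stride=4):
    """Largest count of the form stride * k + 1 that does not exceed `available`."""
    if available < 1:
        raise ValueError("need at least one frame")
    return (available - 1) // temporal_stride * temporal_stride + 1


def sample_indices(total_frames, target_frames):
    """Evenly spaced frame indices running from 0 to total_frames - 1."""
    if target_frames < 1 or total_frames < 1:
        raise ValueError("frame counts must be positive")
    if target_frames == 1:
        return [0]
    step = (total_frames - 1) / (target_frames - 1)
    return [round(i * step) for i in range(target_frames)]


# Example: a 100-frame clip trimmed to a valid length, then subsampled to 81 frames.
usable = valid_num_frames(100)
indices = sample_indices(200, 81)
print(usable, len(indices), indices[0], indices[-1])
```

The same indices would be applied to both the control video and the mask so the two stay aligned frame by frame.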