DiffSynth-Studio Wan2.2 Animate inference errors

Hi, when I use DiffSynth-Studio to run Wan2.2 Animate inference, I get the error below if the input video shape is not 1280(h)*720(w):

File "./DiffSynth-Studio/diffsynth/models/wan_video_animate_adapter.py", line 643, in after_patch_embedding
[rank3]:     x[:, :, 1:] += pose_latents
[rank3]: RuntimeError: The size of tensor a (37) must match the size of tensor b (36) at non-singleton dimension 2

The same error occurs when I change the number of frames in the input pose video (it fails when frames == 147):

animate_pose_video = VideoData("./src_pose_1280_720.mp4").raw_data()[:147-4]
animate_face_video = VideoData("./src_face.mp4").raw_data()[:147-4]

But it seems to work with https://github.com/Wan-Video/Wan2.2/blob/main/generate.py

Is there any solution? THX!

Nov 04 '25 10:11 rex-29

@Rex-dby The number of frames must be of the form 4n+1, but 147 is 4n+3 (147 = 4*36 + 3). This is a limitation of the base model.
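
As a rough illustration of the 4n+1 rule, the sketch below checks a frame count and rounds it down to the nearest valid value before the videos are sliced. The helper names `is_valid_frame_count` and `nearest_valid_frame_count` are hypothetical and not part of DiffSynth-Studio; only the 4n+1 constraint itself comes from the reply above.

```python
def is_valid_frame_count(num_frames: int) -> bool:
    """Return True if num_frames has the form 4n+1 (n >= 0)."""
    return num_frames >= 1 and (num_frames - 1) % 4 == 0


def nearest_valid_frame_count(num_frames: int) -> int:
    """Round num_frames down to the largest value of the form 4n+1.

    Hypothetical helper for illustration; not a DiffSynth-Studio API.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return ((num_frames - 1) // 4) * 4 + 1


if __name__ == "__main__":
    requested = 147
    if not is_valid_frame_count(requested):
        adjusted = nearest_valid_frame_count(requested)
        print(f"{requested} is not 4n+1; using {adjusted} instead")
```

With this, a requested count of 147 would be reduced to 145, and a count such as 81 would be left unchanged.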

Nov 04 '25 10:11 Artiprocher


Got it, THX! What about the shape of the input video and image? It seems the original inference code https://github.com/Wan-Video/Wan2.2/blob/main/generate.py supports a 960*960 input shape.

Nov 05 '25 02:11 rex-29