Wan2.2 Animate inference errors
Hi, when I use DiffSynth to run inference with Wan2.2 Animate, if the input video shape is not 1280 (h) × 720 (w) I get the error below:
File "./DiffSynth-Studio/diffsynth/models/wan_video_animate_adapter.py", line 643, in after_patch_embedding
[rank3]: x[:, :, 1:] += pose_latents
[rank3]: RuntimeError: The size of tensor a (37) must match the size of tensor b (36) at non-singleton dimension 2
The same error happens when I change the number of frames of the input pose video (error when frames == 147):
animate_pose_video = VideoData("./src_pose_1280_720.mp4").raw_data()[:147-4]
animate_face_video = VideoData("./src_face.mp4").raw_data()[:147-4]
But it seems to work with https://github.com/Wan-Video/Wan2.2/blob/main/generate.py
Is there any solution? Thanks!
@Rex-dby The number of frames can only be 4n+1, but 147 is 4n+3. This is a limitation of the base model.
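A minimal sketch of trimming a clip to the largest valid 4n+1 length before inference (the helper name is mine, not part of DiffSynth; the 4n+1 form matches the constraint stated above):

```python
def trim_to_4n_plus_1(num_frames: int) -> int:
    """Largest frame count <= num_frames of the form 4n + 1."""
    if num_frames < 1:
        raise ValueError("need at least one frame")
    # (num_frames - 1) % 4 is how far we overshoot the nearest 4n + 1.
    return num_frames - (num_frames - 1) % 4

# 147 frames (4n + 3) fails; trimming gives 145 (4 * 36 + 1).
frames = list(range(147))  # stand-in for VideoData(...).raw_data()
frames = frames[:trim_to_4n_plus_1(len(frames))]
print(len(frames))  # 145
```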
Got it, thanks! What about the shape of the input video and image? The original inference code https://github.com/Wan-Video/Wan2.2/blob/main/generate.py seems to support a 960×960 input shape.