
Proper Wan2.2 Animate WF?

cyberbol opened this issue 3 months ago • 40 comments

Am I the only one who noticed, lol, that the sample workflow for Wan2.2 Animate does something completely different from what's shown in the DEMO VIDEO for Wan2.2 Animate??

In the DEMO VIDEO, the motion from the provided video clip is copied and applied onto the supplied image.png, whereas this workflow copies the content of the image (e.g., a character, a face) and pastes it onto the provided video.

Completely different things.

cyberbol avatar Sep 19 '25 18:09 cyberbol

What video are you referring to? You can use the model in many different ways; if you don't want to do character replacement, you simply disconnect the background image and mask connections. The workflow example is just set to use every input available to show how they are constructed, but you don't have to use them.

kijai avatar Sep 19 '25 18:09 kijai

What video are you referring to? You can use the model in many different ways; if you don't want to do character replacement, you simply disconnect the background image and mask connections. The workflow example is just set to use every input available to show how they are constructed, but you don't have to use them.

https://humanaigc.github.io/wan-animate/content/aa/%E8%A7%92%E8%89%B2%E5%90%88%E9%9B%86_%E5%B8%A6%E5%8E%9F%E5%9E%8B.mp4

this video.

I just wish to copy movement from a video to a photo.

cyberbol avatar Sep 19 '25 18:09 cyberbol

What video are you referring to? You can use the model in many different ways; if you don't want to do character replacement, you simply disconnect the background image and mask connections. The workflow example is just set to use every input available to show how they are constructed, but you don't have to use them.

https://humanaigc.github.io/wan-animate/content/aa/%E8%A7%92%E8%89%B2%E5%90%88%E9%9B%86_%E5%B8%A6%E5%8E%9F%E5%9E%8B.mp4

this video.

I just wish to copy movement from a video to a photo.

Then you do like I said, disconnect the bg_image and mask inputs to the WanAnimate node.
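In other words, thinking of the node's optional inputs as keyword arguments makes the two modes easy to compare. A hypothetical sketch (`animate_embeds` is just a stand-in for the node, not the wrapper's actual API):

```python
# Hypothetical sketch of the two modes -- NOT the wrapper's real API;
# "animate_embeds" stands in for the WanAnimate embeds node, and the
# input variables stand in for the corresponding node connections.

# Mode 1: character replacement. The driving video stays as the
# background, and the mask marks the actor to be swapped out.
embeds = animate_embeds(
    ref_image=character_image,   # the character to insert
    pose_video=driving_video,    # motion source
    bg_images=driving_video,     # original footage kept as background
    mask=actor_mask,             # region where the new character goes
)

# Mode 2: motion transfer. With bg_images and mask disconnected, the
# output is built entirely from the reference image, animated by the
# pose extracted from the driving video.
embeds = animate_embeds(
    ref_image=character_image,
    pose_video=driving_video,
    # bg_images and mask left unconnected
)
```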

kijai avatar Sep 19 '25 18:09 kijai

Is there a way to improve the quality of the output from WanAnimate? I tried using 720p on a 5090 and the quality wasn't even close to the original Wan examples in their demo. The face is very blurry and the whole video has halos around edges. When I try increasing the resolution to 832p, it crashes with OOM.

jnpatrick99 avatar Sep 19 '25 21:09 jnpatrick99

Had to give the no-mask setup a try ;-) seems to work well.

https://github.com/user-attachments/assets/d7eb758b-2f63-4e79-995d-27a60ff8c7be

https://github.com/user-attachments/assets/773721df-403c-4628-b5b5-ddb603c59571

RuneGjerde avatar Sep 19 '25 22:09 RuneGjerde

meh, it still runs into the same skeleton issues. You can’t accurately transfer an actor’s performance onto a cartoon character with very different proportions (like a chibi, for example). The system just ends up trying to stretch the character to match the skeleton. At least, that’s been my experience and it’s also what I see happening in the post above.

snicolast avatar Sep 19 '25 23:09 snicolast

You can turn off the "skeleton" though, either for body, or for face, or for hands, or for all, but then the motions/posture will be less of an exact clone of the driving video.

And/or you can take the first frame of the driving video and make a cartoon-based character from it, so there is no need for the model to stretch anything to fit the skeleton.
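A rough illustration of those toggles (hypothetical sketch; `estimate_pose` is a stand-in for whatever pose detector the workflow uses, and the real DWPose node exposes equivalent body/hands/face switches):

```python
# Hypothetical sketch -- "estimate_pose" stands in for the workflow's
# pose detector; the real DWPose node has equivalent toggles.
def pose_frames(frames, body=True, hands=True, face=True):
    """Extract pose maps from driving-video frames, keeping only the
    requested skeleton components."""
    return [
        estimate_pose(f, include_body=body, include_hand=hands, include_face=face)
        for f in frames
    ]

# Full skeleton: the closest clone of the driving performance.
poses = pose_frames(frames)

# Body only: a looser constraint, so there is less stretching on
# characters whose proportions differ a lot from the actor's
# (chibi heads, short limbs, etc.).
poses = pose_frames(frames, hands=False, face=False)
```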

RuneGjerde avatar Sep 19 '25 23:09 RuneGjerde

Any ideas why my videos start with a bad/darker image if I disconnect the bg_image and mask inputs to the WanAnimate node?

djdookie avatar Sep 19 '25 23:09 djdookie

You can turn off the "skeleton" though, either for body, or for face, or for hands, or for all, but then the motions/posture will be less of an exact clone of the driving video.

And/or you can take the first frame of the driving video and make a cartoon-based character from it, so there is no need for the model to stretch anything to fit the skeleton.

Thanks, I tried all of that. The results are mixed, as expected... not really ideal in most cases, though there are a few tricks that can sometimes work.

The second option obviously isn’t viable when you’re dealing with already-established characters.

There is another paper out there that addresses this problem (can’t recall the name right now), where the system correctly matches a character’s pose regardless of limb or head size. It’s a shame this one doesn’t quite get it right, but still, it’s a nice addition to the toolbox.

snicolast avatar Sep 20 '25 00:09 snicolast

There is another paper out there that addresses this problem (can’t recall the name right now), where the system correctly matches a character’s pose regardless of limb or head size. It’s a shame this one doesn’t quite get it right, but still, it’s a nice addition to the toolbox.

Yeah, that would be nice, although I guess if the reference photo is a full-body shot (as typical reference photos are), maybe the "stretching" to fit the skeleton works better (or worse). I haven't tried ;-)

RuneGjerde avatar Sep 20 '25 00:09 RuneGjerde

Yeah, we will have to wait... but this is fun too when it works as expected, and can be useful in many situations. :)

https://github.com/user-attachments/assets/076704fc-9b63-4cf6-a4c2-fe9045a735b0

snicolast avatar Sep 20 '25 00:09 snicolast

That one looks nice for sure ;-) but yeah, it can be a bit hit and miss.

RuneGjerde avatar Sep 20 '25 00:09 RuneGjerde

Yeah, we will have to wait... but this is fun too when it works as expected, and can be useful in many situations. :) rk_scared.mp4

Looks nice. You also have the issue with an initial dark frame. Any ideas on how to solve that? In @RuneGjerde's video I don't see that issue. Is the video cut, or did you somehow solve it?

Edit: Looks like disconnecting only the mask input and keeping the bg_image input solved it.

djdookie avatar Sep 20 '25 00:09 djdookie

Yes, I didn't have the mask input connected, but just because I didn't need it (I didn't know it had any impact on the first frame).

One thing that is perhaps a bit odd is that I forgot to change my prompt in all the videos, and it's been "werewolf in a crowded office etc. etc." from my first Wolf of Wall Street attempt. I guess the prompt doesn't have a strong influence then (or maybe that's why both the fluffy toy and the lion in the above videos have a bit of claws and canine teeth... haha... probably).

RuneGjerde avatar Sep 20 '25 00:09 RuneGjerde

My issue with character swapping is that the result isn't super close to the reference character. Any fix for that, to make it look closer?

Maki9009 avatar Sep 20 '25 06:09 Maki9009

My issue is with pupil movement not matching; the blinking matches. Thought it might be due to me using context_options, but this is all within a single context window.

jason-mightynice avatar Sep 20 '25 10:09 jason-mightynice

My issue is with pupil movement not matching; the blinking matches. Thought it might be due to me using context_options, but this is all within a single context window.

Could try turning on face detection in the DWPose node, see if it helps or not.

RuneGjerde avatar Sep 20 '25 10:09 RuneGjerde

Any way I can make the character swap closer to my actual reference image?

Maki9009 avatar Sep 20 '25 10:09 Maki9009

Any way I can make the character swap closer to my actual reference image?

It's a new model so I haven't gotten super familiar with it, but generally speaking, the more the reference image resembles the first frame in composition and in the size of the person, the better the result (but since this model is a bit VACE-ish, I don't really know; it could even be better with a full-body character pose, but I haven't tried).

RuneGjerde avatar Sep 20 '25 11:09 RuneGjerde

My issue is with pupil movement not matching; the blinking matches. Thought it might be due to me using context_options, but this is all within a single context window.

Could try turning on face detection in the DWPose node, see if it helps or not.

Experimenting is fun and educational at least. Saw this post: https://x.com/AIWarper/status/1969136243666563429

I will just keep the pupils looking at the camera for now!

jason-mightynice avatar Sep 20 '25 11:09 jason-mightynice

Experimenting is fun and educational at least. Saw this post: https://x.com/AIWarper/status/1969136243666563429

I will just keep the pupils looking at the camera for now!

Might be related to this https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1251#issuecomment-3314890336 Perhaps a new pose detector is coming ;-)

RuneGjerde avatar Sep 20 '25 11:09 RuneGjerde

Experimenting is fun and educational at least. Saw this post: https://x.com/AIWarper/status/1969136243666563429 I will just keep the pupils looking at the camera for now!

Might be related to this #1251 (comment) Perhaps a new pose detector is coming ;-)

Main thing is that the face crop needs to have the eyes and eyebrows visible; if the eyes are right on the frame edge, it doesn't track them at all.

The whole preprocessing pipeline from the original code is not implemented yet; it is pretty complex and even involves using Flux Kontext to repose your reference, and there is indeed pose retargeting code, etc.

The preprocessing in the example workflow is just pieced together from existing nodes; it's not very robust.
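To make the face-crop point above concrete (a toy helper, purely hypothetical, not code from the wrapper or the original repo):

```python
# Toy helper illustrating the point above: pad the detected face box
# before cropping so the eyes and eyebrows never sit on the crop edge,
# where the tracker loses them. Purely hypothetical, not wrapper code.
def pad_face_crop(x0, y0, x1, y1, frame_w, frame_h, margin=0.25):
    """Expand a face bounding box by `margin` (fraction of box size),
    clamped to the frame, with extra headroom for the eyebrows."""
    w, h = x1 - x0, y1 - y0
    pad_x = int(w * margin)
    pad_y = int(h * margin)
    return (
        max(0, x0 - pad_x),
        max(0, y0 - int(pad_y * 1.5)),  # extra room above for eyebrows
        min(frame_w, x1 + pad_x),
        min(frame_h, y1 + pad_y),
    )

# A tight 100x100 face box near the top of a 1280x720 frame:
print(pad_face_crop(600, 10, 700, 110, 1280, 720))  # (575, 0, 725, 135)
```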

kijai avatar Sep 20 '25 11:09 kijai

It seems pretty good already, though. Usable. The eyes move in a natural way.
But perhaps the most extreme eye tracking is not there, for an exact clone, or when the person in the video looks away, etc.

https://github.com/user-attachments/assets/12947bd8-33af-490c-99a5-8d32c1c5f5a0

RuneGjerde avatar Sep 20 '25 12:09 RuneGjerde

Yeah, it's natural. In my case, the actor was looking slightly off camera, nothing extreme. The replaced actor did a quick glance at the camera that wasn't in the original. Anyway, I will keep playing... I just posted here in case there was a good teacher here who could control their pupils...

jason-mightynice avatar Sep 20 '25 12:09 jason-mightynice

Getting anecdotally better generations after adding Wan2.2-Fun-A14B-InP-LOW-HPS2.1_resized_dynamic_avg_rank_15_bf16.safetensors to the LoRA stack. Seems to work with other LoRAs as well.

Can confirm that removing both the bg_images and mask inputs on the WanVideo Animate Embeds node will then use the background from the still image instead of the video input. Flexible if you want to inject some other video for the background instead, etc.

It's still new, so lots of experimenting to do, hah. Thanks!

ubergarm avatar Sep 20 '25 16:09 ubergarm

I guess the bad first-latent issue got fixed with b9fd93c19e2d3de8d59fa9028d3f0d0567038634. I gotta test that in my workflow. @kijai I just found the same visual issue with S2V; can you possibly check and apply the same fix there?

djdookie avatar Sep 20 '25 23:09 djdookie

@kijai Thank you! Does the animate workflow example (-01) work for character replacement?

ttio2tech avatar Sep 20 '25 23:09 ttio2tech

@kijai Thank you! Does the animate workflow example (-01) work for character replacement?

That's the default mode for the workflow: replacing characters.

(while disconnecting the mask and/or background image gives other results)

RuneGjerde avatar Sep 21 '25 00:09 RuneGjerde

@kijai Thank you! Does the animate workflow example (-01) work for character replacement?

That's the default mode for the workflow: replacing characters.

(while disconnecting the mask and/or background image gives other results)

But the default mode generates the raw input video.

ttio2tech avatar Sep 21 '25 00:09 ttio2tech

But the default mode generates the raw input video.

Perhaps I misunderstood, then. I thought you wanted to replace someone in the raw video input (that's the default mode, with masking, reference image, etc.).

A bit like the one below (I turned off DWPose for it; if you add some of that back, the hands, face, and body are copied over even more, if desirable):

https://github.com/user-attachments/assets/3eb432d1-ccd0-49e5-b561-1d12506f62dc

If you disconnect the mask and background image (aka the raw video), you get an animation based solely on your input image, with the movements taken only from the raw video. Can be anything: singing, dancing, talking, gestures... A bit like this, for example:

https://github.com/user-attachments/assets/ffafb0fb-7364-48da-937a-cd9f9e92def1

RuneGjerde avatar Sep 21 '25 01:09 RuneGjerde