Proper Wan2.2 Animate WF ??
Am I the only one who noticed, lol, that the sample workflow for Wan2.2 Animate does something completely different from what’s shown in the DEMO VIDEO for Wan2.2 Animate??
In the DEMO VIDEO, the motion from the provided video clip is copied and applied onto the supplied image.png, whereas this workflow copies the content of the image (e.g., a character, a face) and pastes it onto the provided video.
Completely different things.
What video are you referring to? You can use the model in many different ways, if you don't want to do character replacement, you simply disconnect the background image and mask connections. The workflow example is just set to use every input available to show how they are constructed, but you don't have to use them.
https://humanaigc.github.io/wan-animate/content/aa/%E8%A7%92%E8%89%B2%E5%90%88%E9%9B%86_%E5%B8%A6%E5%8E%9F%E5%9E%8B.mp4
this video.
I just wish to copy movement from a video to a photo.
Then you do like I said, disconnect the bg_image and mask inputs to the WanAnimate node.
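If you would rather script that than rewire it in the editor, here is a minimal sketch that drops those two inputs from an API-format workflow export before queueing it. The class_type and input names ("WanVideoAnimateEmbeds", "bg_images", "mask") are assumptions pieced together from this thread, so check them against your own export:

```python
import json

# Load a workflow saved with "Save (API Format)" in ComfyUI.
with open("wan_animate_api.json") as f:
    wf = json.load(f)

# Assumed names -- verify against your own export before relying on them.
ANIMATE_CLASS = "WanVideoAnimateEmbeds"
DROP_INPUTS = ("bg_images", "mask")

for node_id, node in wf.items():
    if isinstance(node, dict) and node.get("class_type") == ANIMATE_CLASS:
        for name in DROP_INPUTS:
            # Removing the key is the scripted equivalent of disconnecting
            # the link in the graph editor, so the node falls back to
            # animating the reference image instead of replacing a character.
            node["inputs"].pop(name, None)

with open("wan_animate_api_motion_only.json", "w") as f:
    json.dump(wf, f, indent=2)
```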
Is there a way to improve the quality of the output from WanAnimate? I tried 720p on a 5090 and the quality wasn't even close to the original Wan examples on their demo. The face is very blurry and the whole video has halos around the edges. When I try increasing the resolution to 832p it crashes with OOM.
Had to give the no-mask setup a try ;-) seems to work well.
https://github.com/user-attachments/assets/d7eb758b-2f63-4e79-995d-27a60ff8c7be
https://github.com/user-attachments/assets/773721df-403c-4628-b5b5-ddb603c59571
meh, it still runs into the same skeleton issues. You can’t accurately transfer an actor’s performance onto a cartoon character with very different proportions (like a chibi, for example). The system just ends up trying to stretch the character to match the skeleton. At least, that’s been my experience and it’s also what I see happening in the post above.
You can turn off the "skeleton" though, either for the body, for the face, for the hands, or for all of them, but then the motions/posture will be less of an exact clone of the driving video.
And/or you can take the first frame of the driving video and make a cartoon-based character from that, so there is no need for the model to stretch anything to fit the skeleton.
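To illustrate what toggling those groups means, here is a purely hypothetical sketch (the dict layout is invented for illustration and is not the DWPose node's actual output): the disabled groups simply never get drawn into the pose frames the model sees.

```python
# Hypothetical keypoint layout: {"body": [...], "face": [...], "hands": [...]}.
# The real DWPose output differs; this only illustrates the per-group toggle.
def filter_pose(pose: dict, body: bool = True, face: bool = True, hands: bool = True) -> dict:
    enabled = {"body": body, "face": face, "hands": hands}
    return {group: points for group, points in pose.items() if enabled.get(group, False)}

# filter_pose(pose, face=False, hands=False) keeps only the body skeleton:
# the motion follows the driving video less exactly, but the model has more
# freedom to keep the reference character's own proportions.
```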
Any ideas why my videos start with a bad/darker image if I disconnect the bg_image and mask inputs to the WanAnimate node?
Thanks, I tried all of that (turning the skeleton off per part, and building a character from the first frame). The results are mixed, as expected... not really ideal in most cases, though there are a few tricks that can sometimes work.
The second option obviously isn’t viable when you’re dealing with already-established characters.
There is another paper out there that addresses this problem (can’t recall the name right now), where the system correctly matches a character’s pose regardless of limb or head size. It’s a shame this one doesn’t quite get it right, but still, it’s a nice addition to the toolbox.
Yeah, that would be nice, although I guess if the reference photo is a full body shot (as typical reference photos are), maybe the "stretching" to fit the skeleton works better (or worse). I haven't tried ;-)
Yeah, we will have to wait... but this is fun too when working as expected, and can be useful in many situations. :)
https://github.com/user-attachments/assets/076704fc-9b63-4cf6-a4c2-fe9045a735b0
that one looks nice for sure ;-) but yeah it can be a bit of a hit and miss..
Looks nice. You also have the issue with an initial dark frame. Any ideas how to solve that? In @RuneGjerde's video I don't see that issue. Is the video cut, or did you somehow solve it?
Edit: Looks like only disconnecting the mask input and keeping the bg_image input solved it.
Yes, I didn't have the mask input connected, but just because I didn't need it (I didn't know it had any impact on the first frame).
One thing that is perhaps a bit odd is that I forgot to change my prompt in all the videos, and it's been "werewolf in a crowded office etc. etc." from my first Wolf of Wall Street attempt. I guess the prompting doesn't have a strong influence then (or maybe that's why both my fluffy toy and the lion in the above videos have a bit of claws and canine teeth... haha... probably).
My issue with the character swap is that it's not super close to the reference character. Any fix for that, to make it look closer?
My issue is with the pupil movement not matching; the blinking matches. I thought it might be due to me using context_options, but this is all within a single context window.
Could try turning on face detection in the DWPose node, see if it helps or not.
Any way I can make the character swap closer to my actual reference image?
It's a new model so I haven't gotten super familiar with it, but generally speaking, the more the reference image resembles the first frame in composition and in the size of the person, the better the result (but since this model is a bit VACE-ish, I don't really know; it could even be better with a full-body character pose, but I haven't tried).
Experimenting is fun and educational at least. Saw this post: https://x.com/AIWarper/status/1969136243666563429
I will just keep the pupils looking at the camera for now!
Might be related to this https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1251#issuecomment-3314890336 Perhaps a new pose detector is coming ;-)
Main thing is that the face crop needs to have the eyes and eyebrows visible; if the eyes are right on the frame edge it doesn't track them at all.
The whole preprocessing pipeline from the original code is not implemented yet; it is pretty complex and even involves using Flux Kontext to repose your reference, and there indeed is pose retargeting code etc.
The preprocessing in the example workflow is just pieced together from existing nodes, it's not very robust.
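A related sketch, in case you are building that face crop yourself from a detector box: pad the box before cropping so the eyes and eyebrows end up well inside the frame. The box format and the 35% padding are illustrative assumptions, not values from the original pipeline.

```python
from PIL import Image

def padded_face_crop(img: Image.Image, box, pad: float = 0.35) -> Image.Image:
    """Expand a (left, top, right, bottom) face box by `pad` of its size on
    every side, clamped to the image, so eyes/eyebrows aren't on the edge."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    left = max(0, int(left - pad * w))
    top = max(0, int(top - pad * h))
    right = min(img.width, int(right + pad * w))
    bottom = min(img.height, int(bottom + pad * h))
    return img.crop((left, top, right, bottom))

# face = padded_face_crop(frame, detector_box)  # box from any face detector
```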
It seems pretty good already though. Usable. The eyes move in a natural way.
But perhaps the most extreme eye tracking isn't there, for an exact clone, or if the person in the video looks away, etc.
https://github.com/user-attachments/assets/12947bd8-33af-490c-99a5-8d32c1c5f5a0
Yeah, it's natural. In my case, the actor was looking slightly off camera, nothing extreme, but the replaced actor did a quick glance at the camera that wasn't in the original. Anyway, I will keep playing... I just posted here in case there was a good teacher here who could control their pupils...
Getting anecdotally better generations after adding Wan2.2-Fun-A14B-InP-LOW-HPS2.1_resized_dynamic_avg_rank_15_bf16.safetensors to the LoRA stack. Seems to work with other LoRAs as well.
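For context on what "the LoRA stack" amounts to mathematically: each LoRA contributes its own low-rank delta, scaled by its strength, on top of the base weight. A generic sketch (not the wrapper's actual patching code):

```python
import torch

def apply_lora_stack(W: torch.Tensor, loras, strengths) -> torch.Tensor:
    """W: (out, in) base weight; loras: list of (B, A) pairs with
    B of shape (out, r) and A of shape (r, in)."""
    W_eff = W.clone()
    for (B, A), s in zip(loras, strengths):
        W_eff += s * (B @ A)  # each stacked LoRA just adds another delta
    return W_eff

# Adding the HPS2.1 LoRA alongside others adds one more delta term; the
# per-LoRA strength controls how hard each one pulls the base model.
```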
Can confirm that removing both the bg_images and mask inputs on the WanVideo Animate Embeds node makes it use the background from the still image instead of the video input. Flexible if you want to inject some other video for the background instead, etc.
It's still new, so there's lots of experimenting to do, hah. Thanks!
I guess the bad first latent issue got fixed with b9fd93c19e2d3de8d59fa9028d3f0d0567038634; I gotta test that in my workflow. @kijai I just found the same visual issue with S2V, can you possibly check and apply the same fix there?
@kijai Thank you! Does the animate workflow example (-01) work for character replacement?
That's the default mode for the workflow, to replace characters.
(While disconnecting the mask and/or background image gives other results.)
But the default mode generates the raw input video.
Perhaps I misunderstood then. I thought you wanted to replace someone in the raw video input (that's the default mode, with masking, reference image, etc.).
A bit like the example below (I turned off DWPose for this one; if you add some of that back, the hands, face, and body are copied over even more closely, if desirable):
https://github.com/user-attachments/assets/3eb432d1-ccd0-49e5-b561-1d12506f62dc
If you disconnect the mask and background image (aka the raw video), you can get an animation based solely on your input image, with only the movements taken from the raw video. It can be anything: singing, dancing, talking, gestures... A bit like this, for example:
https://github.com/user-attachments/assets/ffafb0fb-7364-48da-937a-cd9f9e92def1