AnimateDiff
The right way of using the SDXL motion model to get good-quality output
Hi, first I'm very grateful for this wonderful work, animatediff is really awesome 👍
I've been stuck on a quality issue for several days when using the SDXL motion model. Although the motion is very nice, the video quality seems quite low; it looks pixelated or downscaled. Here is a comparison of an SDXL image and an AnimateDiff frame:
| Original image by Animagine XL | AnimateDiff SDXL frame |
|---|---|
These two images use the same size configuration. I'm using the ComfyUI workflow adapted from here: https://civitai.com/articles/2950, with the Animagine XL V3.1 model & VAE (you can save the image below and import it into ComfyUI):
I tried different numbers of steps, width & height settings, samplers, and guidance values, but had no luck.
I know the SDXL motion model is still in beta, but I can't get results as good as the example in the README. Is there anything I'm doing wrong here? 😢 Could anyone show the right way to use the SDXL model? Thank you in advance.
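For reference, this is roughly the kind of minimal setup I'm comparing against outside ComfyUI, using diffusers' AnimateDiffSDXLPipeline with the beta SDXL motion adapter. The hub IDs, prompt, and sampler settings below are just my assumptions for a baseline sketch, not a recommended configuration:

```python
import torch
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.models import MotionAdapter
from diffusers.utils import export_to_gif

# Beta SDXL motion adapter plus an SDXL base checkpoint (assumed hub IDs);
# swap in Animagine XL V3.1 ("cagliostrolab/animagine-xl-3.1") if that's what you use.
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16
)
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler",
    clip_sample=False, timestep_spacing="linspace",
    beta_schedule="linear", steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id, motion_adapter=adapter, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")
pipe.enable_vae_slicing()  # keeps VRAM usage manageable at 1024x1024

output = pipe(
    prompt="1girl, upper body, looking at viewer, cherry blossoms",  # placeholder prompt
    negative_prompt="low quality, worst quality",
    width=1024, height=1024,
    num_frames=16, num_inference_steps=25, guidance_scale=8.0,
)
export_to_gif(output.frames[0], "sdxl_animatediff.gif")
```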
You're not doing anything wrong. The SDXL beta motion model is just pure garbage. We're all in the same boat with these kinds of XL results. I tried experimenting with video upscaling, but even then the quality of the results was not as good as what we get from the 1.5 v3 motion model. If I had any understanding of how, I would train my own.
I worked around this by making a hybrid XL/SD1.5 workflow that generates an image with XL and uses the 1.5 IP-Adapter. The detail isn't the same as XL, but the quality of the animation itself is far better. I'm attaching a comparison of two animations using the same parameters, along with the source image used.
XL Result
Hybrid XL/1.5 Result
Source
@F0xbite Thank you for the information! I also tried the AnimateDiff 1.5/v2 motion models, which are way better. Your solution is very enlightening 👍 I'm not going to waste time on the SDXL model. BTW, is there any other motion model that works better with SDXL?
Glad to help. The only other one I know of is HotshotXL. Hotshot does have better visual quality, but it's limited to a maximum of 8 rendered frames, and I don't think it's possible to loop context, both of which are huge caveats for me. Also, the quality of the motion seems rather poor and distorted in my testing, but that's just my opinion.
There's also SVD, but it's strictly an image-to-video model with no prompting and basically no control over motion.
So unfortunately, I don't know of a better solution than the hybrid system I'm using now, until a better motion model is trained for XL or the Flux team releases some kind of text2video model. But I'm sure that's bound to change at some point.
thanks a lot for sharing this @F0xbite, would love to use your hybrid workflow above if you could share 🔥
I have built on the same workflow and have exactly the same issue with seemingly low-res output (even though it's 1024x1024). I'd like to try out that hybrid workflow. When I naively select the SD1.5 v2 AnimateDiff model, ComfyUI's AnimateDiff loader complains: "Motion module 'mm_sd_v15_v2.ckpt' is intended for SD1.5 models, but the provided model is type SDXL." What's your approach for the hybrid SDXL/SD1.5 workflow?
foxbite_hybrid_animatediff.json
@biswaroop1547 @felixniemeyer Hey guys, sorry I'm just now seeing this. Here is my workflow. I cleaned it up a bit. Basically, it's a basic SDXL txt2img workflow, with the output image being fed into the IP-Adapter for SD1.5. I use a LoRA tag loader for the positive SDXL prompt just as a personal preference; you can change it to a standard text encoder if you prefer. Enjoy!
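If anyone wants to try the same idea outside ComfyUI, here is a rough diffusers sketch of the two-stage approach: render a still with SDXL, then animate with an SD1.5 AnimateDiff pipeline conditioned on that still through IP-Adapter. The checkpoints, adapter scale, and step counts are placeholders picked for illustration, not the exact settings from the JSON workflow above:

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, StableDiffusionXLPipeline
from diffusers.models import MotionAdapter
from diffusers.utils import export_to_gif

prompt = "1girl, upper body, looking at viewer, cherry blossoms"  # placeholder prompt

# Stage 1: generate the high-detail still with an SDXL checkpoint (assumed hub ID).
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1", torch_dtype=torch.float16
).to("cuda")
still = sdxl(prompt=prompt, num_inference_steps=28, guidance_scale=7.0).images[0]

# Stage 2: animate with the SD1.5 v2 motion module, feeding the still in via IP-Adapter.
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
anim = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any SD1.5 checkpoint works here
    motion_adapter=adapter, torch_dtype=torch.float16,
).to("cuda")
anim.scheduler = DDIMScheduler.from_config(
    anim.scheduler.config, clip_sample=False,
    timestep_spacing="linspace", beta_schedule="linear", steps_offset=1,
)
anim.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
anim.set_ip_adapter_scale(0.8)  # how strongly the SDXL still steers the animation

out = anim(
    prompt=prompt,
    negative_prompt="low quality, worst quality",
    ip_adapter_image=still,
    num_frames=16, num_inference_steps=25, guidance_scale=7.5,
)
export_to_gif(out.frames[0], "hybrid_xl_15.gif")
```

The idea is the same as the ComfyUI graph: SDXL supplies the detail, and the 1.5 motion module supplies the motion.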
@F0xbite Thank you for sharing! 👍
I actually have the opposite problem; my 1.5 results are usually garbage, especially if I try piling on LoRAs.
To get good results in HotshotXL, the secret is upping the context stride to something like 3 (you can try 4, but sometimes that burns it). I had been using FreeInit like I was doing in 1.5, but that didn't work; upping the stride was like magic. FreeU also helps a bit (see the snippet after this post).
I haven't tried this with the SDXL motion model, but I'm going to check now to see if it might do the same there.
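Context stride lives on the AnimateDiff-Evolved context options node in ComfyUI, but FreeInit and FreeU both have switches in diffusers too, if anyone wants to A/B them outside ComfyUI. A hedged sketch, assuming `anim` is an AnimateDiff pipeline like the one in the earlier snippet (the FreeU numbers are just the commonly cited SD1.5 starting values):

```python
# FreeU rescales the UNet's backbone/skip features; tune these per model.
anim.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)

# FreeInit re-runs sampling a few times to clean up low-frequency flicker (costs extra time).
anim.enable_free_init(num_iters=3, use_fast_sampling=False)

# Both can be turned off again if they over-smooth or "burn" the result:
# anim.disable_freeu()
# anim.disable_free_init()
```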
Thank you all for sharing. Unfortunately, I join the group in saying that I have not been able to get great results from SDXL. Is there any information out there on training motion modules? It looks like we may need to create custom modules for different types of actions to get better results.