
Notable gap between image stylization and video stylization based on AnimateDiff and ControlNet

mengfanShi opened this issue · 2 comments

Hi, thanks for your great work! I've been exploring video stylization with AnimateDiff, and I suspect you may have already tried this. I've found that the generated videos differ significantly from image-to-image stylization: the image stylization is clear and aligns well with the style model, but with adv3 there's a notable gap. Have you experienced this? Additionally, when using adv3 with more than 16 frames, the output becomes less smooth and starts flickering. Could I try your trained 48-frame model?

Thanks a lot!

mengfanShi avatar Sep 05 '24 09:09 mengfanShi

I'm using adv3 too; it is much more expressive than mine. The 48-frame model gives more stability and removes flickering. I trained my models on 4000+ samples for 180,000 steps, and my model lost its ability to imagine anything new.

I found that using a LoRA gives quite good results. I train a LoRA on all attention layers (UNet and motion module) on a single video, in 3-frame windows at 1024x576 resolution, adding frame-position information into the embeddings, for 100 epochs (for example, if the video contains 30 frames, I train for 1000 steps). Then I apply it with a weight of around 0.3, which gives good results on the stylization task and removes flickering as well.
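The recipe above can be sketched in plain PyTorch. This is only an illustration of the two ingredients mentioned — a low-rank adapter wrapped around frozen attention projections, and a learned frame-position embedding added to the features — not the commenter's actual training code; all class and parameter names here are hypothetical.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear projection (e.g. an attention to_q/to_k/to_v)
    with a trainable low-rank adapter: y = W x + scale * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter is trained
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # adapter starts as a no-op
        self.scale = alpha / rank       # lower this (e.g. ~0.3) at inference

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

class FramePositionEmbedding(nn.Module):
    """Learned embedding of the frame index within the clip, added to the
    token features so the model knows each frame's temporal position."""
    def __init__(self, max_frames: int, dim: int):
        super().__init__()
        self.emb = nn.Embedding(max_frames, dim)

    def forward(self, hidden, frame_idx):
        # hidden: (batch, tokens, dim); frame_idx: (batch,) long tensor
        return hidden + self.emb(frame_idx)[:, None, :]

# Usage sketch: wrap one attention projection and tag features with a frame id.
proj = nn.Linear(64, 64)
lora_proj = LoRALinear(proj, rank=4)
pos = FramePositionEmbedding(max_frames=48, dim=64)
x = torch.randn(2, 8, 64)                       # (batch, tokens, dim)
y = pos(lora_proj(x), torch.tensor([0, 5]))     # frames 0 and 5
```

In practice you would walk the UNet and motion-module state dict, replace every attention projection with such a wrapper, and train only the adapter and embedding parameters; at inference the effective LoRA weight is controlled through `scale`.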

tumurzakov avatar Sep 05 '24 18:09 tumurzakov

Thanks a lot for your suggestion! I'll try training a LoRA on all attention layers.

mengfanShi avatar Sep 06 '24 02:09 mengfanShi