AnimateDiff
Training issues and learning rates
Hi! Thanks for releasing the models and the training code. That's a massive contribution!
I've tried to train the model, either by fine-tuning the already released model or by training from scratch, but the result is always the same: the model starts collapsing and the frames produced during training are only noise.
Here is what I tried to prevent that:
- Using 30 videos or 3,500 videos.
- Using different batch sizes (I started at BS 1 because I don't have enough VRAM to go higher with 24 GB):
  - Gradient accumulation steps of 4 with BS 1: no real change.
  - BS 4 with gradient accumulation steps of 1 plus gradient checkpointing: strangely, the model didn't seem to learn ANYTHING when using gradient checkpointing (see the sketch after this list).
- The only thing that had any effect was drastically reducing the learning rate:
  - LR 1e-4: the model collapses after only 40 steps.
  - LR 1e-5: the model collapses after around 100 steps.
  - LR 1e-7: the model collapses after 10K steps, but it didn't learn anything.
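For concreteness, here is a minimal PyTorch sketch of what I mean by gradient accumulation, with toy stand-ins for the actual UNet and video batches (the real loop uses the repo's own objects):

```python
import torch
from torch import nn

# Toy stand-ins -- the real loop uses the repo's UNet and video batches.
model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
data = [torch.randn(1, 8) for _ in range(8)]  # micro-batches of size 1

accum_steps = 4  # effective batch size = 1 * accum_steps

optimizer.zero_grad()
for i, x in enumerate(data):
    loss = model(x).pow(2).mean() / accum_steps  # scale so gradients average
    loss.backward()  # gradients accumulate across micro-batches
    if (i + 1) % accum_steps == 0:
        optimizer.step()  # one update per accum_steps micro-batches
        optimizer.zero_grad()
```

On the checkpointing symptom: with PyTorch's reentrant `torch.utils.checkpoint`, if none of the checkpointed inputs have `requires_grad=True`, parameter gradients inside the checkpointed blocks can silently come out as `None`, which would look exactly like the model learning nothing. It may be worth checking how the training script invokes checkpointing.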
I haven't tried using the original dataset of videos; that would be my next test. Could it be caused by the videos I used? Something to do with the FPS, or anything else?
Has anyone else managed to train from scratch or fine-tune? If so, what LR did you use, and which other params did you change in the training.yaml file?
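For reference, the kind of edit I mean is something like this, assuming the repo loads the YAML with OmegaConf (the key name `learning_rate` and the config path are guesses; check your own training.yaml):

```python
from omegaconf import OmegaConf

# Key name and path are guesses -- adapt to your training.yaml.
cfg = OmegaConf.load("configs/training/training.yaml")
cfg.learning_rate = 1e-5
OmegaConf.save(cfg, "configs/training/training_lowlr.yaml")
```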
Thanks
Same issue here. Have you managed to solve this problem?
No, I haven't tried again
I updated xformers from 0.0.16 to 0.0.17 and then it worked; maybe you can try this.
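A quick sanity check of the installed version, and that memory-efficient attention actually runs (the shape here is arbitrary; the smoke test needs a CUDA device):

```python
import torch
import xformers
import xformers.ops

print("xformers:", xformers.__version__)
print("torch:", torch.__version__)

# Smoke-test memory-efficient attention on an arbitrary
# (batch, seq_len, heads, head_dim) shape.
if torch.cuda.is_available():
    q = torch.randn(1, 16, 8, 64, device="cuda", dtype=torch.float16)
    out = xformers.ops.memory_efficient_attention(q, q, q)
    print("memory_efficient_attention OK:", tuple(out.shape))
```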
Hi, what do your videos look like? Do they all show the same motion?
Any update? I'm hitting the same issue. I'm not sure whether the collapse is related to the training dataset. I used TikTok videos to train the motion module from scratch without modifying any hyperparams in the training config file, but got noisy video after about 30 training steps. My xformers version is 0.0.20.
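In case it helps with debugging: collapse usually shows up first as a loss spike or non-finite values, so a small check like this hypothetical helper (called after `loss.backward()` and before `optimizer.step()`) can catch it before the frames turn to pure noise:

```python
import math
import torch

def check_step(loss: torch.Tensor, model: torch.nn.Module,
               step: int, max_grad_norm: float = 1.0) -> None:
    # Hypothetical helper -- not part of the repo's training script.
    if not math.isfinite(loss.item()):
        raise RuntimeError(f"non-finite loss at step {step}")
    # clip_grad_norm_ returns the pre-clip norm, so LR-driven
    # blow-ups are visible before the model fully collapses.
    grad_norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(),
                                                     max_grad_norm))
    if step % 10 == 0:
        print(f"step {step}: loss={loss.item():.4f}, grad_norm={grad_norm:.2f}")
```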
Is it related to this issue?