AnimateDiff About training config: unet_use_cross_frame

I'm trying to train the motion module myself. After a lot of preparation and debugging, my training run finally executes successfully, but I can't confirm the results. The output motion module does not work well: the resulting frames are almost identical, and it's hard to recognize any motion.

Besides, I found that when I turn on unet_use_cross_frame_attention, the module 'SparseCausalAttention2D' cannot be found. Could you please help me with the training config and the missing 'SparseCausalAttention2D' module?

Oct 30 '23 08:10 fangsiff

Hi, can you tell me your learning rate and training dataset? I also tried to train the motion module on a small dataset. I set batch size = 4 and lr = 3e-5, but the training result didn't look good and the training loss didn't decrease.

Nov 02 '23 11:11 tacit0428

@tacit0428 Hi, I used the dataset from https://github.com/m-bain/webvid. I use just 30 videos from results_2M_train.csv whose entries contain the keywords: smile, camera, look, (man | woman | girl | boy). I'm also confused about the configuration; the following is my training config:

learning_rate: 8.0e-05
num_examples: 30
max_train_steps: 8000
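
For a sense of scale, these numbers imply many passes over the 30 videos. Here is a rough sketch of the arithmetic, assuming a batch size of 1 or 4 (the batch size is not stated for this config; 4 is the value mentioned earlier in the thread) and no gradient accumulation. passes_over_dataset is a hypothetical helper, not part of the training script:

```python
def passes_over_dataset(max_train_steps, num_examples, batch_size, grad_accum_steps=1):
    """Approximate number of epochs implied by a fixed step budget."""
    samples_seen = max_train_steps * batch_size * grad_accum_steps
    return samples_seen / num_examples


# batch_size values are assumptions for illustration only.
for batch_size in (1, 4):
    epochs = passes_over_dataset(max_train_steps=8000, num_examples=30, batch_size=batch_size)
    print(f"batch_size={batch_size}: about {epochs:.1f} passes over the data")
```

With so few clips, each video would be seen hundreds of times under either assumption, which may be worth keeping in mind when judging the results.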

I also cannot get a good motion module yet. Besides, my trained model cannot be used with inference v2, so I think the training scripts so far are suited to mm-14 rather than mm-15 and mm-15_v2. I hope it helps.

Nov 03 '23 07:11 fangsiff

It seems good. I can use inference v2 and mm-15_v2, but I can't turn on unet_use_cross_frame_attention either. I don't know if it's essential. Did you turn on this parameter in your training? Besides, I found that the training result is not good when I use Stable Diffusion without any LoRA.

Nov 03 '23 07:11 tacit0428

My result GIF lacks motion; the movements are barely noticeable. I didn't turn on unet_use_cross_frame_attention. According to the paper, I guess it helps consistency across frames, but SparseCausalAttention2D cannot be found. I don't know whether LoRA is necessary, but I think the motion module is supposed to work well without other control components. It seems that I need to put more effort into the config or wait for more details to be released.
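
To put a number on "barely noticeable", one option is to measure how much pixels change between consecutive frames. This is only a minimal sketch in plain Python, assuming the frames have already been decoded into 2D lists of grayscale values; mean_abs_frame_diff is a hypothetical helper and the toy frames are made up for illustration:

```python
def mean_abs_frame_diff(frames):
    """Average absolute pixel change between consecutive frames.

    frames: list of equally sized 2D lists of grayscale values (0-255).
    """
    if len(frames) < 2:
        return 0.0
    total, count = 0, 0
    for prev, curr in zip(frames, frames[1:]):
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(a - b)
                count += 1
    return total / count


# Toy data: three identical frames versus three frames with one bright moving pixel.
static = [[[10, 10], [10, 10]] for _ in range(3)]
moving = [[[0, 0], [0, 0]], [[0, 255], [0, 0]], [[0, 0], [255, 0]]]

print(mean_abs_frame_diff(static))  # 0.0
print(mean_abs_frame_diff(moving))  # 95.625
```

Comparing this value for my outputs against outputs from the released weights would at least show whether the trained module moves less, rather than relying on eyeballing the GIFs.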

Nov 03 '23 07:11 fangsiff

Yes, I agree. I think the provided motion module gives results that look the same as yours. I tried the author's model weights, and the generated motion is also slight. I think the motion module is trying to achieve good consistency. I will try training it with different configs later.

Nov 03 '23 08:11 tacit0428

Hi, have you solved this problem yet, the missing "SparseCausalAttention2D" module? I have just been reading the source code. While reading the attention.py file, which can be found in this link, I found that "SparseCausalAttention2D" is neither defined nor imported. It's really weird.

    if unet_use_cross_frame_attention:
        self.attn1 = SparseCausalAttention2D(
            query_dim=dim,
            heads=num_attention_heads,
            dim_head=attention_head_dim,
            dropout=dropout,
            bias=attention_bias,
            cross_attention_dim=cross_attention_dim if only_cross_attention else None,
            upcast_attention=upcast_attention,
        )

I'm not sure if it's because I'm bad at reading code, but it looks like the class "SparseCausalAttention2D", which should be defined in attention.py, is missing.
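
This would also explain why training runs with the flag off and only fails once it is turned on: Python looks up a name when the line executes, not when the file is imported. A minimal, self-contained sketch of that behaviour follows; Attention and Block here are simplified stand-ins I made up for illustration, not the repository's classes:

```python
class Attention:
    """Stand-in for an ordinary attention layer (hypothetical)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class Block:
    """Stand-in for a transformer block with an optional cross-frame branch."""

    def __init__(self, use_cross_frame_attention=False):
        if use_cross_frame_attention:
            # This name is never defined or imported in this file,
            # mirroring the situation reported above.
            self.attn1 = SparseCausalAttention2D(query_dim=8)
        else:
            self.attn1 = Attention(query_dim=8)


# Flag off: the broken branch is never executed, so construction succeeds.
block = Block(use_cross_frame_attention=False)
print(type(block.attn1).__name__)  # Attention

# Flag on: the undefined name is looked up and Python raises NameError.
try:
    Block(use_cross_frame_attention=True)
except NameError as err:
    print("NameError:", err)
```

So the error does not necessarily mean the setup is wrong; the branch simply refers to a class that would have to be supplied before the flag can be used.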