AnimateDiff

About training config: unet_use_cross_frame_attention

Open fangsiff opened this issue 1 year ago • 10 comments

I'm trying to train the motion module myself. After a lot of preparation and debugging, the training finally runs, but I can't confirm that the results are correct. The trained motion module does not work well: the output frames are almost identical, and it is hard to see anything moving.

Besides, I found that when I turn on unet_use_cross_frame_attention, the module 'SparseCausalAttention2D' cannot be found. Could you please help me with the training config and the missing 'SparseCausalAttention2D' module?

fangsiff avatar Oct 30 '23 08:10 fangsiff

Hi, can you tell me your learning rate and training dataset? I also tried to train the motion module on a small dataset. I set batch size = 4 and lr = 3e-5, but the results didn't look good and the training loss didn't decrease.

tacit0428 avatar Nov 02 '23 11:11 tacit0428

@tacit0428 Hi, I used the dataset from https://github.com/m-bain/webvid. I only used 30 videos from results_2M_train.csv, the ones that contain the keywords: smile, camera, look, (man | woman | girl | boy). I'm also confused about the configuration. The following is my training config:

    learning_rate: 8.0e-05
    num_examples: 30
    max_train_steps: 8000

I haven't gotten a good motion module yet either. Besides, my trained model cannot be used with inference v2, so I think the training scripts so far are suited to mm-14 rather than mm-15 and mm-15_v2. I hope this helps.

fangsiff avatar Nov 03 '23 07:11 fangsiff

It seems good. I can use inference v2 and mm-15_v2. But I can't turn on unet_use_cross_frame_attention either. I don't know if it's essential. Did you turn on this parameter in your training? Besides, I found that the training result is not good when I use Stable Diffusion without any LoRA.

tacit0428 avatar Nov 03 '23 07:11 tacit0428

My result GIF lacks motion; the movements are barely noticeable. I didn't turn on unet_use_cross_frame_attention. According to the paper, I guess it helps consistency across frames. But SparseCausalAttention2D cannot be found. I don't know whether LoRA is necessary, but I think the motion module is supposed to work well without other control components. It seems that I need to put more effort into the config or wait for more unreleased details.
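For readers unfamiliar with the idea, here is a minimal sketch of what "sparse-causal" cross-frame attention usually means in the Tune-A-Video codebase, which this code path appears to derive from: each frame takes its keys and values from the first frame and the immediately preceding frame. This is an assumption about the intent of the flag, not the unpublished SparseCausalAttention2D class, and the helper name `sparse_causal_sources` is hypothetical.

```python
from typing import List, Tuple


def sparse_causal_sources(video_length: int) -> List[Tuple[int, int]]:
    """For each frame i, return (first_frame, previous_frame) indices.

    In a sparse-causal scheme, frame i would attend to keys/values taken
    from these two frames. Frame 0 has no predecessor, so it reuses itself.
    """
    return [(0, max(i - 1, 0)) for i in range(video_length)]


# Frame 3 of a 4-frame clip would attend to frames 0 and 2.
print(sparse_causal_sources(4))  # [(0, 0), (0, 0), (0, 1), (0, 2)]
```

If the flag works along these lines, it would mainly improve appearance consistency between frames rather than add motion, which fits the guess above.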

fangsiff avatar Nov 03 '23 07:11 fangsiff

Yes, I agree. I think the provided motion module gives results similar to yours. I tried the author's model weights, and the generated motion is also slight. I think the motion module aims for good consistency. I will try training it with different configs later.

tacit0428 avatar Nov 03 '23 08:11 tacit0428

Hi, have you solved this problem yet, the missing "SparseCausalAttention2D" module? I have just been reading the source code. When I was reading the attention.py file, which can be found in this link, I found that "SparseCausalAttention2D" is neither defined nor imported. It's really weird.

    if unet_use_cross_frame_attention:
        self.attn1 = SparseCausalAttention2D(
            query_dim=dim,
            heads=num_attention_heads,
            dim_head=attention_head_dim,
            dropout=dropout,
            bias=attention_bias,
            cross_attention_dim=cross_attention_dim if only_cross_attention else None,
            upcast_attention=upcast_attention,
        )

I'm not sure if it's because I'm so bad at reading code. It looks like the class "SparseCausalAttention2D", which should be defined in attention.py, is missing.
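One way to confirm this kind of finding without reading the whole file by eye is to parse the source and look for a definition or import of the name. The sketch below uses Python's standard `ast` module. The helper name `defines_or_imports` and the toy source string are hypothetical stand-ins, not code from the repository, and the check does not see names brought in by `from x import *`.

```python
import ast


def defines_or_imports(source: str, name: str) -> bool:
    """Return True if `source` defines a class/function `name` or imports it."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # A class or function definition with the requested name.
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)) and node.name == name:
            return True
        # An import that binds the requested name (respecting "as" aliases).
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if (alias.asname or alias.name) == name:
                    return True
    return False


# Toy stand-in for attention.py: the class is used but never defined or imported.
toy_source = '''
class BasicTransformerBlock:
    def __init__(self, unet_use_cross_frame_attention=False):
        if unet_use_cross_frame_attention:
            self.attn1 = SparseCausalAttention2D()
'''

print(defines_or_imports(toy_source, "SparseCausalAttention2D"))  # False
print(defines_or_imports(toy_source, "BasicTransformerBlock"))    # True
```

To run it against the real file, read attention.py into a string and pass it as `source`. A result of False would match what is described above: the name is referenced only inside the `if` branch, so the NameError appears only when the flag is turned on.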

liuchangzong avatar Nov 14 '23 13:11 liuchangzong

@liuchangzong No, not yet. The implementation of this class is unpublished.

fangsiff avatar Nov 15 '23 03:11 fangsiff

OK, thanks a lot.

liuchangzong avatar Nov 15 '23 06:11 liuchangzong

Same problem, the motion results are poor.

ssvicnent avatar Nov 15 '23 12:11 ssvicnent

Is SparseCausalAttention2D still unpublished?

caojiehui avatar Feb 28 '24 02:02 caojiehui