
Attention in Motion Module of UNetMotionModel

Open huyduong7101 opened this issue 1 year ago • 2 comments

Describe the bug

As far as I know, UNetMotionModel is used for AnimateDiff. So I looked into the original AnimateDiff implementation and noticed that it uses cross-attention in the Motion Module (https://github.com/guoyww/AnimateDiff/blob/main/animatediff/models/unet_blocks.py#L382). In contrast, Diffusers uses self-attention (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_3d_blocks.py#L1175).

Reproduction

hidden_states = motion_module(
    hidden_states,
    num_frames=num_frames,
    encoder_hidden_states=encoder_hidden_states,
)[0]

Logs

No response

System Info

Diffusers 0.27.2

Who can help?

@DN6 @yiyixuxu @sayakpaul

huyduong7101 (May 17 '24 08:05)

cc @DN6

yiyixuxu (May 20 '24 17:05)

Hi @huyduong7101. Cross-attention support does exist in the original AnimateDiff codebase, but it isn't actually used in the model.

In the original motion module, cross_attention_dim ends up set to None here: https://github.com/guoyww/AnimateDiff/blob/cf80ddeb47b69cf0b16f225800de081d486d7f21/animatediff/models/motion_module.py#L189

That happens because the configured attention block types are Temporal_Self, as set here: https://github.com/guoyww/AnimateDiff/blob/cf80ddeb47b69cf0b16f225800de081d486d7f21/configs/inference/inference-v1.yaml#L13
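
To illustrate the point, here is a minimal sketch of how a block-type list can decide whether cross-attention is ever active. It is not the AnimateDiff or Diffusers code. The helper names are hypothetical, the rule that only names ending in `_Cross` keep the cross-attention dimension is an assumption based on the linked line, and 768 is only an illustrative value.

```python
from typing import List, Optional, Sequence


def resolve_cross_attention_dims(
    block_types: Sequence[str],
    cross_attention_dim: Optional[int],
) -> List[Optional[int]]:
    """Hypothetical helper: pick a cross_attention_dim for each block.

    Assumption: only block types whose name ends in "_Cross" keep the
    cross-attention dimension; every other block gets None.
    """
    return [
        cross_attention_dim if name.endswith("_Cross") else None
        for name in block_types
    ]


def describe_blocks(
    block_types: Sequence[str],
    cross_attention_dim: Optional[int],
) -> List[str]:
    """Label each block by the kind of attention it would end up using."""
    dims = resolve_cross_attention_dims(block_types, cross_attention_dim)
    return [
        "cross-attention" if dim is not None else "self-attention"
        for dim in dims
    ]


# Block types as described above: Temporal_Self only.
configured = ["Temporal_Self", "Temporal_Self"]
print(describe_blocks(configured, cross_attention_dim=768))
```

With only Temporal_Self entries, every block resolves to self-attention, so any encoder_hidden_states passed to the module would go unused under this assumption. A hypothetical `Temporal_Cross` entry would be the only way to turn cross-attention on in this sketch.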

DN6 (May 22 '24 04:05)

Closing this for now. If you have further questions, @huyduong7101, please feel free to create a post in the Discussions section.

DN6 (Jun 03 '24 04:06)