Attention in Motion Module of UNetMotionModel
Describe the bug
As far as I know, UNetMotionModel is the UNet used for AnimateDiff. Looking into the original AnimateDiff implementation, I noticed that it uses cross-attention in the Motion Module (https://github.com/guoyww/AnimateDiff/blob/main/animatediff/models/unet_blocks.py#L382). In contrast, Diffusers uses self-attention (https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_3d_blocks.py#L1175).
Reproduction
hidden_states = motion_module(
    hidden_states,
    num_frames=num_frames,
    encoder_hidden_states=encoder_hidden_states,
)[0]
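For reference, a rough sketch to inspect whether the attention layers inside the motion modules are built as self- or cross-attention (it assumes the public guoyww/animatediff-motion-adapter-v1-5-2 adapter and the Stable Diffusion 1.5 UNet, which are used here purely for illustration):

from diffusers import MotionAdapter, UNet2DConditionModel, UNetMotionModel

# Build a UNetMotionModel the same way AnimateDiffPipeline does: a 2D UNet
# plus a motion adapter.
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
unet2d = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet = UNetMotionModel.from_unet2d(unet2d, motion_adapter=adapter)

# Print the attention layers that live inside the motion modules and whether
# they were constructed as cross-attention.
for name, module in unet.named_modules():
    if "motion_modules" in name and name.split(".")[-1] in ("attn1", "attn2"):
        print(name, "is_cross_attention:", module.is_cross_attention)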
Logs
No response
System Info
Diffusers 0.27.2
Who can help?
@DN6 @yiyixuxu @sayakpaul
cc @DN6
Hi @huyduong7101. Cross-attention support does exist in the original AnimateDiff codebase; however, it isn't actually used in the model.
Here, cross_attention_dim is set to None:
https://github.com/guoyww/AnimateDiff/blob/cf80ddeb47b69cf0b16f225800de081d486d7f21/animatediff/models/motion_module.py#L189
because the block types are Temporal_Self:
https://github.com/guoyww/AnimateDiff/blob/cf80ddeb47b69cf0b16f225800de081d486d7f21/configs/inference/inference-v1.yaml#L13
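In other words, when an attention layer is built with cross_attention_dim=None, its key/value projections take the same hidden states as the queries, so it is plain self-attention, which matches what the Diffusers motion module does. Roughly how this plays out with Diffusers' Attention class (the layer sizes below are arbitrary illustrative values):

from diffusers.models.attention_processor import Attention

# With no cross_attention_dim, the keys/values are projected from the same
# 320-dim hidden states as the queries -> temporal self-attention.
self_attn = Attention(query_dim=320, heads=8, dim_head=40)
print(self_attn.is_cross_attention)   # False
print(self_attn.to_k.in_features)     # 320

# With cross_attention_dim=768, the keys/values would come from e.g. text
# embeddings instead -> cross-attention, which the AnimateDiff config never
# enables for the motion module.
cross_attn = Attention(query_dim=320, cross_attention_dim=768, heads=8, dim_head=40)
print(cross_attn.is_cross_attention)  # True
print(cross_attn.to_k.in_features)    # 768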
Closing this for now. If you have further questions, @huyduong7101, please feel free to create a post in the Discussions section.