OpenDiT
About FlashAttention
Thank you for your great work :) !
Here is my question: I tried to follow the instructions but failed at the flash-attention related steps. According to this issue, V100 GPUs are not supported.
So I wonder what the efficiency gain is without the flash-attention module, and whether there are any methods to work around the above issue and achieve comparable performance on V100s?
Thank you!
Memory-efficient attention from xformers may be a good choice.
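For reference, here is a minimal sketch of what calling it could look like (the shapes and call site below are illustrative, not OpenDiT's actual code):

```python
import torch
from xformers.ops import memory_efficient_attention

# Illustrative shapes: (batch, seq_len, num_heads, head_dim)
B, L, H, D = 2, 1024, 16, 64
q = torch.randn(B, L, H, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, L, H, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, L, H, D, device="cuda", dtype=torch.float16)

# xformers picks a fused memory-efficient kernel that runs on V100,
# without the Ampere+ requirement of the flash-attn package.
out = memory_efficient_attention(q, k, v)  # (B, L, H, D)
```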
Can I ask why you are still using the external flash-attention package?
torch.nn.functional.scaled_dot_product_attention already has a FlashAttention-2 implementation:
https://pytorch.org/docs/2.2/generated/torch.nn.functional.scaled_dot_product_attention.html
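For example, a minimal sketch of the built-in path (shapes are illustrative; PyTorch selects the backend automatically, and on a V100 you can restrict it to the memory-efficient kernel):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, num_heads, seq_len, head_dim)
B, H, L, D = 2, 16, 1024, 64
q = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)

# PyTorch dispatches to the fastest available backend
# (FlashAttention-2, memory-efficient, or the math fallback).
out = F.scaled_dot_product_attention(q, k, v)  # (B, H, L, D)

# Optionally restrict the backends, e.g. force the memory-efficient
# kernel on a V100 where FlashAttention is unavailable:
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=False, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v)
```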
Its implementation is sometimes slower than flash-attn on devices older than the H100.
We could probably keep monitoring this: https://github.com/pytorch/pytorch/pull/120642