OpenDiT
About FlashAttention
Thank you for your great work :) !
Here is my question: I tried to follow the instructions but failed at the flash-attention related steps. According to this issue, V100 GPUs are not supported.
So I wonder what the efficiency gain is without the flash-attention module, and whether there are any methods to work around the above issue and achieve comparable performance on V100s?
Thank you!
Memory-efficient attention from xformers may be a good choice.
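For reference, here is a minimal sketch of what calling it could look like (the shapes and call site below are illustrative, not OpenDiT's actual code):

```python
import torch
from xformers.ops import memory_efficient_attention

# Illustrative shapes: (batch, seq_len, num_heads, head_dim)
B, L, H, D = 2, 1024, 16, 64
q = torch.randn(B, L, H, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, L, H, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, L, H, D, device="cuda", dtype=torch.float16)

# xformers picks a fused memory-efficient kernel that runs on V100,
# without the Ampere+ requirement of the flash-attn package.
out = memory_efficient_attention(q, k, v)  # (B, L, H, D)
```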
Can I ask why you are still using the external flash-attention package?
torch.nn.functional.scaled_dot_product_attention already has a FlashAttention-2 implementation:
https://pytorch.org/docs/2.2/generated/torch.nn.functional.scaled_dot_product_attention.html
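For example, a minimal sketch of the built-in path (shapes are illustrative; PyTorch selects the backend automatically, and on a V100 you can restrict it to the memory-efficient kernel):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, num_heads, seq_len, head_dim)
B, H, L, D = 2, 16, 1024, 64
q = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)

# PyTorch dispatches to the fastest available backend
# (FlashAttention-2, memory-efficient, or the math fallback).
out = F.scaled_dot_product_attention(q, k, v)  # (B, H, L, D)

# Optionally restrict the backends, e.g. force the memory-efficient
# kernel on a V100 where FlashAttention is unavailable:
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=False, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v)
```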
Its implementation is sometimes slower than flash-attn on devices older than the H100.
We could probably keep monitoring this: https://github.com/pytorch/pytorch/pull/120642