
Why can't flash-attn accelerate training on an A40 machine?

Open zhangxihou opened this issue 1 year ago • 1 comment

Hi, I tested the flash-attn operation on an A40 machine and it showed no improvement in training speed. Moreover, I printed out the time cost of the self-attention calculation in two models: one used normal attention, the other used flash-attn. Apart from that, the two models are identical. So can flash-attn only accelerate training on an A100 and not on an A40? By the way, flash-attn did reduce CUDA memory usage!

Env and other configuration:
work: speech recognition
env: CUDA 12.1, torch 2.1.2, flash-attn 2.5.2
models: 11 conformers
operation used: flash_attn_qkvpacked_func
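
For reference, a minimal sketch of the call described above, assuming flash-attn 2.x; the shapes below are illustrative, not the poster's actual configuration:

```python
# Minimal sketch of calling flash_attn_qkvpacked_func on packed QKV.
# flash-attn requires fp16 or bf16 tensors on a CUDA device.
import torch
from flash_attn import flash_attn_qkvpacked_func

batch, seqlen, nheads, headdim = 8, 1024, 8, 64  # illustrative sizes
qkv = torch.randn(batch, seqlen, 3, nheads, headdim,
                  dtype=torch.float16, device="cuda")
out = flash_attn_qkvpacked_func(qkv, dropout_p=0.0, causal=False)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```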

zhangxihou · Apr 18 '24 08:04

Please benchmark just the attention operation

tridao · Apr 18 '24 08:04
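
A minimal sketch of such an isolated benchmark, assuming flash-attn 2.x and an fp16-capable CUDA GPU (the shapes, iteration counts, and the `naive_attention` helper are illustrative). Because CUDA kernels launch asynchronously, timing with CUDA events rather than wall-clock time around the Python call is what makes the per-op numbers meaningful:

```python
# Minimal sketch: time only the attention op, naive vs. flash-attn.
# Shapes and iteration counts are illustrative assumptions.
import math
import torch
from flash_attn import flash_attn_qkvpacked_func

batch, seqlen, nheads, headdim = 8, 1024, 8, 64
qkv = torch.randn(batch, seqlen, 3, nheads, headdim,
                  dtype=torch.float16, device="cuda")

def naive_attention(qkv):
    # (batch, seqlen, 3, nheads, headdim) -> (batch, nheads, seqlen, headdim)
    q, k, v = qkv.permute(2, 0, 3, 1, 4)
    scores = q @ k.transpose(-2, -1) / math.sqrt(headdim)
    # Return (batch, seqlen, nheads, headdim) to match flash-attn's output.
    return (scores.softmax(dim=-1) @ v).transpose(1, 2)

def bench(fn, iters=100):
    for _ in range(10):  # warmup so compilation/caching doesn't skew timing
        fn(qkv)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(qkv)
    end.record()
    torch.cuda.synchronize()  # wait for all queued kernels to finish
    return start.elapsed_time(end) / iters  # ms per call

print("naive:", bench(naive_attention), "ms")
print("flash:", bench(flash_attn_qkvpacked_func), "ms")
```

Benchmarking the whole model can hide the speedup when attention is only a small fraction of total compute, which is why the suggestion is to measure the op in isolation.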