lmdeploy
Support for Flash Attention 3 for Ampere, Ada, and Hopper in LMDeploy
Flash Attention 3 now works on these platforms. Would it be feasible for the LMDeploy team to implement support for it? @lvhan028
https://github.com/Dao-AILab/flash-attention/issues/1049#issuecomment-2695283567
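For context, "Ampere, Ada, and Hopper" correspond to CUDA compute capabilities sm_80/sm_86, sm_89, and sm_90. A minimal sketch of a capability check along those lines (the helper names are hypothetical, not part of LMDeploy or flash-attention):

```python
# Hedged sketch: decide whether a GPU's compute capability falls in the
# architecture set that the linked issue reports FA3 now covers.
# Capability tuples are assumptions based on NVIDIA's public arch table.
FA3_CAPABILITIES = {
    (8, 0),  # Ampere (A100)
    (8, 6),  # Ampere (consumer, e.g. RTX 30xx)
    (8, 9),  # Ada (e.g. RTX 40xx, L40)
    (9, 0),  # Hopper (H100)
}

def fa3_supported(major: int, minor: int) -> bool:
    """Return True if (major, minor) is in the assumed FA3-capable set."""
    return (major, minor) in FA3_CAPABILITIES
```

In practice you would feed this the tuple returned by `torch.cuda.get_device_capability()`; Turing (sm_75) and earlier would return False here.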
@lzhangzz Hi, is there a plan to implement FA3 in the TurboMind engine? Thanks!
@snippetzero Likely in May.