lmdeploy
Support for Flash Attention 3 for Ampere, Ada, and Hopper in LMDeploy
Flash Attention 3 now works on these platforms. Would it be feasible for the LMDeploy team to implement support for it? @lvhan028
https://github.com/Dao-AILab/flash-attention/issues/1049#issuecomment-2695283567
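For context, "Ampere, Ada, and Hopper" correspond to CUDA compute capabilities sm_80/sm_86, sm_89, and sm_90. A minimal sketch of a capability check along those lines (the helper names are hypothetical, not part of LMDeploy or flash-attention):

```python
# Hedged sketch: decide whether a GPU's compute capability falls in the
# architecture set that the linked issue reports FA3 now covers.
# Capability tuples are assumptions based on NVIDIA's public arch table.
FA3_CAPABILITIES = {
    (8, 0),  # Ampere (A100)
    (8, 6),  # Ampere (consumer, e.g. RTX 30xx)
    (8, 9),  # Ada (e.g. RTX 40xx, L40)
    (9, 0),  # Hopper (H100)
}

def fa3_supported(major: int, minor: int) -> bool:
    """Return True if (major, minor) is in the assumed FA3-capable set."""
    return (major, minor) in FA3_CAPABILITIES
```

In practice you would feed this the tuple returned by `torch.cuda.get_device_capability()`; Turing (sm_75) and earlier would return False here.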
@lzhangzz Hi, is there a plan to implement FA3 in the TurboMind engine? Thanks!
@snippetzero Likely in May.