
Feature: Flash Attention 3

Open • zhyncs opened this issue 1 year ago • 3 comments

https://research.colfax-intl.com/flashattention-3-fast-and-accurate-attention-with-asynchrony-and-low-precision/

cc @yzh119

zhyncs · Jul 12 '24 02:07

I just heard that FlashInfer has achieved a faster and more comprehensive version than Flash Attention 3, amazing! 👍 Looking forward to it!

zhyncs · Jul 12 '24 02:07

> I just heard that FlashInfer has achieved a faster and more comprehensive version

I didn't know we had achieved that, lol. I am indeed working on using CUTLASS to build an SM90 version of FlashInfer, but FA3's performance is really impressive (and better than my version at the moment).

FlashAttention-3 is indeed great work that we should learn from, and yes, I'll adopt its pipeline design and accelerate the paged/sparse attention kernels accordingly.

yzh119 avatar Jul 12 '24 03:07 yzh119

> I don't know we have achieved that, lol. I'm indeed working on using cutlass to create an sm90 version of flashinfer

My wording wasn't very accurate; a better way to put it is "under way". 😂

zhyncs · Jul 12 '24 03:07