Feature: Flash Attention 3
https://research.colfax-intl.com/flashattention-3-fast-and-accurate-attention-with-asynchrony-and-low-precision/
cc @yzh119
I just heard that FlashInfer has achieved a faster and more comprehensive version than FlashAttention-3, amazing! 👍 Looking forward to it!
> I just heard that FlashInfer has achieved a faster and more comprehensive version
I don't know that we have achieved that, lol. I am indeed working on using CUTLASS to create an SM90 version of FlashInfer, but FA3's performance is really impressive (and better than my version at the moment).
FlashAttention-3 is indeed great work that we should learn from, and yes, I'll adopt its pipeline design and accelerate the paged/sparse attention kernels accordingly.
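For context, the kernel being discussed builds on FlashAttention's core trick: stream K/V in tiles and maintain an online softmax (running max, normalizer, and output accumulator), which is what FA3 then accelerates on Hopper with asynchrony and low precision. A minimal NumPy sketch of that tiling idea (function names and shapes are illustrative, not FlashInfer's or FA3's actual API):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: full softmax(Q K^T / sqrt(d)) @ V, materializing all scores.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Tiled attention with an online softmax: K/V are consumed in blocks,
    # so only O(block) scores exist at a time instead of the full n x n matrix.
    n, d = Q.shape
    O = np.zeros((n, d))
    m = np.full((n, 1), -np.inf)  # running row-wise max of scores
    l = np.zeros((n, 1))          # running softmax denominator
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)          # scores for this tile
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        P = np.exp(S - m_new)                          # unnormalized tile probs
        scale = np.exp(m - m_new)                      # rescale old accumulators
        l = l * scale + P.sum(axis=-1, keepdims=True)
        O = O * scale + P @ V[j:j + block]
        m = m_new
    return O / l                                       # normalize at the end
```

The two functions agree to numerical precision; the GPU kernels pipeline the per-tile loads and matmuls across warps, which is the part FA3's design improves.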
> I don't know we have achieved that, lol. I'm indeed working on using cutlass to create an sm90 version of flashinfer
My wording may not have been accurate; a better way to put it is "under way". 😂