Tri Dao
Which GPU? It should work with Hopper and Blackwell (B200). Ampere and Blackwell GeForce need a bit of fixing.
I just added a check to make sure it only runs on Hopper and Blackwell (B200).
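For illustration only, here is a rough sketch of what such a guard could look like, based on CUDA compute capability. It is not the actual check in the repo. The function name `check_supported_arch` and the `SUPPORTED_ARCHS` table are hypothetical. The sketch assumes Hopper is 9.0 and B200 is 10.0, and that Ampere (8.x) and Blackwell GeForce (12.0) should be rejected. It accepts exactly those two tuples, so other variants are not covered.

```python
# Hypothetical sketch: gate a kernel on GPU architecture via CUDA compute capability.
# Assumed mapping: Ampere = 8.x, Hopper = 9.0, datacenter Blackwell (B200) = 10.0,
# Blackwell GeForce (RTX 50 series) = 12.0. Only Hopper and B200 are accepted here.
SUPPORTED_ARCHS = {(9, 0): "Hopper", (10, 0): "Blackwell (B200)"}


def check_supported_arch(capability: tuple[int, int]) -> str:
    """Return the architecture name if supported, otherwise raise RuntimeError."""
    arch = SUPPORTED_ARCHS.get(tuple(capability))
    if arch is None:
        major, minor = capability
        raise RuntimeError(
            f"Unsupported compute capability sm_{major}{minor}; "
            "only Hopper (sm_90) and Blackwell B200 (sm_100) are supported."
        )
    return arch
```

On a real machine, the capability tuple would come from `torch.cuda.get_device_capability()`, e.g. `check_supported_arch(torch.cuda.get_device_capability())`. The sketch keeps the check as a pure function of the tuple, so it can be tested without a GPU.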
Oh right, varlen on B200 doesn't work yet (it needs a minor fix). You can use `flash_attn_func` in the meantime.
fwd varlen works on B200 now
Yes, we're working on that.
Like a month
late Aug / early Sept
bwd is there now. It might need a bit more testing, but it mostly works and it's quite fast.
Yeah, varlen isn't hard; it should be done within a week.
Can you measure the kernel time, to rule out other confounding factors (graph breaks, etc.)?
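One common way to do this is to time only the attention call with CUDA events instead of timing the whole model step. Below is a minimal sketch. The helper name `kernel_time_ms` is hypothetical, and the sketch assumes PyTorch with a CUDA device. It runs warm-up calls first so that one-time costs such as JIT compilation are excluded, then averages over many calls.

```python
import torch


def kernel_time_ms(fn, warmup: int = 10, iters: int = 100) -> float:
    """Average GPU time per call of fn(), in milliseconds, measured with CUDA events."""
    # Warm-up runs absorb one-time costs (JIT compilation, autotuning, allocator growth).
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()

    # Events are recorded on the current CUDA stream around the timed loop.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()

    # elapsed_time is only valid once the end event has completed.
    end.synchronize()
    return start.elapsed_time(end) / iters
```

For example, with `q`, `k`, `v` already on the GPU, you could call `kernel_time_ms(lambda: flash_attn_func(q, k, v))`. Calling the function directly keeps graph breaks in a compiled model out of the measurement. Event timing still includes any launch gaps between iterations. For per-kernel durations, `torch.profiler` is the more precise tool.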