Tri Dao

639 comments by Tri Dao

If your padding tokens are only on Q and not on K & V, you can just pretend those are legit tokens and you don't need seqused_q, right? Then the output...

If you have a kernel that zeros out the padding tokens (something like `out[padding_indices, :, :] = 0.0`), then you could apply that to the output and the incoming gradient...
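For instance, a minimal PyTorch sketch of that pattern (`padding_indices` and the wrapper name are illustrative assumptions, not part of the flash-attention API):

```python
import torch

class ZeroPaddingRows(torch.autograd.Function):
    """Zero padded query rows in the output and in the incoming gradient.

    A sketch only: `padding_indices` is a hypothetical 1-D tensor of row
    indices into the packed (total_q, nheads, headdim) output that
    correspond to padding tokens in Q.
    """

    @staticmethod
    def forward(ctx, out, padding_indices):
        ctx.save_for_backward(padding_indices)
        out = out.clone()
        out[padding_indices, :, :] = 0.0  # zero padded rows of the output
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (padding_indices,) = ctx.saved_tensors
        grad_out = grad_out.clone()
        grad_out[padding_indices, :, :] = 0.0  # zero padded rows of the gradient
        return grad_out, None
```

Wrapping the attention output as `out = ZeroPaddingRows.apply(attn_out, padding_indices)` would then keep both the forward output and the backward gradient zeroed at the padded positions.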

Yes, there's quite a bit of perf difference. I'd recommend CUDA 12.8+.

Just want to echo this; it would make things much easier than just reading the SASS.

Hdim 256 isn't currently supported on sm100. We might get to it later.

Probably 1-2 months. We're focusing on hdim 128 and hdim 192-128 (DeepSeek).

It's better to add to the existing interface instead of duplicating code.

Is `self.tiles_per_page` a compile-time constant? If so, we should add it to the `compile_key` in the interface.
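To illustrate the point (purely a sketch; the cache and names here are hypothetical, not the actual interface): if `tiles_per_page` is baked into the kernel at compile time, it has to be part of the key the compile cache is looked up with, or two different configurations would reuse the same compiled kernel.

```python
_compile_cache = {}

def _compile_kernel(dtype, head_dim, tiles_per_page):
    # Stand-in for the actual JIT compilation step.
    return f"kernel<{dtype}, hdim={head_dim}, tiles_per_page={tiles_per_page}>"

def get_kernel(dtype, head_dim, tiles_per_page):
    # tiles_per_page is a compile-time constant, so it must be part of the
    # cache key; omitting it would hand back a kernel compiled for a
    # different page layout.
    compile_key = (dtype, head_dim, tiles_per_page)
    if compile_key not in _compile_cache:
        _compile_cache[compile_key] = _compile_kernel(dtype, head_dim, tiles_per_page)
    return _compile_cache[compile_key]
```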

The implementation is here: https://github.com/Dao-AILab/flash-attention/blob/4d9ba4f018cca5c8ca6c6f1df08fea75f119b06d/csrc/flash_attn/src/alibi.h#L31 If causal, we add `alibi_slope * column_idx` to each element of the attention scores (the full bias is `-alibi_slope * (row_idx - col_idx)`, but the `-alibi_slope * row_idx` term is constant within each softmax row and drops out). If not causal, we add `-alibi_slope * |row_idx - col_idx|`. The...
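A small PyTorch sketch reconstructing the bias from that description (equal query and key lengths assumed; this ignores the unequal-seqlen offset handling in the real kernel):

```python
import torch

def alibi_bias(alibi_slope: float, seqlen: int, causal: bool) -> torch.Tensor:
    """Bias added to the (seqlen, seqlen) attention scores, per the description above."""
    row = torch.arange(seqlen, dtype=torch.float32).unsqueeze(1)  # query index
    col = torch.arange(seqlen, dtype=torch.float32).unsqueeze(0)  # key index
    if causal:
        # Full causal bias is -slope * (row - col); the -slope * row part is
        # constant within each softmax row and drops out of the softmax,
        # so adding slope * col alone gives the same attention weights.
        return (alibi_slope * col).expand(seqlen, seqlen)
    # Non-causal: penalize the absolute distance between positions.
    return -alibi_slope * (row - col).abs()
```

Adding this bias to the raw scores before `torch.softmax(scores + bias, dim=-1)` reproduces ALiBi's distance penalty.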