Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
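For context, SoftmaxN replaces the standard softmax denominator with one that includes an extra additive constant n, i.e. softmax_n(x)_i = exp(x_i) / (n + Σ_j exp(x_j)); with n = 0 this reduces to the ordinary softmax. The sketch below is a plain PyTorch reference formulation of that definition for illustration only; it is not the repo's fused CUDA/Triton kernels, and the function name and signature are assumptions, not the library's documented API.

```python
import torch

def softmax_n_reference(x: torch.Tensor, n: float = 1.0, dim: int = -1) -> torch.Tensor:
    """Reference (non-fused) SoftmaxN:
    softmax_n(x)_i = exp(x_i) / (n + sum_j exp(x_j)).
    With n = 0 this is the standard softmax.
    """
    # Subtract the row max for numerical stability; the additive `n` term
    # must be rescaled by the same factor to keep the ratio unchanged.
    x_max = x.max(dim=dim, keepdim=True).values
    exp_x = torch.exp(x - x_max)
    return exp_x / (n * torch.exp(-x_max) + exp_x.sum(dim=dim, keepdim=True))
```

Because the n term lets all attention weights shrink toward zero when no key matches strongly, rows are no longer forced to sum exactly to 1.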
Flash-Attention-Softmax-N issues (2)
Thanks for your nice work! When I use `flash_attention_n`, I hit a bug that occurs when `attn_mask` is changed from None to an attention mask. How can I fix it?...
I added unit tests for the case where the `attn_mask` argument of `flash_attention_n` or `slow_attention_n` is not None, in an attempt to reproduce #39. My unit tests pass on my...
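To make the failing configuration concrete, here is a hedged sketch of a call with `attn_mask` set rather than left as None. `flash_attention_n` and the `attn_mask` argument are named in the excerpts above, but the import path, tensor layout, and mask semantics shown here are assumptions, so treat this as an illustrative reproduction attempt rather than the library's documented usage.

```python
import torch
from flash_attention_softmax_n import flash_attention_n  # import path assumed

batch, heads, seq_len, head_dim = 2, 4, 128, 64
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Boolean padding mask: True marks positions that may be attended to.
# Shape and semantics here are an assumption, not the library's contract.
attention_mask = torch.ones(batch, 1, seq_len, seq_len, device="cuda", dtype=torch.bool)
attention_mask[..., seq_len // 2 :] = False  # treat the second half as padding

# Per the reporter, the call works with attn_mask=None; the reported bug
# appears once a real mask is passed in its place.
out = flash_attention_n(q, k, v, attn_mask=attention_mask)
```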