Persistent version of Flash Attention

Open manman-ren opened this issue 6 months ago • 0 comments

Added two more variants: `triton_tutorial_flash_v2_persistent` and `triton_tutorial_flash_v2_persistent_tma`. Both variants handle the non-causal case only. The causal case has 2 invocations of `attn_fwd_inner`, which means we would have an outer loop and 2 inner loops:

    for ...  # persistent loop
        for ...
        for ...

It is not clear how to flatten this into a 1D loop.
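One possible way to think about the flattening, sketched in plain Python rather than Triton: keep a single step counter, and derive from it which tile and which of the two phases the current step belongs to. The names `nested`, `flattened`, `n1`, and `n2` are hypothetical and only serve the illustration; `n1(tile)` and `n2(tile)` stand in for the trip counts of the two `attn_fwd_inner` invocations. This is a sketch of the index arithmetic only, and it says nothing about whether the resulting loop can be pipelined well by the compiler.

```python
def nested(num_tiles, n1, n2):
    """Reference order: a persistent outer loop with two inner loops per tile."""
    trace = []
    for tile in range(num_tiles):        # persistent loop
        for i in range(n1(tile)):        # first inner loop (phase 1)
            trace.append((tile, 1, i))
        for j in range(n2(tile)):        # second inner loop (phase 2)
            trace.append((tile, 2, j))
    return trace


def flattened(num_tiles, n1, n2):
    """Same visiting order, expressed as a single 1D loop over all steps."""
    trace = []
    # Total number of inner-loop iterations across all tiles and both phases.
    total = sum(n1(t) + n2(t) for t in range(num_tiles))
    tile, k = 0, 0                       # current tile and step index within it
    for _ in range(total):
        # Advance to the next tile once the current one has no steps left
        # (this also skips tiles whose two trip counts are both zero).
        while k >= n1(tile) + n2(tile):
            tile += 1
            k = 0
        if k < n1(tile):
            trace.append((tile, 1, k))              # still in phase 1
        else:
            trace.append((tile, 2, k - n1(tile)))   # phase 2, re-based index
        k += 1
    return trace


if __name__ == "__main__":
    # Illustrative trip counts only: phase 1 grows with the tile index,
    # phase 2 is constant.
    steps = flattened(4, lambda t: t, lambda t: 2)
    print(len(steps), steps[:4])
```

The cost of this form is a branch on the phase inside the loop body, plus the bookkeeping needed at each tile boundary (in a real kernel, that is where the accumulators would be stored and reset).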

manman-ren, Aug 02 '24 18:08