flash-attention I/O Analysis of BlockSparse FlashAttention

Open j93hahn opened this issue 7 months ago • 0 comments

Why is there an extra $Nd$ term in the I/O analysis of block-sparse FlashAttention (Section D1, page 25 of the paper)?

The paper says that the output $O$ has to be written back to HBM even when $s$ is small. I don't understand: isn't this true for dense FA too? And when $s$ is small, most of the blocks are empty, so there is less information to write. Surely that makes the $Nd$ term insignificant, since the output is already initialized to all 0s?

At the very least, my understanding is that dense FA needs the extra $Nd$ term too (unless it has been omitted because the $N^2 d^2 M^{-1}$ term dominates the I/O complexity).
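
To make the "dominating term" question concrete, here is a rough sketch that compares the two terms numerically. It assumes the bounds as stated in the paper, $\Theta(N^2 d^2 M^{-1})$ for dense FA and $\Theta(Nd + N^2 d^2 M^{-1} s)$ for block-sparse FA, with all constants dropped and $M$ counted in elements. The function names and the values of `N`, `d`, `M`, and `s` are hypothetical, chosen only for illustration.

```python
def dense_quadratic_term(N, d, M):
    """Leading term of the dense bound, N^2 d^2 / M (constants dropped)."""
    return N * N * d * d / M


def block_sparse_terms(N, d, M, s):
    """Return the two terms of the block-sparse bound as (Nd, s * N^2 d^2 / M)."""
    return N * d, s * dense_quadratic_term(N, d, M)


# Hypothetical sizes: sequence length, head dimension, SRAM size in elements.
N, d, M = 4096, 64, 100_000

# Dense case: the ratio of the quadratic term to Nd is N*d/M.
# Whenever M <= N*d this ratio is >= 1, so an extra Nd term would be absorbed.
ratio_dense = dense_quadratic_term(N, d, M) / (N * d)
print(f"dense: quadratic term / Nd = {ratio_dense:.2f}")

# Block-sparse case: the quadratic term is scaled by s, so it can fall
# below Nd once s < M / (N*d). In that regime Nd is the larger term.
for s in (1.0, 0.5, 0.1, 0.01):
    linear, quadratic = block_sparse_terms(N, d, M, s)
    larger = "Nd" if linear > quadratic else "s * N^2 d^2 / M"
    print(f"s = {s}: Nd = {linear}, sparse term = {quadratic:.0f}, larger: {larger}")
```

Under these assumptions, the $Nd$ term only becomes visible in the bound once $s$ drops below roughly $M / (Nd)$, which would be consistent with it being left out of the dense bound.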

@tridao or anyone else, if you could help clear up my confusion, I would greatly appreciate it.

Jul 21 '24 05:07 j93hahn
