flash-attention
[QST] flash_attn2: why is tOrVt not swizzled?
In the code at this link, the line reads:
Tensor tOrVt = thr_mma.partition_fragment_B(sVtNoSwizzle);
Could you explain why sVtNoSwizzle is used here instead of simply using sVt? Thanks in advance for your clarification!
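For context, here is a minimal host-side CuTe sketch of the distinction the question is asking about (the 64x64 tile size and the Swizzle<3, 3, 3> parameters are illustrative assumptions, not the exact Kernel_traits values): sVt and sVtNoSwizzle describe the same shared memory with the same logical shape, and only the offset (bank) mapping differs, which is presumably why partition_fragment_B, which only needs the tile shape to size the register fragment, can be handed the non-swizzled view.

```cpp
// Minimal sketch, compiled with nvcc against the CUTLASS/CuTe headers.
#include <cstdio>
#include <cute/tensor.hpp>
using namespace cute;

int main() {
    // Swizzled smem layout atom, tiled to a hypothetical 64x64 V^T tile
    // (assumed sizes, not the actual Kernel_traits values).
    auto swizzled_atom = composition(Swizzle<3, 3, 3>{},
                                     Layout<Shape<_8, _64>, Stride<_64, _1>>{});
    auto sVt_layout          = tile_to_shape(swizzled_atom, Shape<_64, _64>{});
    auto sVtNoSwizzle_layout = get_nonswizzle_portion(sVt_layout);

    // Same logical shape, so a fragment partitioned from either view has the
    // same size; the swizzle only changes which smem offset (bank) a given
    // coordinate maps to.
    printf("sVt layout:          "); print(sVt_layout);          printf("\n");
    printf("sVtNoSwizzle layout: "); print(sVtNoSwizzle_layout); printf("\n");
    printf("offset of (1,0): swizzled=%d, no-swizzle=%d\n",
           int(sVt_layout(make_coord(1, 0))),
           int(sVtNoSwizzle_layout(make_coord(1, 0))));
    return 0;
}
```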
I don't know, to be honest. The result was wrong without NoSwizzle.
On FA 2.6.3, using sVt instead of sVtNoSwizzle gives correct results for my token-decoding app (I'm on cutlass@19f515; maybe this was a CUTLASS bug back then?).
By the way, even when using sVtNoSwizzle, the profiler did not report any smem bank conflicts.
Ha, this is great to know, thank you!