flash-attention
[QST] flash_attn2: why is tOrVt not swizzled?
In the code at this link, the line reads:
Tensor tOrVt = thr_mma.partition_fragment_B(sVtNoSwizzle);
Could you explain why sVtNoSwizzle is used here instead of simply using sVt? Thanks in advance for your clarification!
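For context, here is a minimal host-side CuTe sketch of the distinction the question is asking about (the 64x64 tile size and the Swizzle<3, 3, 3> parameters are illustrative assumptions, not the exact Kernel_traits values): sVt and sVtNoSwizzle describe the same shared memory with the same logical shape, and only the offset (bank) mapping differs, which is presumably why partition_fragment_B, which only needs the tile shape to size the register fragment, can be handed the non-swizzled view.

```cpp
// Minimal sketch, compiled with nvcc against the CUTLASS/CuTe headers.
#include <cstdio>
#include <cute/tensor.hpp>
using namespace cute;

int main() {
    // Swizzled smem layout atom, tiled to a hypothetical 64x64 V^T tile
    // (assumed sizes, not the actual Kernel_traits values).
    auto swizzled_atom = composition(Swizzle<3, 3, 3>{},
                                     Layout<Shape<_8, _64>, Stride<_64, _1>>{});
    auto sVt_layout          = tile_to_shape(swizzled_atom, Shape<_64, _64>{});
    auto sVtNoSwizzle_layout = get_nonswizzle_portion(sVt_layout);

    // Same logical shape, so a fragment partitioned from either view has the
    // same size; the swizzle only changes which smem offset (bank) a given
    // coordinate maps to.
    printf("sVt layout:          "); print(sVt_layout);          printf("\n");
    printf("sVtNoSwizzle layout: "); print(sVtNoSwizzle_layout); printf("\n");
    printf("offset of (1,0): swizzled=%d, no-swizzle=%d\n",
           int(sVt_layout(make_coord(1, 0))),
           int(sVtNoSwizzle_layout(make_coord(1, 0))));
    return 0;
}
```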
I don't know, to be honest. The result was wrong without NoSwizzle.
On FA 2.6.3, using sVt instead of sVtNoSwizzle gives correct results for my token-decoding app (I'm on cutlass@19f515; maybe this was a CUTLASS bug back then?).
By the way, even when using sVtNoSwizzle, the profiler did not report any smem bank conflicts.
Ha, this is great to know, thank you!