
fp8 not enabled for mha_varlen_fwd

goldhuang opened this issue on Sep 16, 2024 · 0 comments

I created a related issue earlier: https://github.com/Dao-AILab/flash-attention/issues/1157.

https://github.com/Dao-AILab/flash-attention/blob/main/hopper/flash_api.cpp#L447.

I think the kernels are unified now, so why is fp8 enabled for mha_fwd but not for mha_varlen_fwd? What's the blocker? I'm willing to help and contribute if support isn't coming soon.

Update - I tried to enable fp8 for mha_varlen_fwd myself and got a CUDA illegal memory access error.
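
For context, here is a minimal sketch of the kind of call I'm trying to make work. It assumes the FA3 hopper `flash_attn_interface` with the FA2-style `flash_attn_varlen_func` parameter names (`cu_seqlens_q`, `max_seqlen_q`, etc.) and PyTorch's `torch.float8_e4m3fn` dtype; the shapes and names are illustrative, not taken from the repo:

```python
# Hypothetical repro sketch, not verified against the current hopper API.
import torch
from flash_attn_interface import flash_attn_varlen_func  # FA3 hopper interface

batch, heads, headdim = 2, 8, 128
seqlens = torch.tensor([512, 1024], dtype=torch.int32, device="cuda")
total = int(seqlens.sum())

# cu_seqlens: cumulative sequence lengths, shape (batch + 1,)
cu_seqlens = torch.zeros(batch + 1, dtype=torch.int32, device="cuda")
cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)

# Packed q/k/v of shape (total_tokens, heads, headdim), cast to fp8.
# torch.randn cannot generate fp8 directly, so cast from fp16.
q = torch.randn(total, heads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)
q8, k8, v8 = (t.to(torch.float8_e4m3fn) for t in (q, k, v))

# mha_fwd accepts fp8 inputs like these, but the varlen path rejects
# them; with the dtype check removed, this is where I hit the illegal
# memory access. (Return convention may vary by version.)
out = flash_attn_varlen_func(
    q8, k8, v8,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(seqlens.max()), max_seqlen_k=int(seqlens.max()),
    causal=True,
)
```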

Thanks!

goldhuang — Sep 16 '24, 22:09