
[cute, bwd, sm100] support for head_dim = 64

Open XiaomingXu1995 opened this issue 1 month ago • 3 comments

Hi, thanks for the valuable contribution of flash_attn_cute.

I have tested flash_attn_cute on a B200 and found that it supports head_dim=128 but not head_dim=64.

Will you support head_dim=64?
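For reference, here is roughly the check I ran (just a sketch; the flash_attn.cute.interface import path and the flash_attn_func signature are my assumptions based on the FA2-style API, so adjust them to whatever your checkout actually exposes):

```python
# Minimal repro sketch. The import path and flash_attn_func signature below
# are assumptions; adjust them to what your flash_attn_cute build exposes.
import torch
from flash_attn.cute.interface import flash_attn_func

def try_head_dim(head_dim):
    batch, seqlen, nheads = 2, 1024, 8
    q = torch.randn(batch, seqlen, nheads, head_dim,
                    dtype=torch.bfloat16, device="cuda", requires_grad=True)
    k = torch.randn_like(q, requires_grad=True)
    v = torch.randn_like(q, requires_grad=True)
    out = flash_attn_func(q, k, v, causal=True)
    if isinstance(out, tuple):  # some versions also return the LSE
        out = out[0]
    out.sum().backward()  # the bwd kernel is where head_dim=64 fails for me
    print(f"head_dim={head_dim}: ok")

try_head_dim(128)  # fine on B200 (sm100)
try_head_dim(64)   # fails
```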

XiaomingXu1995 · Nov 11 '25 09:11

Hi @XiaomingXu1995, I'm also trying out flash_attn_cute on the same chip, but I'm facing issues during setup because the library isn't able to find the installation path. Could you please share your installation steps? That would be really helpful.

depksingh · Nov 12 '25 05:11

I just run pip install --no-build-isolation -e . in the flash_attn/cute directory, inside a conda environment.

Also make sure there is no other version of flash_attn (e.g. flash_attn 2) installed in the same conda environment.
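To confirm which build actually gets imported afterwards, I run a quick check like this (the module names are just what I look for; adjust them if the cute setup installs under a different name in your environment):

```python
# Sanity check of what gets imported after the editable install; the module
# names below are assumptions, change them to match your installation.
import importlib.util

for name in ("flash_attn", "flash_attn.cute"):
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        spec = None
    print(name, "->", spec.origin if spec else "not found")
```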

XiaomingXu1995 · Nov 12 '25 10:11


Thank you. I was trying to use the WAN model, which inherently uses FA2. Will I need to update the model's inference code so that its FA2 calls point to the above library instead? Is my understanding correct?
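Something like the following redirect is what I have in mind, just as a sketch (the flash_attn.cute.interface path and the accepted keyword arguments are my guesses, not the library's confirmed API):

```python
# Hypothetical shim, not a documented API: hand the WAN attention code a
# flash_attn_func that forwards to the cute build. The import path and the
# keyword arguments the cute kernel accepts are assumptions; check them
# against the interface you actually installed.
from flash_attn.cute.interface import flash_attn_func as cute_flash_attn_func

def flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None,
                    causal=False, **kwargs):
    # Drop FA2-only options (e.g. dropout_p) that the cute kernel may not take.
    return cute_flash_attn_func(q, k, v, softmax_scale=softmax_scale,
                                causal=causal)
```

The WAN attention code would then import this wrapper instead of the FA2 flash_attn_func.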

depksingh · Nov 12 '25 10:11