
[cute, bwd, sm100] support for head_dim = 64

Open XiaomingXu1995 opened this issue 1 month ago • 3 comments

Hi, thanks for the valuable contribution of flash_attn_cute.

I have tested flash_attn_cute on a B200 and found that it supports head_dim=128 but not head_dim=64.

Will you support head_dim=64?
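For reference, here is roughly the check I ran (just a sketch; the flash_attn.cute.interface import path and the flash_attn_func signature are my assumptions based on the FA2-style API, so adjust them to whatever your checkout actually exposes):

```python
# Minimal repro sketch. The import path and flash_attn_func signature below
# are assumptions; adjust them to what your flash_attn_cute build exposes.
import torch
from flash_attn.cute.interface import flash_attn_func

def try_head_dim(head_dim):
    batch, seqlen, nheads = 2, 1024, 8
    q = torch.randn(batch, seqlen, nheads, head_dim,
                    dtype=torch.bfloat16, device="cuda", requires_grad=True)
    k = torch.randn_like(q, requires_grad=True)
    v = torch.randn_like(q, requires_grad=True)
    out = flash_attn_func(q, k, v, causal=True)
    if isinstance(out, tuple):  # some versions also return the LSE
        out = out[0]
    out.sum().backward()  # the bwd kernel is where head_dim=64 fails for me
    print(f"head_dim={head_dim}: ok")

try_head_dim(128)  # fine on B200 (sm100)
try_head_dim(64)   # fails
```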

XiaomingXu1995 · Nov 11 '25 09:11

Hi @XiaomingXu1995, I'm also trying out flash_attn_cute on the same chip, but I'm facing issues during setup because the library isn't able to find the installation path. Could you please share your installation steps? That would be really helpful.

depksingh · Nov 12 '25 05:11

I just run pip install --no-build-isolation -e . in the flash_attn/cute directory, inside a conda environment.

Also make sure there is no other version of flash_attn (e.g. flash_attn 2) installed in the same conda environment.
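To confirm which build actually gets imported afterwards, I run a quick check like this (the module names are just what I look for; adjust them if the cute setup installs under a different name in your environment):

```python
# Sanity check of what gets imported after the editable install; the module
# names below are assumptions, change them to match your installation.
import importlib.util

for name in ("flash_attn", "flash_attn.cute"):
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        spec = None
    print(name, "->", spec.origin if spec else "not found")
```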

XiaomingXu1995 · Nov 12 '25 10:11


Thank you. I was trying to use the WAN model, which inherently uses FA2. Will I need to update the model's inference code so that its FA2 calls point to the above library instead? Is my understanding correct?
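Something like the following redirect is what I have in mind, just as a sketch (the flash_attn.cute.interface path and the accepted keyword arguments are my guesses, not the library's confirmed API):

```python
# Hypothetical shim, not a documented API: hand the WAN attention code a
# flash_attn_func that forwards to the cute build. The import path and the
# keyword arguments the cute kernel accepts are assumptions; check them
# against the interface you actually installed.
from flash_attn.cute.interface import flash_attn_func as cute_flash_attn_func

def flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None,
                    causal=False, **kwargs):
    # Drop FA2-only options (e.g. dropout_p) that the cute kernel may not take.
    return cute_flash_attn_func(q, k, v, softmax_scale=softmax_scale,
                                causal=causal)
```

The WAN attention code would then import this wrapper instead of the FA2 flash_attn_func.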

depksingh · Nov 12 '25 10:11