[cute, bwd, sm100] support for head_dim = 64
Hi, thanks for the valuable contribution of flash_attn_cute
I have tested flash_attn_cute on B200 and found that it supports head_dim=128 but not head_dim=64.
Will you support head_dim=64?
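For context, a minimal sketch of the kind of call I mean is below. It assumes the cute backend exposes an FA2-style `flash_attn_func(q, k, v, causal=...)` entry point; the import path and signature here are assumptions and may not match the actual package:

```python
# Minimal sketch (not the exact test I ran); import path and signature are assumptions.
import torch
from flash_attn.cute.interface import flash_attn_func  # assumed import path

def try_head_dim(head_dim: int) -> None:
    batch, seqlen, nheads = 2, 1024, 8
    q, k, v = (
        torch.randn(batch, seqlen, nheads, head_dim, device="cuda", dtype=torch.bfloat16)
        for _ in range(3)
    )
    try:
        flash_attn_func(q, k, v, causal=True)
        print(f"head_dim={head_dim}: ok")
    except Exception as e:
        print(f"head_dim={head_dim}: failed ({type(e).__name__}: {e})")

try_head_dim(128)  # runs on B200 in my tests
try_head_dim(64)   # currently not supported
```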
Hi @XiaomingXu1995, I'm also trying out flash_attn_cute on the same chip, but I'm facing issues during setup: the library isn't able to find the installation path. Could you please share the installation steps? That would be really helpful.
I just run `pip install --no-build-isolation -e .` in the `flash_attn/cute` directory in a conda environment.
And make sure there is no other version of flash_attn (e.g. flash_attn 2) in the same conda environment.
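In case it helps, the end-to-end sequence might look roughly like this sketch (the checkout path is an assumption; only the `pip` commands come from the steps above):

```bash
# Remove any previously installed flash_attn (e.g. flash_attn 2) from the environment first
pip uninstall -y flash-attn

# Build and install the cute package in editable mode
# (path assumes a local checkout of the flash-attention repository)
cd flash-attention/flash_attn/cute
pip install --no-build-isolation -e .
```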
Thank you. I was trying to use the WAN model, which inherently uses FA2. Will I need to update the model's inference code so that its FA2 calls are directed to the above library instead? Is my understanding correct?