Tri Dao
Yeah, then I don't know how to fix it.
I've no experience with Windows. It's only tested on Linux, so WSL would probably work.
4060 should also work.
You can just change the Python interface (https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/flash_attn_interface.py) to set k_grad and v_grad to None and see if that works.
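If editing the file isn't convenient, here's a rough sketch of the same idea from the caller's side; the function name and signature are assumed from the public flash_attn API, and detaching k and v simply means autograd never requests those gradients:

```python
import torch
from flash_attn import flash_attn_func  # name assumed from the public interface

def flash_attn_func_no_kv_grad(q, k, v, **kwargs):
    # Detach k and v so autograd never asks for k_grad / v_grad;
    # roughly the effect of editing flash_attn_interface.py to
    # return None for those gradients.
    return flash_attn_func(q, k.detach(), v.detach(), **kwargs)
```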
Yeah, I think you might be right; it's a bug.
If there's an attention mask, PyTorch does not dispatch to the FA2 kernel; it dispatches to the memory-efficient kernel from xformers instead.
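One way to see this dispatch behavior is to restrict SDPA to the flash backend and pass an explicit mask; a minimal sketch, assuming PyTorch 2.x with the torch.backends.cuda.sdp_kernel context manager and a CUDA device:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
mask = torch.ones(1, 1, 128, 128, device="cuda", dtype=torch.bool)

# With only the flash backend enabled, an explicit attn_mask should fail,
# since the FA2 kernel doesn't accept arbitrary masks.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    except RuntimeError as e:
        print("flash backend can't handle the mask:", e)

# With default dispatch, the masked call runs on the memory-efficient
# (xformers-style) kernel instead.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```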
> https://github.com/ROCmSoftwarePlatform/flash-attention

I think that's a fork maintained by the AMD folks and it's not meant to be merged.
Yup, it's mentioned in the README:

```
FlashAttention-2 currently supports:
Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100).
Support for Turing GPUs (T4, RTX 2080) is...
```
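A quick way to check which bucket your GPU falls in, assuming a CUDA-enabled PyTorch install; Ampere/Ada/Hopper report compute capability 8.0 or above, while Turing (T4, RTX 2080) is 7.5:

```python
import torch

major, minor = torch.cuda.get_device_capability()
# sm_80 (A100), sm_86 (RTX 3090), sm_89 (RTX 4090), sm_90 (H100) are supported;
# sm_75 (Turing) is not yet.
print(f"Compute capability sm_{major}{minor}:",
      "supported by FlashAttention-2" if major >= 8 else "not supported by FlashAttention-2")
```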
Unfortunately I've had no bandwidth to work on this. We welcome contributions.
As the error message says, there's no `flash_attn_varlen_func_with_kvcache`.
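For reference, a minimal check of what the package actually exports; these names are taken from the public flash_attn interface as I understand it:

```python
# The varlen and kv-cache paths are separate entry points:
from flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache

# There is no combined flash_attn_varlen_func_with_kvcache;
# use whichever of the two above matches your use case.
```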