Tri Dao

Results: 250 comments of Tri Dao

Yeah, then I don't know how to fix it.

I've no experience with Windows. It's only tested on Linux, so WSL would probably work.

You can just change the Python interface (https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/flash_attn_interface.py) to set k_grad and v_grad to None and see if that works.
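For reference, a minimal sketch of one way to get the same end result without editing flash_attn_interface.py, assuming the public flash_attn_func API; the wrapper name flash_attn_no_kv_grad is hypothetical. Detaching k and v removes them from the autograd graph, so no gradients are ever produced for them, which matches setting k_grad and v_grad to None.

```python
# A minimal sketch, assuming the public flash_attn_func API; it does not
# modify flash_attn_interface.py, but the caller-visible effect is the same:
# k.grad and v.grad stay None after backward().
import torch
from flash_attn import flash_attn_func

def flash_attn_no_kv_grad(q, k, v, dropout_p=0.0, causal=False):
    # Detaching k and v drops them from the autograd graph, so no gradient
    # flows back to them; only q receives a gradient.
    return flash_attn_func(q, k.detach(), v.detach(),
                           dropout_p=dropout_p, causal=causal)
```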

Yeah, I think you might be right; it's a bug.

If there's an attention mask, PyTorch does not dispatch to the FA2 kernel; it uses the kernel from xformers (memory-efficient attention) instead.
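A small check of that dispatch behavior, assuming PyTorch 2.x with CUDA: forcing the flash backend while passing an explicit attn_mask raises an error, whereas the memory-efficient (xformers-style) backend accepts the mask.

```python
# Sketch of the SDPA backend dispatch described above, assuming PyTorch 2.x
# on a CUDA GPU. The flash backend only supports is_causal, not an arbitrary
# attn_mask, so it is skipped when a mask is passed.
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
mask = torch.zeros(1, 1, 128, 128, device="cuda", dtype=torch.float16)

# Flash backend only + explicit mask -> no eligible kernel.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    except RuntimeError as e:
        print("flash backend rejected the mask:", e)

# Memory-efficient (xformers-style) backend handles the mask.
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=False,
                                    enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```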

> https://github.com/ROCmSoftwarePlatform/flash-attention

I think that's a fork maintained by AMD folks and it's not meant to be merged.

Yup, it's mentioned in the README:

```
FlashAttention-2 currently supports: Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is...
```
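As a quick way to check this on a given machine, here is a short sketch using torch.cuda.get_device_capability(), assuming the usual SM numbering (Ampere >= sm_80, Ada = sm_89, Hopper = sm_90, Turing = sm_75):

```python
# Capability check based on the README excerpt above; the SM-number mapping
# (Ampere/Ada/Hopper >= sm_80, Turing = sm_75) is the standard CUDA numbering.
import torch

major, minor = torch.cuda.get_device_capability()
if (major, minor) >= (8, 0):
    print(f"sm_{major}{minor}: FlashAttention-2 supported (Ampere/Ada/Hopper)")
else:
    print(f"sm_{major}{minor}: not supported yet (e.g., Turing T4 / RTX 2080 is sm_75)")
```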

Unfortunately I've had no bandwidth to work on this. We welcome contributions.