Tri Dao
Yeah, then I don't know how to fix it.
I've no experience with Windows. It's only tested on Linux, so WSL would probably work.
4060 should also work.
You can just change the Python interface (https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/flash_attn_interface.py) to set k_grad and v_grad to None and see if that works.
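If editing the file isn't convenient, here's a rough sketch of the same idea from the caller's side; the function name and signature are assumed from the public flash_attn API, and detaching k and v simply means autograd never requests those gradients:

```python
import torch
from flash_attn import flash_attn_func  # name assumed from the public interface

def flash_attn_func_no_kv_grad(q, k, v, **kwargs):
    # Detach k and v so autograd never asks for k_grad / v_grad;
    # roughly the effect of editing flash_attn_interface.py to
    # return None for those gradients.
    return flash_attn_func(q, k.detach(), v.detach(), **kwargs)
```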
Yeah, I think you might be right; it's a bug.
If there's an attention mask, PyTorch does not dispatch to the FA2 kernel; it dispatches to the memory-efficient kernel from xformers instead.
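One way to see this dispatch behavior is to restrict SDPA to the flash backend and pass an explicit mask; a minimal sketch, assuming PyTorch 2.x with the torch.backends.cuda.sdp_kernel context manager and a CUDA device:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
mask = torch.ones(1, 1, 128, 128, device="cuda", dtype=torch.bool)

# With only the flash backend enabled, an explicit attn_mask should fail,
# since the FA2 kernel doesn't accept arbitrary masks.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    except RuntimeError as e:
        print("flash backend can't handle the mask:", e)

# With default dispatch, the masked call runs on the memory-efficient
# (xformers-style) kernel instead.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```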
> https://github.com/ROCmSoftwarePlatform/flash-attention

I think that's a fork maintained by the AMD folks and it's not meant to be merged.
Yup, it's mentioned in the README:

```
FlashAttention-2 currently supports:
Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100).
Support for Turing GPUs (T4, RTX 2080) is...
```
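A quick way to check which bucket your GPU falls in, assuming a CUDA-enabled PyTorch install; Ampere/Ada/Hopper report compute capability 8.0 or above, while Turing (T4, RTX 2080) is 7.5:

```python
import torch

major, minor = torch.cuda.get_device_capability()
# sm_80 (A100), sm_86 (RTX 3090), sm_89 (RTX 4090), sm_90 (H100) are supported;
# sm_75 (Turing) is not yet.
print(f"Compute capability sm_{major}{minor}:",
      "supported by FlashAttention-2" if major >= 8 else "not supported by FlashAttention-2")
```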
Unfortunately I've had no bandwidth to work on this. We welcome contributions.
As the error message says, there's no `flash_attn_varlen_func_with_kvcache`.
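For reference, a minimal check of what the package actually exports; these names are taken from the public flash_attn interface as I understand it:

```python
# The varlen and kv-cache paths are separate entry points:
from flash_attn import flash_attn_varlen_func, flash_attn_with_kvcache

# There is no combined flash_attn_varlen_func_with_kvcache;
# use whichever of the two above matches your use case.
```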