Tri Dao
it's supported
I don't know anything about HF transformers.
Yes, you can just download the wheel compiled with CUDA 12.3. It should be compatible.
Thanks! Is the formatting by black using a line length of [100](https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/pyproject.toml)?
Sorry, I've just been busy. Let me take a look tomorrow.
It could be any other code that's hanging.
Seems like a Triton error. You might have better luck searching their repo issues.
The 4090 is Ada, not Hopper.
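For reference, GPU architectures map to CUDA compute capabilities (SM versions). A minimal, illustrative (not exhaustive) sketch of that mapping, with hypothetical names `ARCH_BY_SM` and `arch_name`:

```python
# Illustrative mapping from CUDA compute capability to architecture name.
# The RTX 4090 is sm_89 (Ada Lovelace); Hopper is sm_90 (e.g. H100).
ARCH_BY_SM = {
    (8, 0): "Ampere",  # A100
    (8, 6): "Ampere",  # RTX 30xx
    (8, 9): "Ada",     # RTX 4090
    (9, 0): "Hopper",  # H100
}

def arch_name(sm: tuple) -> str:
    """Return the architecture name for a (major, minor) compute capability."""
    return ARCH_BY_SM.get(sm, "unknown")
```

On a live system you could obtain the `(major, minor)` pair from `torch.cuda.get_device_capability()`.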
For those with AMD devices: can you help test this PR?
Softcapping is not supported yet in the backward pass.
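For context, softcapping bounds attention scores with a scaled tanh before the softmax. A minimal sketch of the forward-pass transform, with `cap=30.0` as a hypothetical value:

```python
import math

def softcap(score: float, cap: float = 30.0) -> float:
    """Squash a raw attention score into (-cap, cap) via tanh.

    cap=30.0 is an illustrative value, not a library default.
    """
    return cap * math.tanh(score / cap)
```

Supporting this in the backward pass requires an extra gradient term (the derivative of tanh, `1 - tanh(x)**2`), which is why it lands in the forward kernel first.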