Tri Dao

639 comments of Tri Dao

Sorry, I don't know much about the ROCm version; you can ask on their repo.

There's a persistent scheduler that's not yet enabled for causal; we'll update it soon.

Sure, would love to see a PR fixing this.

Can you send a short script to reproduce the speed regression? E.g., with this input, 2.5.9.post1 takes XXX seconds and 2.6.1 takes YYY seconds.
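A minimal benchmark sketch along these lines, assuming a CUDA GPU and the public `flash_attn_func` API; the shapes and dtype below are placeholders, so substitute whichever input shows the regression:

```python
import time
import torch
from flash_attn import flash_attn_func

# Placeholder shapes; substitute the configuration that shows the regression.
batch, seqlen, nheads, headdim = 4, 4096, 16, 64
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.bfloat16) for _ in range(3))

# Warm up so CUDA context creation doesn't skew the timing.
for _ in range(10):
    flash_attn_func(q, k, v, causal=True)
torch.cuda.synchronize()

start = time.time()
for _ in range(100):
    flash_attn_func(q, k, v, causal=True)
torch.cuda.synchronize()
print(f"{(time.time() - start) / 100 * 1e3:.3f} ms/iter")  # run under 2.5.9.post1 and 2.6.1
```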

Please compare the error (flash-attn in bf16 - reference attn in fp32) vs (reference attn in bf16 - reference attn in fp32).
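A minimal sketch of that comparison, using a naive matmul-softmax attention as the reference (shapes and names are placeholder assumptions):

```python
import math
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.bfloat16) for _ in range(3))

def ref_attn(q, k, v, dtype):
    # Naive reference attention in the given dtype, (batch, nheads, seqlen, headdim) layout.
    qt, kt, vt = (t.transpose(1, 2).to(dtype) for t in (q, k, v))
    scores = (qt @ kt.transpose(-2, -1)) / math.sqrt(headdim)
    return (scores.softmax(dim=-1) @ vt).transpose(1, 2)

out_flash = flash_attn_func(q, k, v)                  # bf16
out_ref_bf16 = ref_attn(q, k, v, torch.bfloat16)
out_ref_fp32 = ref_attn(q, k, v, torch.float32)

# Flash-attn's error vs the fp32 reference should be comparable to the
# error of the bf16 reference itself.
print((out_flash.float() - out_ref_fp32).abs().max().item())
print((out_ref_bf16.float() - out_ref_fp32).abs().max().item())
```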

We have [code](https://github.com/Dao-AILab/flash-attention/blob/6711b3bc40073e7ced2a4c7d8266feec7e6e137f/flash_attn/models/llama.py#L107) to convert weights from Meta and HF to be compatible with the implementation in this repo. A test is [here](https://github.com/Dao-AILab/flash-attention/blob/6711b3bc40073e7ced2a4c7d8266feec7e6e137f/tests/models/test_llama.py#L65) to verify that the models implemented in this...
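A rough usage sketch, assuming helper names like `llama_config_to_gpt2_config` and `remap_state_dict_hf_llama` from the linked file; the exact names and signatures may differ across versions, so check that file:

```python
# Hypothetical usage; helper names are assumptions based on the linked
# flash_attn/models/llama.py and may vary by version.
import torch
from transformers import AutoConfig, AutoModelForCausalLM
from flash_attn.models.gpt import GPTLMHeadModel
from flash_attn.models.llama import llama_config_to_gpt2_config, remap_state_dict_hf_llama

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
config = llama_config_to_gpt2_config(AutoConfig.from_pretrained(model_name))

hf_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
state_dict = remap_state_dict_hf_llama(hf_model.state_dict(), config)

model = GPTLMHeadModel(config, device="cuda", dtype=torch.bfloat16)
model.load_state_dict(state_dict)
```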

My guess is that it's because our `GPTLMHeadModel` doesn't return a loss; it returns the output, which is of size (batch, seqlen, vocab_size). You'd need to have a separate loss...
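A minimal sketch of computing the loss separately, assuming next-token prediction and that the model output exposes `.logits` (all names are placeholders):

```python
import torch.nn.functional as F

logits = model(input_ids).logits  # (batch, seqlen, vocab_size)
# Shift so position i predicts token i + 1, then flatten for cross-entropy.
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = input_ids[:, 1:].contiguous()
loss = F.cross_entropy(shift_logits.view(-1, shift_logits.size(-1)),
                       shift_labels.view(-1))
```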

> I'm still not able to measure any difference. I'm using the HF trainer and model with this change:
>
> ```python
> import transformers
> from functools import partial...
> ```

Is this with batch size 1? My back-of-the-envelope calculation: the logits have size (batch, seqlen, vocab_size), taking 2 bytes each (e.g. training with bf16). Our xentropy kernel avoids storing an...
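To make the sizes concrete, a quick sketch of that calculation with assumed example values (batch 1, seqlen 2048, vocab 32000; all placeholders):

```python
batch, seqlen, vocab_size = 1, 2048, 32000  # assumed example values
bytes_per_elem = 2  # bf16
logits_bytes = batch * seqlen * vocab_size * bytes_per_elem
print(f"{logits_bytes / 2**20:.0f} MiB")  # ~125 MiB for the logits alone
```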