Tri Dao
Can you save the tensors being passed to flash_attn_cuda.varlen_bwd and send them to me? Otherwise it would be very hard to debug. And can you print out the value of...
Would be hard for me to debug if I can't reproduce it. You can wrap the call in a try/except to hopefully save the tensors.
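Something like this sketch could work for capturing the inputs when the varlen backward fails; the shapes, sequence lengths, and variable names below are made up, so swap in whatever your code actually passes:

```python
# A minimal sketch: run the varlen attention, and if the backward throws,
# save the exact inputs so the failure can be reproduced offline.
import torch
from flash_attn import flash_attn_varlen_func

# Illustrative packed-sequence inputs: (total_tokens, nheads, headdim).
total, nheads, headdim = 1024, 16, 64
q, k, v = [
    torch.randn(total, nheads, headdim, device="cuda", dtype=torch.float16,
                requires_grad=True)
    for _ in range(3)
]
cu_seqlens = torch.tensor([0, 512, 1024], device="cuda", dtype=torch.int32)
max_seqlen = 512

try:
    out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens,
                                 max_seqlen, max_seqlen, causal=True)
    out.sum().backward()
except RuntimeError:
    # Dump the tensors that triggered the error so they can be shared.
    torch.save(
        {"q": q, "k": k, "v": v,
         "cu_seqlens_q": cu_seqlens, "cu_seqlens_k": cu_seqlens,
         "max_seqlen_q": max_seqlen, "max_seqlen_k": max_seqlen},
        "varlen_bwd_repro.pt",
    )
    raise
```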
I haven't had much bandwidth to work on Turing.
I see. I'll try to find some time this weekend for this. Is the usage on T4 just inference (forward pass only)?
> Hi, has there been any update on this?

No, I haven't had much time.
Nope, I've had no bandwidth.
Please benchmark just the attention operation.
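For example, a rough sketch of timing only the attention call with `flash_attn_func` (the shapes below are illustrative; adjust batch size, sequence length, and heads to match your model):

```python
# Time just the forward attention op, excluding the rest of the model.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 4, 4096, 32, 128
q, k, v = [
    torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
    for _ in range(3)
]

# Warmup so kernel launches and caching don't skew the measurement.
for _ in range(10):
    flash_attn_func(q, k, v, causal=True)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
torch.cuda.synchronize()
start.record()
n_iters = 100
for _ in range(n_iters):
    flash_attn_func(q, k, v, causal=True)
end.record()
torch.cuda.synchronize()
print(f"fwd attention: {start.elapsed_time(end) / n_iters:.3f} ms/iter")
```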
Try flash-attn 2.5.1 on nvcr 23.12 or 24.01.
Can you try `python -m pip install flash-attn`? It's possible that `pip` and `python -m pip` refer to different environments. Getting the dependencies right for all setups is hard. We...
I don't know of a solution that works for all setups; happy to hear suggestions. We recommend the [PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) container from Nvidia, which has all the required tools to install...
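One quick way to check whether `pip` and `python` point at the same environment (a sketch, assuming `pip` is on your PATH):

```python
# If the interpreter path below and the path printed by `pip --version`
# belong to different environments, `pip install` is putting packages
# somewhere the interpreter running your code never looks.
import subprocess
import sys

print("python executable:", sys.executable)
print("pip reports:", subprocess.run(
    ["pip", "--version"], capture_output=True, text=True
).stdout.strip())
```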