Tri Dao

Results 429 comments of Tri Dao

Can you save the tensors being passed to flash_attn_cuda.varlen_bwd and send them to me? Otherwise it would be very hard to debug. And can you print out the value of...

It would be hard for me to debug if I can't reproduce it. You can wrap the call in a try/except to hopefully save the tensors when it fails.
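A minimal sketch of that try/except idea, not the maintainer's own script: `call_and_dump_on_error` and `dump_path` are hypothetical names, and `pickle` stands in for whatever serializer you prefer (with real tensors, `torch.save` is the more usual choice). It assumes the failing call raises a Python exception; if the error leaves the CUDA context unusable, the dump may itself fail, in which case saving the inputs before the call is likely the safer variant.

```python
import pickle


def call_and_dump_on_error(fn, *args, dump_path="failing_inputs.pkl", **kwargs):
    """Call fn(*args, **kwargs); if it raises, save the inputs and re-raise.

    `fn` is any callable (for example, a thin wrapper around the backward
    call that fails). The helper name and `dump_path` are illustrative only.
    """
    try:
        return fn(*args, **kwargs)
    except Exception:
        # Persist exactly what was passed in so the failure can be replayed
        # on another machine.
        with open(dump_path, "wb") as f:
            pickle.dump({"args": args, "kwargs": kwargs}, f)
        raise  # keep the original traceback visible
```

The saved file can then be loaded with `pickle.load` and the same arguments replayed against the function to reproduce the error.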

I haven't had much bandwidth to work on Turing.

I see. I'll try to find some time this weekend for this. Is the usage on T4 just inference (forward pass only)?

> Hi, has there been any update on this?

No, I haven't had much time.

Nope, I've had no bandwidth.

Please benchmark just the attention operation.
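One way to time a single operation in isolation is sketched below. The `benchmark` helper and its parameters are hypothetical, not part of flash-attn. It assumes `fn` is a zero-argument closure that runs only the attention call on pre-built inputs, and that on a GPU you pass a synchronization function such as `torch.cuda.synchronize` as `sync`, since CUDA kernels launch asynchronously and wall-clock timing without a sync would mostly measure launch overhead.

```python
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the mean wall-clock seconds per call of fn().

    `sync`, if given, is called before starting and before stopping the
    timer so that asynchronous work (e.g. GPU kernels) is included.
    """
    def _sync():
        if sync is not None:
            sync()

    # Warm-up runs absorb one-time costs (compilation, caching, allocation).
    for _ in range(warmup):
        fn()
    _sync()

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    _sync()
    return (time.perf_counter() - start) / iters
```

Building the inputs outside `fn` keeps data loading, projections, and the rest of the model out of the measurement.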

Try flash-attn 2.5.1 on nvcr 23.12 or 24.01.

Can you try `python -m pip install flash-attn`? It's possible that `pip` and `python -m pip` refer to different environments. Getting the dependencies right for all setups is hard. We...
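To check which environment a given `python` actually uses, a small diagnostic along these lines can help. It is only a sketch: `describe_environment` is a hypothetical helper, and it assumes the package installs under the import name `flash_attn`.

```python
import importlib.util
import sys


def describe_environment(package="flash_attn"):
    """Report which interpreter is running and whether it can see `package`."""
    spec = importlib.util.find_spec(package)
    return {
        # Path of the interpreter executing this script; `python -m pip`
        # installs into this interpreter's environment.
        "executable": sys.executable,
        "package_found": spec is not None,
        "package_location": spec.origin if spec is not None else None,
    }
```

Running it with the same `python` used for training and comparing `executable` with the interpreter path reported by `pip --version` shows whether the two point at the same environment.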

I don't know of a single solution that works for all setups; happy to hear suggestions. We recommend the [PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) container from NVIDIA, which has all the required tools to install...