Tri Dao

Results: 429 comments of Tri Dao

> to fix this problem, maybe adding torch dependency into [pyproject.toml](https://github.com/HazyResearch/flash-attention/blob/main/pyproject.toml) can help

We had torch as a dependency in 1.0.5, but for some users it would download a new...

We now have pre-built CUDA wheels, which setup.py will automatically download.

Thanks for the report. I just saw this error on nvcr 23.06 as well. nvcr 23.07 should work; can you try it? The error is due to the pytorch interface changing between...

Oh, it's a low-level change in error handling. Pytorch [added](https://github.com/pytorch/pytorch/commit/0ec4646588ce8c2ef1b7edcec0c7787da7a72f38) this "throw_data_ptr_access_error" function on May 11. nvcr 23.06 uses a pytorch version from May 2 and nvcr 23.07 uses a pytorch version...

You can compile from source with `FLASH_ATTENTION_FORCE_BUILD=TRUE`: `FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn`.

The right comparison is (FlashAttention in fp16/bf16 - standard attention in fp32) vs (standard attention in fp16/bf16 - standard attention in fp32). What are the two differences ^^^ in your...
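A minimal sketch of that comparison, assuming the `flash-attn` package is installed and a CUDA GPU is available; the shapes and the `standard_attention` helper below are illustrative, not from the original thread:

```
# Compare the numerical error of FlashAttention (fp16) and standard attention (fp16),
# both measured against a standard-attention reference computed in fp32.
import torch
from flash_attn import flash_attn_func

def standard_attention(q, k, v):
    # q, k, v: (batch, seqlen, nheads, headdim)
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bthd,bshd->bhts", q, k) * scale
    return torch.einsum("bhts,bshd->bthd", torch.softmax(scores, dim=-1), v)

torch.manual_seed(0)
q, k, v = [torch.randn(2, 512, 8, 64, device="cuda") for _ in range(3)]

ref_fp32   = standard_attention(q, k, v)                       # standard attention in fp32
std_fp16   = standard_attention(q.half(), k.half(), v.half())  # standard attention in fp16
flash_fp16 = flash_attn_func(q.half(), k.half(), v.half())     # FlashAttention in fp16

print("standard fp16 vs fp32 reference:", (std_fp16.float() - ref_fp32).abs().max().item())
print("flash    fp16 vs fp32 reference:", (flash_fp16.float() - ref_fp32).abs().max().item())
```

Both errors should be of the same order; that is the point of the comparison.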

Floating point operations are not associative. Changing the order of the operations will change the output, up to numerical precision. Example ``` In [1]: import torch In [2]: a =...
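The snippet above is truncated; a minimal sketch of such a demonstration (the values and shapes here are illustrative):

```
# Floating point addition is not associative: (a + b) + c and a + (b + c)
# round differently, so the results typically differ at the level of the
# dtype's precision.
import torch

torch.manual_seed(0)
a, b, c = [torch.randn(1000, dtype=torch.float16) for _ in range(3)]

left  = (a + b) + c
right = a + (b + c)
print((left - right).abs().max().item())  # usually small but nonzero
```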

With torch.float32, F.scaled_dot_product_attention does not call FlashAttention (it is only implemented for fp16 and bf16). You can ask on the Pytorch github.
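A sketch of this behavior (not from the original comment), assuming a CUDA GPU and PyTorch 2.x; `torch.backends.cuda.sdp_kernel` is the context manager that restricts `F.scaled_dot_product_attention` to chosen backends (newer releases replace it with `torch.nn.attention.sdpa_kernel`):

```
# With only the FlashAttention backend enabled, fp32 inputs have no kernel to
# dispatch to, while fp16 inputs run on the FlashAttention backend.
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 512, 64, device="cuda")  # (batch, nheads, seqlen, headdim), fp32
k, v = torch.randn_like(q), torch.randn_like(q)

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v)  # fp32: not supported by FlashAttention
    except RuntimeError as err:
        print("fp32:", err)
    out = F.scaled_dot_product_attention(q.half(), k.half(), v.half())  # fp16: works
    print("fp16:", out.shape, out.dtype)
```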

It is not supported yet.