Tri Dao

Results: 429 comments of Tri Dao

> to fix this problem, maybe adding torch dependency into [pyproject.toml](https://github.com/HazyResearch/flash-attention/blob/main/pyproject.toml) can help

We had torch as a dependency in 1.0.5, but for some users it would download a new...

We now have pre-built CUDA wheels, which setup.py will automatically download.

Thanks for the report. I just saw this error on nvcr 23.06 as well. nvcr 23.07 should work; can you try it? The error is due to the pytorch interface changing between...

Oh, it's a low-level change in error handling. Pytorch [added](https://github.com/pytorch/pytorch/commit/0ec4646588ce8c2ef1b7edcec0c7787da7a72f38) this "throw_data_ptr_access_error" function on May 11. nvcr 23.06 uses a pytorch version from May 2 and nvcr 23.07 uses a pytorch version...

You can compile from source with `FLASH_ATTENTION_FORCE_BUILD=TRUE`: `FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn`.

The right comparison is (FlashAttention in fp16/bf16 - standard attention in fp32) vs (standard attention in fp16/bf16 - standard attention in fp32). What are the two differences ^^^ in your...
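A minimal sketch of that comparison, assuming the `flash-attn` package is installed and a CUDA GPU is available; the shapes and the `standard_attention` helper below are illustrative, not from the original thread:

```
# Compare the numerical error of FlashAttention (fp16) and standard attention (fp16),
# both measured against a standard-attention reference computed in fp32.
import torch
from flash_attn import flash_attn_func

def standard_attention(q, k, v):
    # q, k, v: (batch, seqlen, nheads, headdim)
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bthd,bshd->bhts", q, k) * scale
    return torch.einsum("bhts,bshd->bthd", torch.softmax(scores, dim=-1), v)

torch.manual_seed(0)
q, k, v = [torch.randn(2, 512, 8, 64, device="cuda") for _ in range(3)]

ref_fp32   = standard_attention(q, k, v)                       # standard attention in fp32
std_fp16   = standard_attention(q.half(), k.half(), v.half())  # standard attention in fp16
flash_fp16 = flash_attn_func(q.half(), k.half(), v.half())     # FlashAttention in fp16

print("standard fp16 vs fp32 reference:", (std_fp16.float() - ref_fp32).abs().max().item())
print("flash    fp16 vs fp32 reference:", (flash_fp16.float() - ref_fp32).abs().max().item())
```

Both errors should be of the same order; that is the point of the comparison.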

Floating point operations are not associative. Changing the order of the operations will change the output, up to numerical precision. Example ``` In [1]: import torch In [2]: a =...
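The snippet above is truncated; a minimal sketch of such a demonstration (the values and shapes here are illustrative):

```
# Floating point addition is not associative: (a + b) + c and a + (b + c)
# round differently, so the results typically differ at the level of the
# dtype's precision.
import torch

torch.manual_seed(0)
a, b, c = [torch.randn(1000, dtype=torch.float16) for _ in range(3)]

left  = (a + b) + c
right = a + (b + c)
print((left - right).abs().max().item())  # usually small but nonzero
```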

With torch.float32, F.scaled_dot_product_attention does not call FlashAttention (it is only implemented for fp16 and bf16). You can ask on the Pytorch github.
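A sketch of this behavior (not from the original comment), assuming a CUDA GPU and PyTorch 2.x; `torch.backends.cuda.sdp_kernel` is the context manager that restricts `F.scaled_dot_product_attention` to chosen backends (newer releases replace it with `torch.nn.attention.sdpa_kernel`):

```
# With only the FlashAttention backend enabled, fp32 inputs have no kernel to
# dispatch to, while fp16 inputs run on the FlashAttention backend.
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 512, 64, device="cuda")  # (batch, nheads, seqlen, headdim), fp32
k, v = torch.randn_like(q), torch.randn_like(q)

with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v)  # fp32: not supported by FlashAttention
    except RuntimeError as err:
        print("fp32:", err)
    out = F.scaled_dot_product_attention(q.half(), k.half(), v.half())  # fp16: works
    print("fp16:", out.shape, out.dtype)
```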

It is not supported yet.