Tri Dao

250 comments by Tri Dao

I just haven't had time to review and merge it (it's a pretty big change). Still trying to figure out a good way to support both mask and bias without...

Usually that's because of some mismatch between the pytorch cuda version and the nvcc version used to compile FlashAttention. If you're certain there's no mismatch then idk what could be wrong....
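
A quick way to check for such a mismatch is a sketch like the one below; it assumes `nvcc` is on your PATH and simply prints the two versions so you can compare them.

```python
# Minimal sketch: compare the CUDA version pytorch was compiled with against
# the nvcc toolkit found on PATH (a mismatch commonly breaks the FlashAttention build/import).
import subprocess

import torch

print("pytorch built with CUDA:", torch.version.cuda)
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```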

We recommend using Nvidia's pytorch [container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch), which has the right environment already set up.

Can you try the latest version (1.0.6)?

We should have prebuilt wheels for this setting (torch 2.0 cuda 11.8) that setup.py automatically downloads, and nvcc should not be necessary. Are you installing from source or from PyPI...

I see. The current setup.py might still require nvcc; I'll figure out how to fix that later. As a workaround for now you can try `FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE pip install flash-attn --no-build-isolation`

I turned the "raise error" into a warning, but it looks like that's not enough: constructing the CUDAExtension with pytorch already requires CUDA_HOME. Let me think about it more.
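
For context on why CUDA_HOME gets pulled in, here is a minimal setup.py sketch (hypothetical names and sources, not the actual flash-attn build script) that only constructs the extension when CUDA is actually available:

```python
# Sketch: only construct the CUDA extension when CUDA_HOME is available,
# so wheel-only installs can skip the nvcc requirement entirely.
import os

from setuptools import setup
from torch.utils.cpp_extension import CUDA_HOME, BuildExtension, CUDAExtension

ext_modules = []
skip_build = os.environ.get("FLASH_ATTENTION_SKIP_CUDA_BUILD", "FALSE") == "TRUE"
if not skip_build and CUDA_HOME is not None:
    # CUDAExtension needs CUDA_HOME to locate nvcc, so it is only built here.
    ext_modules.append(
        CUDAExtension(
            name="flash_attn_cuda_example",          # hypothetical extension name
            sources=["csrc/example_flash_attn.cu"],  # placeholder source file
        )
    )

setup(
    name="flash-attn-example",  # placeholder package name
    ext_modules=ext_modules,
    cmdclass={"build_ext": BuildExtension} if ext_modules else {},
)
```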

Maybe the function you're looking for is `block_diag_butterfly_project_einsum_rank` (you can see in our tests here that the projection recovers the original factors): https://github.com/HazyResearch/fly/blob/cd624cffeffa7d1579336d26a776405bf0867f36/tests/ops/test_blockdiag_butterfly_einsum.py#L112