Tri Dao
> `pip install -v flash-attn==2.1.1` doesn't help either

Did you try `--no-build-isolation`?
`pip install packaging` then `pip install flash-attn --no-build-isolation`.
Yeah, I don't have much bandwidth to spend on packaging; Python packaging is kind of a mess once you involve CUDA and torch. I just use Docker for a reproducible environment.
PyPI has a file size limit of 60MB.
It's because of a torch version change. The nvcr PyTorch 23.12 container should work with flash-attn v2.4.0.post1 now. If you're using torch-nightly, we currently use torch-nightly 20231106 to compile the CUDA wheels, so...
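One quick sanity check (a minimal sketch, assuming torch and flash-attn are already installed) is to print the versions in the environment you're running and compare them against the versions the prebuilt wheel was compiled for:

```python
# Minimal sketch: report the torch / flash-attn versions in this environment,
# to compare against the versions the prebuilt CUDA wheel targets.
import torch
import flash_attn

print("torch:", torch.__version__)
print("torch CUDA build:", torch.version.cuda)
print("flash-attn:", flash_attn.__version__)
```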
No, the M40 is from the [Maxwell](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) generation (sm_52). FlashAttention currently supports Ampere (sm_80) and later.
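If you're not sure which generation your GPU is, you can check its compute capability with PyTorch (a small sketch; sm_80 or higher means Ampere or newer):

```python
# Minimal sketch: check whether the visible GPU meets FlashAttention's
# requirement of Ampere (compute capability 8.0 / sm_80) or newer.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
if (major, minor) >= (8, 0):
    print("Ampere or newer: supported by FlashAttention.")
else:
    print("Older than Ampere (e.g. the Maxwell M40 is sm_52): not supported.")
```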
> @tridao
>
> Can you explain why it does not support maxwell? Are there any technical limitations?

Because it takes work. Maxwell does not have tensor cores. Someone would...
I personally don't have cycles for this, but we welcome PRs.
That kind of mask is not currently supported.