Tri Dao
> `pip install -v flash-attn==2.1.1` doesn't help either

Did you try `--no-build-isolation`?
`pip install packaging` then `pip install flash-attn --no-build-isolation`.
Yeah, I don't have much bandwidth to spend on packaging; Python packaging is kind of a mess once you involve CUDA and torch. I just use Docker for a reproducible environment.
PyPI has a file size limit of 60MB.
It's because of a torch version change. The nvcr PyTorch 23.12 container should work with flash-attn v2.4.0.post1 now. If you're using torch-nightly, we currently use torch-nightly 20231106 to compile the CUDA wheels, so...
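One quick sanity check (a minimal sketch, assuming torch and flash-attn are already installed) is to print the versions in the environment you're running and compare them against the versions the prebuilt wheel was compiled for:

```python
# Minimal sketch: report the torch / flash-attn versions in this environment,
# to compare against the versions the prebuilt CUDA wheel targets.
import torch
import flash_attn

print("torch:", torch.__version__)
print("torch CUDA build:", torch.version.cuda)
print("flash-attn:", flash_attn.__version__)
```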
No, the M40 is from the [Maxwell](https://en.wikipedia.org/wiki/CUDA#GPUs_supported) generation (sm_52). FlashAttention currently supports Ampere (sm_80) and later.
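If you're not sure which generation your GPU is, you can check its compute capability with PyTorch (a small sketch; sm_80 or higher means Ampere or newer):

```python
# Minimal sketch: check whether the visible GPU meets FlashAttention's
# requirement of Ampere (compute capability 8.0 / sm_80) or newer.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
if (major, minor) >= (8, 0):
    print("Ampere or newer: supported by FlashAttention.")
else:
    print("Older than Ampere (e.g. the Maxwell M40 is sm_52): not supported.")
```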
> @tridao
>
> Can you explain why it does not support maxwell? Are there any technical limitations?

Because it takes work. Maxwell does not have tensor cores. Someone would...
I personally don't have cycles for this, but we welcome PRs.
That kind of mask is not currently supported.