Tri Dao


Can you check if `cu_seqlens_q` has shape `(batch_size + 1)`?
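For reference, a minimal sketch of how `cu_seqlens_q` is typically constructed for the varlen interface: it is the 0-prepended cumulative sum of the per-sample sequence lengths, int32, on the GPU, so its shape is `(batch_size + 1,)`. The sequence lengths and head sizes below are made up for illustration.

```python
import torch
from flash_attn import flash_attn_varlen_func

# Hypothetical per-sample sequence lengths (batch_size = 3).
seqlens = torch.tensor([5, 7, 3], dtype=torch.int32, device="cuda")

# cu_seqlens_q has shape (batch_size + 1,): cumulative lengths with a leading 0.
cu_seqlens = torch.nn.functional.pad(
    torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0)
)
max_seqlen = int(seqlens.max())

# q, k, v are packed over all tokens: (total_tokens, nheads, headdim), fp16/bf16.
total_tokens = int(seqlens.sum())
q = torch.randn(total_tokens, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_varlen_func(
    q, k, v, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen, causal=True
)
```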

> I have attempted to utilize the BartFlashAttention class from the Transformers library with BioGPT, because BioGPT's attention mechanism is derived from the BART model's attention module. However, I encountered...

We don't take an attention mask currently. See the README or the function docstring for the required shapes.
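To illustrate the point (a sketch with made-up sizes): the fixed-length interface takes padded `(batch, seqlen, nheads, headdim)` tensors and a `causal` flag rather than an attention mask; variable-length batches go through the varlen interface with `cu_seqlens` as above.

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 8, 64  # illustrative sizes
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# There is no attention_mask argument: causal masking is a flag, and padding
# is handled by the varlen interface (cu_seqlens), not by a boolean mask.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
```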

Can you try with `pip install --no-build-isolation flash-attn`? This code is written as a Pytorch extension, so we need Pytorch to be installed in order to compile it.
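As a quick sanity check before building (illustrative only), you can confirm that Pytorch and its CUDA build are visible in the environment pip will compile in:

```python
# Illustrative pre-build check: the extension compiles against the installed torch,
# so torch must import and report a CUDA build in this environment.
import torch

print("torch version:", torch.__version__)
print("torch CUDA build:", torch.version.cuda)      # should match the system nvcc major version
print("CUDA device available:", torch.cuda.is_available())
```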

I'm not familiar with accelerate or how `transformers` uses FlashAttention; you'd probably get better help asking on those repos.

> this doesn't work for me again, might be because I have. cc @tridao not sure how relevant this is

The q, k, v need to be on 'cuda' and...
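For instance (a sketch with made-up shapes), inputs created on the CPU in fp32 need to be moved to CUDA and cast to fp16 or bf16 before calling the kernels:

```python
import torch
from flash_attn import flash_attn_func

# The kernels only run on CUDA tensors in fp16 or bf16, so convert first.
q = torch.randn(2, 128, 8, 64)                 # cpu, float32
q = q.to(device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)
```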

> I am getting a similar issue without training with torch nightly on Llama so can confirm something's wrong!

Might be on our side, but as far as I tested...

We recommend the [Pytorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) container from Nvidia, which has all the required tools to install FlashAttention.

I had to remove pyproject.toml for now since I couldn't find a way to add torch as a build dependency that works for everyone. Hopefully installation will work for the...