Tri Dao
Can you check if `cu_seqlens_q` has shape `(batch_size + 1)`?
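For reference, a minimal sketch of how `cu_seqlens_q` is usually built (the per-sequence lengths below are made up): it is the cumulative sum of the sequence lengths with a leading zero, so it has one more entry than the batch size.

```python
import torch

# Hypothetical per-sequence lengths for a batch of 3 variable-length sequences.
seqlens = torch.tensor([5, 3, 7], dtype=torch.int32, device="cuda")

# cu_seqlens_q holds the packed-sequence boundaries [0, 5, 8, 15],
# i.e. shape (batch_size + 1,), dtype int32.
cu_seqlens_q = torch.nn.functional.pad(
    torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0)
)
print(cu_seqlens_q)        # tensor([ 0,  5,  8, 15], device='cuda:0', dtype=torch.int32)
print(cu_seqlens_q.shape)  # torch.Size([4]) == batch_size + 1
```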
Can you add a short script to reproduce the error?
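Something along these lines is usually enough (a hypothetical sketch; the shapes, dtype, and the `causal` flag are placeholders to adapt to the failing case):

```python
import torch
from flash_attn import flash_attn_func

# Placeholder shapes; inputs should be fp16/bf16 CUDA tensors of shape
# (batch_size, seqlen, nheads, headdim).
batch, seqlen, nheads, headdim = 2, 128, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # expected: (batch, seqlen, nheads, headdim)
```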
> I have attempted to utilize the BartFlashAttention class from the Transformers library with BioGPT, because BioGPT's attention mechanism is derived from the BART model's attention module. However, I encountered...
We don't take an attention mask currently. See the README or the function docstring for the required shapes.
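Roughly, instead of passing a mask you remove the padding and pack the sequences; here is a sketch assuming the varlen interface (`flash_attn_varlen_func`), with made-up lengths:

```python
import torch
from flash_attn import flash_attn_varlen_func

# Made-up sequence lengths; the packed q/k/v have shape (total_tokens, nheads, headdim)
# and the sequence boundaries are described by cu_seqlens rather than an attention mask.
nheads, headdim = 8, 64
seqlens = torch.tensor([5, 3, 7], dtype=torch.int32, device="cuda")
cu_seqlens = torch.nn.functional.pad(
    torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0)
)
total_tokens = int(seqlens.sum())
q = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(seqlens.max()), max_seqlen_k=int(seqlens.max()),
    causal=True,
)  # (total_tokens, nheads, headdim)
```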
Can you try with `pip install --no-build-isolation flash-attn`? This code is written as a PyTorch extension, so we need PyTorch installed to compile it.
I'm not familiar with accelerate or how `transformers` uses FlashAttention; you'd probably get better help asking on those repos.
> this doesn't work for me again, might be because I have. cc @tridao not sure how relevant this is

The q, k, v need to be on 'cuda' and...
> I am getting a similar issue without training with torch nightly on Llama so can confirm something's wrong!

Might be on our side, but as far as I tested...
We recommend the [PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) container from Nvidia, which has all the required tools to install FlashAttention.
I had to remove pyproject.toml for now since I couldn't find a way to add torch as a build dependency that works for everyone. Hopefully installation will work for the...