flash-attention
How to fix this "RuntimeError: cu_seqlens_q must have shape (batch_size + 1)"
Can you check if cu_seqlens_q has shape (batch_size + 1)?
I wonder if it's an error with this variable.
Can you add a short script to reproduce the error?
I tried to copy the above code here, but it causes an error.
I'm wondering if manually adding an extra entry so the shape matches (batch_size + 1) is the right solution. It ran successfully before this error was reported.
I don't know what is going on. Why does this error occur? Is it because my data is wrong?
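For context, cu_seqlens_q is expected to hold the cumulative per-sequence lengths of the batch with a leading zero, so it has batch_size + 1 entries. A minimal sketch with made-up lengths (plain PyTorch, not the actual model code):

import torch
import torch.nn.functional as F

# Hypothetical padding mask for a batch of 2 sequences of lengths 3 and 5.
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 1]])
seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)                        # tensor([3, 5])
cu_seqlens_q = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
print(cu_seqlens_q)  # tensor([0, 3, 8], dtype=torch.int32) -> shape (batch_size + 1,) = (3,)

If cu_seqlens_q ends up with any other shape (for example because the mask carries an extra dimension), the kernel raises exactly this RuntimeError.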
I solved this problem: downgrading transformers from 4.35 to 4.33 makes it work.
I have attempted to utilize the BartFlashAttention class from the Transformers library with BioGPT, because BioGPT's attention mechanism is derived from the BART model's attention module. However, I encountered the following error: RuntimeError: cu_seqlens_q must have shape (batch_size + 1)
I don't know how transformers or BioGPT do things; you might get better help if you ask there.
Thank you for your response. I've identified the problem: transformers creates a causal-like attention mask for BioGPT, but your FlashAttention expects the attention mask in the shape [bs, seq_len] and takes a 'causal' parameter. So I would have to pass an attention mask similar to what a tokenizer would generate to FlashAttention. Am I correct about the required shape for the attention mask?
We don't take an attention mask currently. See the README or the function docstring for the required shapes.
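For reference, here is a rough sketch of the shapes flash_attn_varlen_func expects, based on my reading of the flash-attn 2.x docstring (the toy lengths and random tensors below are made up): q/k/v are packed over all tokens with no attention mask, and the padding information is carried entirely by cu_seqlens and max_seqlen, with causal masking requested via the causal flag.

import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
seqlens = torch.tensor([3, 5], dtype=torch.int32, device="cuda")   # batch_size = 2
total_tokens = int(seqlens.sum())                                   # 8 tokens packed together

# Packed (unpadded) tensors: (total_tokens, nheads, headdim), fp16/bf16 on GPU.
q = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")
v = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")

# int32 cumulative lengths with a leading 0 -> shape (batch_size + 1,).
cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
max_seqlen = int(seqlens.max())

out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens,
                             max_seqlen, max_seqlen, causal=True)    # (total_tokens, nheads, headdim)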
Was there any final fix for this issue, or was downgrading transformers to a lower version the only option?
import torch
import torch.nn.functional as F

def _get_unpad_data(attention_mask):
    # Per-sequence token counts; flatten() guards against an extra leading dim in the mask.
    seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32).flatten()
    # Positions of the non-padding tokens in the flattened (batch * seqlen) layout.
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen_in_batch = seqlens_in_batch.max().item()
    # Cumulative sequence lengths with a leading 0 -> shape (batch_size + 1,).
    cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch.flatten(), dim=0, dtype=torch.int32), (1, 0))
    return (
        indices,
        cu_seqlens,
        max_seqlen_in_batch,
    )
Flattening seqlens_in_batch fixed this bug for me when using flash_attn_varlen_func,
but I find it is not faster than using attention_mask=None with flash_attn_func.
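For completeness, here is a rough sketch of how the values returned by _get_unpad_data are typically consumed: gather the non-padding tokens, call flash_attn_varlen_func, then scatter the outputs back into the padded layout. The helpers index_first_axis and pad_input come from flash_attn.bert_padding; the wrapper function below is hypothetical, so check the exact shapes against your version.

import torch
from flash_attn import flash_attn_varlen_func
from flash_attn.bert_padding import index_first_axis, pad_input

def varlen_attention(q, k, v, attention_mask, causal=True):
    # q, k, v: (batch, seqlen, nheads, headdim); attention_mask: (batch, seqlen), 1 = keep.
    batch, seqlen, nheads, headdim = q.shape
    indices, cu_seqlens, max_seqlen = _get_unpad_data(attention_mask)

    # Drop padding tokens so the sequences are packed back to back.
    q_u = index_first_axis(q.reshape(batch * seqlen, nheads, headdim), indices)
    k_u = index_first_axis(k.reshape(batch * seqlen, nheads, headdim), indices)
    v_u = index_first_axis(v.reshape(batch * seqlen, nheads, headdim), indices)

    out_u = flash_attn_varlen_func(q_u, k_u, v_u, cu_seqlens, cu_seqlens,
                                   max_seqlen, max_seqlen, causal=causal)
    # Scatter back into the padded (batch, seqlen, nheads, headdim) layout.
    return pad_input(out_u, indices, batch, seqlen)

If the batch has little or no padding, the unpad/pad round trip adds overhead while the attention kernel does roughly the same work, which could explain why this path is not faster than calling flash_attn_func directly on dense inputs.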