flash-attention
Will FlashAttention support self-extend?
https://github.com/datamllab/LongLM
Hi! We just implemented FlashAttention for self-extend by leveraging the windowed FA that flash_attn already supports. In short, we merge two FA passes to obtain the self-extend attention. Check https://github.com/datamllab/LongLM/pull/28 for more details! At the cost of slightly higher memory usage and runtime, this implementation can extend the context window of Llama, Mistral, Gemma, and Qwen1.5 up to 10x in a fine-tuning-free way.
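To illustrate the "merge two attentions" idea, here is a minimal PyTorch sketch, not the PR's actual code: one pass attends to keys inside the local window with ordinary positions, a second pass attends to keys beyond the window with grouped (floor-divided) positions, and the two partial results are combined via their log-sum-exp. The `naive_attn` helper stands in for a windowed flash_attn kernel that would also return its LSE; all names, shapes, and the window size are illustrative assumptions.

```python
# Reference sketch of the self-extend two-pass merge (assumptions throughout).
import torch


def naive_attn(q, k, v, mask):
    # q, k, v: (batch, heads, seq, dim); mask: (seq, seq) bool, True = keep.
    # Stand-in for a windowed FlashAttention call that also returns its LSE.
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bhid,bhjd->bhij", q, k) * scale
    scores = scores.masked_fill(~mask, float("-inf"))
    lse = torch.logsumexp(scores, dim=-1)            # per-query log-sum-exp
    probs = torch.nan_to_num(scores.softmax(-1))     # rows with no keys -> 0
    out = torch.einsum("bhij,bhjd->bhid", probs, v)
    return out, lse


def self_extend_attn(q_nbr, k_nbr, q_grp, k_grp, v, window=512):
    # q_nbr/k_nbr: RoPE applied at ordinary positions (neighbor part).
    # q_grp/k_grp: RoPE applied at grouped positions, as in self-extend.
    seq = q_nbr.shape[-2]
    i = torch.arange(seq).unsqueeze(-1)
    j = torch.arange(seq).unsqueeze(0)
    causal = j <= i
    near = causal & (i - j < window)     # keys inside the local window
    far = causal & (i - j >= window)     # keys beyond the window

    out_n, lse_n = naive_attn(q_nbr, k_nbr, v, near)   # "window FA" pass
    out_g, lse_g = naive_attn(q_grp, k_grp, v, far)    # "grouped FA" pass

    # Merge the two partial softmaxes with the standard LSE trick.
    lse_max = torch.maximum(lse_n, lse_g)
    w_n = torch.exp(lse_n - lse_max).unsqueeze(-1)
    w_g = torch.exp(lse_g - lse_max).unsqueeze(-1)
    return (out_n * w_n + out_g * w_g) / (w_n + w_g)
```

Because the two passes cover disjoint key ranges, weighting each output by its exponentiated LSE reproduces a single softmax over the union, which is what lets the real implementation reuse the windowed FA kernel twice instead of materializing the full score matrix.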
But we are still looking forward to an official implementation of this two-part, windowed FlashAttention!