flash-attention: Will FlashAttention integrate Self-Extend?


Open · wangyuxin87 opened this issue · 1 comment

https://github.com/datamllab/LongLM

Mar 04 '24 08:03 wangyuxin87

Hi! We have just implemented FlashAttention for Self-Extend, building on the sliding-window FlashAttention supported by flash_attn. In short, we merge the results of two FlashAttention calls to obtain the Self-Extend attention. Check https://github.com/datamllab/LongLM/pull/28 for more details! At the cost of slightly increased memory usage and run time, this implementation can extend the context length of Llama, Mistral, Gemma, and Qwen1.5 by 10x without any fine-tuning.

That said, we are still looking forward to an official implementation of this kind of two-part FlashAttention with a window!

Mar 22 '24 07:03 Mooler0410
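The comment does not spell out how the two FlashAttention results are merged. One standard way to combine attention computed over disjoint key sets (the same rescaling idea FlashAttention uses when combining key blocks) is to reweight each partial output by its log-sum-exp (LSE). The sketch below shows that merge for a single query in plain Python. It assumes each part returns its LSE alongside its output (flash_attn 2.x can expose it via `return_attn_probs=True`); the names `partial_attention` and `merge_attention` are hypothetical, and this is not the code from the linked PR. In a Self-Extend setting, one part would presumably cover the neighbor window and the other the distant tokens with grouped positions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def partial_attention(
    q: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[Vector, float]:
    """Scaled dot-product attention of one query over a subset of keys.

    Returns the softmax-normalized output and the log-sum-exp (LSE) of the
    scaled scores; the LSE is what lets two partial results be merged exactly.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, m + math.log(total)


def merge_attention(
    out1: Sequence[float], lse1: float, out2: Sequence[float], lse2: float
) -> Tuple[Vector, float]:
    """Combine two attention outputs computed over disjoint key sets.

    Each part was normalized by its own partition sum Z_i = exp(lse_i); the
    combined result reweights them by Z_i / (Z_1 + Z_2).
    """
    m = max(lse1, lse2)
    lse = m + math.log(math.exp(lse1 - m) + math.exp(lse2 - m))
    w1, w2 = math.exp(lse1 - lse), math.exp(lse2 - lse)
    return [w1 * a + w2 * b for a, b in zip(out1, out2)], lse


if __name__ == "__main__":
    q = [0.5, -1.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    # Hypothetical split: keys[:2] play the "distant" part, keys[2:] the "neighbor" window.
    full, _ = partial_attention(q, keys, values)
    o1, l1 = partial_attention(q, keys[:2], values[:2])
    o2, l2 = partial_attention(q, keys[2:], values[2:])
    merged, _ = merge_attention(o1, l1, o2, l2)
    print(all(abs(a - b) < 1e-9 for a, b in zip(merged, full)))  # True
```

Because the merge only needs each part's output and LSE, the two parts are free to use different masks or position encodings, which is what a neighbor-window/grouped split would require; the cost is running two attention passes instead of one.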
