
Question: Sliding window attention

Open stellanhaglund opened this issue 1 year ago • 3 comments

Are there any plans of trying out sliding window attention like mistral on this repo, or is that more appropriate for a separate fork?

Also if anyone has tried anything with this I’m really interested in that.

stellanhaglund · Oct 08 '23 18:10

The new flash-attention has sliding window attention built in; however, it doesn't work together with compiling the model. So it is extremely easy to try as is, but you will end up with slow training. There is another repo called TinyLlama where sliding window is an option, but my feeling is that it is slower than this repo with compile=True. It would be nice if it could be implemented here, though. I agree with you.

artnoage · Oct 23 '23 09:10
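For anyone who wants to experiment without flash-attention, a banded causal mask on the standard attention scores is one simple way to approximate Mistral-style sliding window attention in the PyTorch training path. This is a minimal sketch, not code from llama2.c or TinyLlama; the function name, the `window_size` parameter, and the (batch, heads, seq_len, head_dim) layout are assumptions for illustration, and it will be slower and more memory-hungry than a fused kernel.

```python
# Minimal sliding-window causal attention sketch (illustrative, not llama2.c code).
# Each query position i attends only to keys j with i - window_size < j <= i.
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window_size: int):
    # q, k, v: (batch, n_heads, seq_len, head_dim)
    seq_len = q.size(-2)
    scale = q.size(-1) ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale          # (b, h, T, T)

    # Build a banded causal mask: causal (j <= i) AND within the window (j > i - window_size).
    i = torch.arange(seq_len, device=q.device).unsqueeze(-1)
    j = torch.arange(seq_len, device=q.device).unsqueeze(0)
    band = (j <= i) & (j > i - window_size)

    scores = scores.masked_fill(~band, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return attn @ v                                      # (b, h, T, head_dim)
```

In principle this mask could be dropped into the attention forward pass in place of the plain causal mask to compare outputs with and without a window, though as noted above a naive masked implementation won't match the speed of a compiled or fused attention kernel.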

I'm not only interested in the performance side; I'm also interested in whether there's any noticeable difference in the output with sliding window attention. It seems to benefit Mistral a lot.

stellanhaglund · Oct 23 '23 10:10

Mistral is more about data secret sauce than architecture changes; sliding window attention may only make it slightly better.

VatsaDev · Oct 30 '23 19:10