
Question: Sliding window attention

Open stellanhaglund opened this issue 1 year ago • 3 comments

Are there any plans of trying out sliding window attention like mistral on this repo, or is that more appropriate for a separate fork?

Also if anyone has tried anything with this I’m really interested in that.

stellanhaglund · Oct 08 '23 18:10

The new flash-attention has sliding window attention built in; however, it doesn't work together with compiling the model. So it is extremely easy to try as is, but you will end up with slow training. There is another repo called TinyLlama where sliding window is an option, but my feeling is that it is slower than this repo with compile=True. It would be nice if it could be implemented here, though. I agree with you.

artnoage · Oct 23 '23 09:10
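For anyone who wants to experiment without flash-attention, a banded causal mask on the standard attention scores is one simple way to approximate Mistral-style sliding window attention in the PyTorch training path. This is a minimal sketch, not code from llama2.c or TinyLlama; the function name, the `window_size` parameter, and the (batch, heads, seq_len, head_dim) layout are assumptions for illustration, and it will be slower and more memory-hungry than a fused kernel.

```python
# Minimal sliding-window causal attention sketch (illustrative, not llama2.c code).
# Each query position i attends only to keys j with i - window_size < j <= i.
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window_size: int):
    # q, k, v: (batch, n_heads, seq_len, head_dim)
    seq_len = q.size(-2)
    scale = q.size(-1) ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale          # (b, h, T, T)

    # Build a banded causal mask: causal (j <= i) AND within the window (j > i - window_size).
    i = torch.arange(seq_len, device=q.device).unsqueeze(-1)
    j = torch.arange(seq_len, device=q.device).unsqueeze(0)
    band = (j <= i) & (j > i - window_size)

    scores = scores.masked_fill(~band, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return attn @ v                                      # (b, h, T, head_dim)
```

In principle this mask could be dropped into the attention forward pass in place of the plain causal mask to compare outputs with and without a window, though as noted above a naive masked implementation won't match the speed of a compiled or fused attention kernel.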

I'm not only interested in the performance side; I'm also interested in whether there's any noticeable difference in the output with sliding window attention. It seems to benefit Mistral a lot.

stellanhaglund · Oct 23 '23 10:10

Mistral is more about data secret sauce than architecture changes; sliding window attention may only make it slightly better.

VatsaDev · Oct 30 '23 19:10