mistral.rs Sliding window models do not properly slice KV cache


Open EricLBuehler opened this issue 9 months ago • 2 comments

Describe the bug: This affects models that use sliding window attention, but only when the sequence length is large enough for the sliding window to take effect (seq_len > sliding_window). This will be fixed in #244.
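To illustrate what "properly slicing" the KV cache means, here is a minimal sketch in plain Rust. It is not the mistral.rs implementation or the fix in #244. `KvCache`, `update`, `first_pos`, and `in_window` are hypothetical names, and the mask convention (a query sees itself plus the `window - 1` positions before it) is an assumption. The idea is that the current step attends over the full history with a sliding-window mask. After that, the stored cache is cut down to the most recent `window` positions, so slicing only changes anything once seq_len > sliding_window.

```rust
/// Assumed causal sliding-window convention: query position `q` may attend to
/// key position `k` if `k` is not in the future and is within `window` steps.
fn in_window(q: usize, k: usize, window: usize) -> bool {
    k <= q && q - k < window
}

/// Hypothetical single-layer KV cache: one key row and one value row per
/// token position. Illustrative only; not the mistral.rs cache type.
struct KvCache {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
    /// `None` = full attention; `Some(w)` = keep at most `w` positions.
    sliding_window: Option<usize>,
    /// Total number of positions ever appended (absolute sequence length).
    seen: usize,
}

impl KvCache {
    fn new(sliding_window: Option<usize>) -> Self {
        KvCache { keys: Vec::new(), values: Vec::new(), sliding_window, seen: 0 }
    }

    /// Absolute position of the oldest entry still held in the cache.
    fn first_pos(&self) -> usize {
        self.seen - self.keys.len()
    }

    /// Appends keys/values for the new positions and returns the history this
    /// step should attend over (apply `in_window` masking to it). The history
    /// starts at the `first_pos()` value from before this call. Afterwards the
    /// stored cache is sliced to the last `w` positions, a superset of what the
    /// next query can see under `in_window`.
    fn update(&mut self, k: Vec<Vec<f32>>, v: Vec<Vec<f32>>) -> (Vec<Vec<f32>>, Vec<Vec<f32>>) {
        assert_eq!(k.len(), v.len(), "keys and values must cover the same positions");
        self.seen += k.len();
        self.keys.extend(k);
        self.values.extend(v);
        let history = (self.keys.clone(), self.values.clone());
        if let Some(w) = self.sliding_window {
            if self.keys.len() > w {
                // Drop the oldest positions so only the last `w` remain cached.
                let excess = self.keys.len() - w;
                self.keys.drain(..excess);
                self.values.drain(..excess);
            }
        }
        history
    }
}
```

For example, with `Some(4)` and a 6-token prompt, the prompt step still attends over all 6 positions, subject to the mask. The cache then keeps only positions 2 through 5. Skipping this slice, or slicing with the wrong offset, is the kind of error that only shows up on inputs longer than the window.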

Latest commit 4505a5e

Apr 29 '24 02:04 EricLBuehler

I guess this could also be related to Phi-3 producing gibberish output on long input sequences?

Apr 29 '24 08:04 LLukas22

Yes, this is the reason.

Apr 29 '24 08:04 EricLBuehler
