mistral.rs
Sliding window models do not properly slice KV cache
Describe the bug

This affects models that use sliding window attention, but only when the sequence length is large enough to require the sliding window (seq_len > sliding_window). This will be fixed in #244.
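For illustration, here is a minimal sketch of what "slicing the KV cache" means for a sliding window model. It is not the mistral.rs implementation, which operates on tensors rather than `Vec`s. The helper `slice_kv_for_sliding_window` is hypothetical. The sketch assumes the window counts the current token, so at most `sliding_window` cached positions stay visible. The exact off-by-one convention can differ between models. When seq_len <= sliding_window the cache is left untouched, which is why the bug only shows up on longer sequences.

```rust
/// Hypothetical helper (not part of mistral.rs): keep only the most recent
/// `sliding_window` positions of a per-layer KV cache. Each entry stands for
/// the key (or value) vector of one token position, oldest first.
fn slice_kv_for_sliding_window<T: Clone>(cache: &[T], sliding_window: usize) -> Vec<T> {
    if cache.len() > sliding_window {
        // seq_len > sliding_window: drop the oldest entries, because positions
        // outside the window must not be attended to.
        cache[cache.len() - sliding_window..].to_vec()
    } else {
        // seq_len <= sliding_window: every cached position is still in range.
        cache.to_vec()
    }
}

fn main() {
    // Toy cache: position indices stand in for key vectors.
    let keys: Vec<usize> = (0..10).collect();
    let window = 4;
    println!("{:?}", slice_kv_for_sliding_window(&keys, window)); // [6, 7, 8, 9]

    // A short sequence is not affected by the window at all.
    let short: Vec<usize> = (0..3).collect();
    println!("{:?}", slice_kv_for_sliding_window(&short, window)); // [0, 1, 2]
}
```

If the cache is not sliced this way (or the attention mask does not enforce the window), tokens attend to positions the model never saw during training. This can degrade output on long inputs.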
I guess this could also be related to phi-3 producing gibberish output on long input sequences?
Yes, this is the reason.