mistral.rs

[Feature] Implementation of multi-GPU KV cache (RingAttention)

[Open] joshpopelka20 opened this issue 7 months ago • 19 comments

I'll start by adding it to the quantized Llama model, since that's the architecture I know best. Paper (Ring Attention with Blockwise Transformers for Near-Infinite Context): https://arxiv.org/abs/2310.01889
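To make the core idea of the paper concrete, here is a minimal single-process sketch of the RingAttention loop in Rust. It is not mistral.rs or candle code: `Mat`, `full_attention`, and `ring_attention` are hypothetical names, the "devices" are just vector indices, and `kv.rotate_right(1)` stands in for the peer-to-peer send/receive that a real multi-GPU implementation would overlap with compute. It assumes a single head, no causal mask, and a sequence length divisible by the device count. Each device keeps its own query block plus a running softmax state (max, denominator, unnormalized output), while key/value blocks travel one hop around the ring per step.

```rust
// Hypothetical single-process simulation of RingAttention (not a mistral.rs API).
// Assumptions: single head, no causal mask, f32 rows, seq_len % n_dev == 0.
type Mat = Vec<Vec<f32>>;

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Reference: softmax(Q K^T / sqrt(d)) V over the whole sequence on one device.
fn full_attention(q: &Mat, k: &Mat, v: &Mat) -> Mat {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .map(|qi| {
            let scores: Vec<f32> = k.iter().map(|kj| dot(qi, kj) * scale).collect();
            let m = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let w: Vec<f32> = scores.iter().map(|s| (s - m).exp()).collect();
            let z: f32 = w.iter().sum();
            let mut out = vec![0.0f32; v[0].len()];
            for (wj, vj) in w.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += wj * x / z;
                }
            }
            out
        })
        .collect()
}

/// Ring attention over `n_dev` simulated devices using online softmax.
fn ring_attention(q: &Mat, k: &Mat, v: &Mat, n_dev: usize) -> Mat {
    let blk = q.len() / n_dev; // rows of Q/K/V owned by each device
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    let dv = v[0].len();
    // kv[d] = (K block, V block) currently held by device d.
    let mut kv: Vec<(Mat, Mat)> = (0..n_dev)
        .map(|d| (k[d * blk..(d + 1) * blk].to_vec(), v[d * blk..(d + 1) * blk].to_vec()))
        .collect();
    // Per-device running state for each of its query rows.
    let mut m = vec![vec![f32::NEG_INFINITY; blk]; n_dev];
    let mut l = vec![vec![0.0f32; blk]; n_dev];
    let mut acc = vec![vec![vec![0.0f32; dv]; blk]; n_dev];

    for _step in 0..n_dev {
        for d in 0..n_dev {
            let (kb, vb) = &kv[d];
            for r in 0..blk {
                let qi = &q[d * blk + r];
                for (kj, vj) in kb.iter().zip(vb) {
                    let s = dot(qi, kj) * scale;
                    let m_new = m[d][r].max(s);
                    let corr = (m[d][r] - m_new).exp(); // rescale previous state
                    let p = (s - m_new).exp();
                    l[d][r] = l[d][r] * corr + p;
                    for (a, x) in acc[d][r].iter_mut().zip(vj) {
                        *a = *a * corr + p * x;
                    }
                    m[d][r] = m_new;
                }
            }
        }
        // Pass every KV block to the next device in the ring (d -> d + 1 mod n_dev).
        kv.rotate_right(1);
    }
    // Normalize and concatenate per-device outputs back into sequence order.
    (0..n_dev)
        .flat_map(|d| (0..blk).map(move |r| (d, r)))
        .map(|(d, r)| acc[d][r].iter().map(|a| a / l[d][r]).collect::<Vec<f32>>())
        .collect()
}
```

For any `n_dev` that divides the sequence length, `ring_attention(&q, &k, &v, n_dev)` should match `full_attention(&q, &k, &v)` up to floating-point error. The point of the design is that each device only ever holds one K/V block plus its running state, so per-device KV memory scales with `seq_len / n_dev` rather than `seq_len`, which is what makes sharding the KV cache across GPUs attractive.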

joshpopelka20, Jul 22 '24 19:07