mistral.rs
mistral.rs copied to clipboard

Published 20 hours ago •

Reame
Issues

[Feature] Implementation of multi-gpu KV cache (RingAttention)

Open joshpopelka20 opened this issue 7 months ago • 19 comments

I'll work through adding it to quantized llama first, as I know that architecture the most. Link to the paper: https://arxiv.org/abs/2310.01889

Jul 22 '24 19:07 joshpopelka20