mistral.rs
[Feature] Implementation of a multi-GPU KV cache (RingAttention)
I'll work through adding it to quantized Llama first, since that's the architecture I know best. Link to the paper: https://arxiv.org/abs/2310.01889
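For anyone unfamiliar with the paper: RingAttention shards the KV cache across devices and rotates the K/V blocks around a ring, so each query block eventually attends to every KV block while each device only ever holds one shard. The math that makes this work is the same online (running-max) softmax accumulation used by FlashAttention. The sketch below is not mistral.rs code; it simulates the ring with a plain loop over chunks in NumPy, just to show that the blockwise accumulation reproduces full attention exactly. All names here are illustrative.

```python
import numpy as np

def full_attention(q, k, v):
    """Reference: ordinary softmax attention over the full KV."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def ring_attention_sim(q, k_chunks, v_chunks):
    """Simulated RingAttention: each (K, V) chunk stands in for one
    device's KV shard; the loop stands in for passing shards around
    the ring. The softmax is accumulated online: we keep a running
    row-max `m`, normalizer `l`, and unnormalized output `acc`,
    rescaling them whenever a new chunk raises the max."""
    d = q.shape[-1]
    m = np.full(q.shape[0], -np.inf)   # running max per query row
    l = np.zeros(q.shape[0])           # running softmax denominator
    acc = np.zeros_like(q)             # running unnormalized output
    for k, v in zip(k_chunks, v_chunks):
        s = q @ k.T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)      # rescale old state to new max
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        acc = acc * scale[:, None] + p @ v
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
k = rng.standard_normal((32, 16))
v = rng.standard_normal((32, 16))
out_ring = ring_attention_sim(q, np.split(k, 4), np.split(v, 4))
out_full = full_attention(q, k, v)
```

The point of the exercise: because the accumulation is exact (not an approximation), the per-shard results can be combined without ever materializing the full attention matrix on any one device, which is what lets the KV cache scale across GPUs.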