mistral.rs
Blazingly fast LLM inference.
## Describe the bug Log output for building with --features metal: `error[E0425]: cannot find value `devices` in this scope --> mistralrs-core/src/pipeline/isq.rs:194:26 | 194 | .zip(devices) | ^^^^^^^ help: a local...
@p-e-w, could you please give the implementation a quick check? I'm not sure if you are familiar with Rust, but I ported the algorithm from the oobabooga implementation you linked....
## Describe the bug Have a look :-) https://github.com/user-attachments/assets/321dbb21-2403-4330-9ce1-091902298888 ## Latest commit or version 0.22 MBP M3 Max
Currently, our messages API is clunky as we need to support the older OpenAI format as well as the new, multimodal format (for Idefics and Llava). This is exposed in...
How does the M1 performance compare with llama.cpp or ollama?
I have a 32-core AMD CPU and no GPU. mistral.rs will only use two of the cores, which is too few. Is it possible to allow to...
Hi, I'm wondering if you have any plans regarding KV cache compression methods like SnapKV and PyramidKV. These methods can reduce the memory used by the KV cache, hence improving availability...
This is the start of the RingAttention code. The changes so far have been to create multiple KV caches (if multiple num_devices) and to try to create separate chunks.
I'm looking for a production-ready LLM API that I can use from Rust as a library in https://github.com/louis030195/screen-pipe. Would it be possible to provide some abstraction like ```rs let...