mistral.rs
Blazingly fast LLM inference.
**Describe the bug** Regardless of installing mistralrs-metal (or mistralrs-accelerate), the model runs on the CPU. This is indicated by the log during the run, and it takes exactly the same amount of time as if...
Hello! I had a thought. To minimize constant load for tasks that occur infrequently, is there a way to keep the Docker container running with the HTTP server, but only...
It would be fantastic if mistral.rs implemented an exllamav2 backend to allow loading exl2 models. I know you're planning this, but I saw there wasn't an open feature request to...
https://huggingface.co/openvla/openvla-7b
Currently, AnyMoE only supports homogeneous expert types. This restricts the user to using only fine-tuned or only LoRA adapter experts. Implementing heterogeneous expert support would enable, for example, mixing fine-tuned...
Currently, the tight loop in `Engine` causes very high single-core CPU usage when idle. This is also problematic because it is long-running blocking code running inside of an...
## Minimum reproducible example

```
cargo build --release --features metal
```

## Error

```
Compiling candle-core v0.7.2 (https://github.com/EricLBuehler/candle.git?rev=60eb251#60eb251f)
error[E0004]: non-exhaustive patterns: `DType::F8E4M3` not covered
  --> /Users/viacheslav.maslov/.cargo/git/checkouts/candle-c6a149c3b35a488f/60eb251/candle-core/src/metal_backend/mod.rs:96:15
   |
96 |     match self.dtype {
   |...
```