mistral.rs
Blazingly fast LLM inference.
**Describe the bug** Regardless of installing mistralrs-metal (or mistralrs-accelerate), the model runs on the CPU. This is indicated by the log during the run, and it takes exactly the same amount of time as if...
Hello! I had a thought. To minimize constant load for tasks that occur infrequently, is there a way to keep the Docker container running with the HTTP server, but only...
It would be fantastic if mistral.rs implemented an exllamav2 backend to allow loading exl2 models. I know you're planning this, but I saw there wasn't an open feature request to...
https://huggingface.co/openvla/openvla-7b
Currently, AnyMoE only supports homogeneous expert types. This restricts the user to using only fine-tuned or only LoRA adapter experts. Implementing heterogeneous expert support would enable, for example, mixing fine-tuned...
Currently, the tight loop in `Engine` causes very high single-core CPU usage when idle. This is also problematic because it is long-running blocking code running inside of an...
## Minimum reproducible example

```
cargo build --release --features metal
```

## Error

```
Compiling candle-core v0.7.2 (https://github.com/EricLBuehler/candle.git?rev=60eb251#60eb251f)
error[E0004]: non-exhaustive patterns: `DType::F8E4M3` not covered
  --> /Users/viacheslav.maslov/.cargo/git/checkouts/candle-c6a149c3b35a488f/60eb251/candle-core/src/metal_backend/mod.rs:96:15
   |
96 |     match self.dtype {
   |...
```