mistral.rs

Blazingly fast LLM inference.

186 mistral.rs issues, sorted by recently updated.

Jeff Diamond, I think from Oracle, did ARM optimizations in the BLIS library (https://github.com/flame/blis). Are there any plans to support it or other BLAS-like libraries? CPU inference on Ampere is...

new feature

My installation is based on [c2ff402](https://github.com/EricLBuehler/mistral.rs/commit/c2ff4027824f71d59ebcc6a4aad87f099865a348).

```shell
./mistralrs-server -i plain -m microsoft/Phi-3-small-8k-instruct -a phi3
```

I run into the following error:

```
Could not get file "tokenizer.json" from API: RequestError(Status(404,...
```

bug

https://github.com/EricLBuehler/mistral.rs/discussions/612 reports mistral.rs as being faster than llama.cpp, but I'm seeing much slower speeds with the same prompt and settings.

Mistral.rs:

```
Usage { completion_tokens: 501, prompt_tokens: 28, total_tokens: 529, avg_tok_per_sec: 16.980707,...
```

optimization

## Describe the bug

Running in Metal mode on a MacBook M2 Pro is too slow, and it becomes incredibly slow when the question is slightly more complex. Even to the...

bug

What is the current status of providing prebuilt Python bindings? If prebuilt binaries were provided, it would greatly cut the download and compile time of the Python bindings....

new feature

## Describe the bug

```bash
cargo run --features metal --package mistralrs-server --bin mistralrs-server -- --token-source cache -i plain -m microsoft/Phi-3.5-mini-instruct -a phi3 --dtype bf16
```

Error message:

```
.4800033569336, 64.51000213623047,...
```

bug

I noticed you guys forked a bunch of controller code from AICI for your constraints. I think you might be interested in https://github.com/microsoft/llguidance - it implements a more general constraint...

new feature
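Constraint libraries such as llguidance work, broadly, by computing at each decoding step which tokens the constraint allows and masking out the rest before sampling. The sketch below shows only that masking step on a toy four-token vocabulary. `apply_token_mask` and `argmax` are hypothetical helpers for illustration, not llguidance or mistral.rs API.

```rust
/// Hypothetical helper: set the logit of every token the constraint forbids
/// to negative infinity so it can never be selected.
fn apply_token_mask(logits: &mut [f32], allowed: &[bool]) {
    for (logit, &ok) in logits.iter_mut().zip(allowed) {
        if !ok {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Greedy choice among the tokens that survived the mask.
/// Non-finite logits (masked or NaN) are skipped, so the unwrap cannot panic.
fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|(_, l)| l.is_finite())
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
}

fn main() {
    // Toy vocabulary of 4 tokens; the constraint only allows tokens 1 and 3.
    let mut logits: Vec<f32> = vec![2.5, 0.1, 3.0, 0.7];
    let allowed = [false, true, false, true];
    apply_token_mask(&mut logits, &allowed);
    println!("{:?}", argmax(&logits)); // Some(3)
}
```

Without the mask, greedy decoding would pick token 2 (logit 3.0); with it, the best allowed token is 3.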

## Describe the bug

I'm trying to run mistralrs on a VRAM-constrained system (16 GB VRAM, 64 GB RAM) via the Docker image:

```bash
ghcr.io/ericlbuehler/mistral.rs:cuda-80-0.3
```

The arguments for the...

bug

## Describe the bug

When using the mistralrs library to process multiple requests in a loop, the `blocking_recv` call hangs indefinitely after the first iteration. This prevents the code from...

bug
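The issue's code is not shown, so this is only a sketch of the general request/reply-channel pattern and does not diagnose the reported hang. It uses `std::sync::mpsc` in place of the Tokio channel that `blocking_recv` suggests; `Job`, `spawn_worker`, and `run_requests` are hypothetical names, not mistral.rs API. Each loop iteration creates a fresh reply channel. `Receiver::recv` blocks until a message arrives or every `Sender` for that channel has been dropped, so a loop hangs if a reply sender stays alive without ever sending. Assuming the issue refers to Tokio's `mpsc::Receiver::blocking_recv`, it has the same semantics, returning `None` once all senders are gone.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request type: each job carries its own reply channel.
struct Job {
    prompt: String,
    reply: mpsc::Sender<String>,
}

/// Spawns a worker thread that answers jobs until every Sender<Job> is dropped.
fn spawn_worker() -> mpsc::Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    thread::spawn(move || {
        for job in rx {
            // Ignore send errors: the client may have stopped listening.
            let _ = job.reply.send(format!("echo: {}", job.prompt));
        }
    });
    tx
}

/// Sends each prompt in turn and blocks on its reply.
fn run_requests(prompts: &[&str]) -> Vec<String> {
    let worker = spawn_worker();
    let mut answers = Vec::new();
    for p in prompts {
        // Fresh reply channel per iteration: recv() waits on exactly one answer.
        let (reply_tx, reply_rx) = mpsc::channel();
        worker
            .send(Job { prompt: p.to_string(), reply: reply_tx })
            .expect("worker thread has exited");
        // Blocks until the worker replies; returns Err if the worker dropped
        // reply_tx without sending, instead of hanging.
        match reply_rx.recv() {
            Ok(answer) => answers.push(answer),
            Err(_) => break,
        }
    }
    answers
}

fn main() {
    let out = run_requests(&["a", "b", "c"]);
    println!("{:?}", out); // ["echo: a", "echo: b", "echo: c"]
}
```

Moving `reply_tx` into the job (rather than keeping a clone in the loop) is what lets `recv` return an error instead of blocking forever if the worker drops the job unanswered.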

Beam search could be very valuable for non-creative generation.

new feature
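Beam search keeps the `width` highest-scoring partial sequences at every decoding step instead of only the single best one (greedy decoding), so it can find outputs with higher total probability. A minimal sketch, assuming a hypothetical toy model `next_log_probs` over a 3-token vocabulary; it is not mistral.rs code and omits end-of-sequence handling and length normalization.

```rust
/// Hypothetical toy model: next-token log-probabilities over a 3-token
/// vocabulary, conditioned only on the previous token.
fn next_log_probs(prev: usize) -> [f64; 3] {
    let p = match prev {
        0 => [0.5, 0.4, 0.1],
        1 => [0.1, 0.1, 0.8],
        _ => [0.3, 0.3, 0.4],
    };
    p.map(f64::ln)
}

/// At every step, extend each kept sequence by every token, then keep the
/// `width` candidates with the highest total log-probability.
fn beam_search(start: usize, steps: usize, width: usize) -> (Vec<usize>, f64) {
    let mut beams: Vec<(Vec<usize>, f64)> = vec![(vec![start], 0.0)];
    for _ in 0..steps {
        let mut candidates = Vec::new();
        for (seq, score) in &beams {
            let last = *seq.last().unwrap();
            for (tok, lp) in next_log_probs(last).iter().enumerate() {
                let mut next = seq.clone();
                next.push(tok);
                candidates.push((next, score + lp));
            }
        }
        // Highest score first; all log-probs here are finite, so unwrap is safe.
        candidates.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        candidates.truncate(width);
        beams = candidates;
    }
    beams.into_iter().next().unwrap()
}

fn main() {
    let (greedy, g) = beam_search(0, 2, 1); // width 1 is greedy decoding
    let (beam, b) = beam_search(0, 2, 2);
    println!("greedy {:?} p={:.2}", greedy, g.exp()); // greedy [0, 0, 0] p=0.25
    println!("beam {:?} p={:.2}", beam, b.exp()); // beam [0, 1, 2] p=0.32
}
```

Greedy decoding commits to token 0 (probability 0.5) and ends at 0.5 × 0.5 = 0.25, while a beam of width 2 also keeps token 1 (0.4) and then reaches 0.4 × 0.8 = 0.32.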