mistral.rs

Blazingly fast LLM inference.

Results 186 mistral.rs issues

This ``` $ ./target/profiling/mistralrs-bench -r 5 -c 1,2,4 gguf -t mistralai/Mistral-7B-Instruct-v0.1 -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -f mistral-7b-instruct-v0.1.Q4_K_M.gguf 2024-04-28T05:58:00.751771Z INFO mistralrs_bench: avx: true, neon: false, simd128: false, f16c: true 2024-04-28T05:58:00.751790Z INFO mistralrs_bench:...

Now that we're sampling fully on the CPU, we should not merge the sampling timings into the completion timings. This will likely show an improvement on `mistralrs-bench`'s tg test. Notice `llama-bench` selects...
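To illustrate the idea of keeping the two timings apart, here is a minimal sketch using only the Rust standard library. `StepTimings`, `timed`, and `argmax` are hypothetical names made up for this example; they are not part of `mistralrs-bench`, and the real benchmark code may be structured quite differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical accumulator: one bucket per phase, so sampling time
/// is never folded into completion (forward pass) time.
#[derive(Debug, Default)]
struct StepTimings {
    completion: Duration,
    sampling: Duration,
}

/// Runs `f`, adds its wall-clock duration to `acc`, and returns its result.
fn timed<T>(acc: &mut Duration, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    *acc += start.elapsed();
    out
}

/// Stand-in for a CPU sampler: greedy selection of the largest logit.
fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```

A benchmark loop would wrap the forward pass in `timed(&mut t.completion, ...)` and the sampler in `timed(&mut t.sampling, ...)`, then report the two durations separately.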

new feature

Llama.cpp does it, will help make comparisons fair

new feature

Speculative decoding: https://arxiv.org/pdf/2211.17192 This will refactor the pipeline structure to make the sampling process more abstracted. Additionally, it will also abstract the scheduling and kv cache management. # Restriction -...

new feature
backend
models

**Describe the bug** Running a docker build seems to fail with the error `failed to read /mistralrs/mistralrs-bench/Cargo.toml` ``` [+] Building 2.0s (18/20) docker:default => CACHED [mistralrs internal] load git source...

bug

**Describe the bug** This affects models which use sliding window attention, but only when the sequence length is great enough (seq_len > sliding_window) to need the sliding window. This will...
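The condition `seq_len > sliding_window` can be made concrete with a small sketch of a sliding window mask. This is not the mistral.rs implementation: the function names are hypothetical, and the convention that a token attends to itself plus the `window - 1` tokens before it is an assumption (implementations differ by one on this boundary).

```rust
/// True when query position `i` may attend to key position `j`:
/// causal (j <= i) and within the last `window` positions.
/// Assumption: the window counts the current token itself.
fn can_attend(i: usize, j: usize, window: usize) -> bool {
    j <= i && i - j < window
}

/// Builds a seq_len x seq_len boolean mask (true = attention allowed).
fn sliding_window_mask(seq_len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| (0..seq_len).map(|j| can_attend(i, j, window)).collect())
        .collect()
}

/// Plain causal mask, for comparison.
fn causal_mask(seq_len: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| (0..seq_len).map(|j| j <= i).collect())
        .collect()
}
```

While `seq_len <= window`, the two masks are identical, which is consistent with the bug appearing only on longer sequences; once `seq_len` exceeds `window`, the oldest positions are masked out for later queries.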

bug

Fixes #247 Since we now depend on `pyo3` in `core` we need to include `libpython` in our runtime container. Maybe we could put this `pyo3` dependency behind a feature flag...