mistral.rs
Blazingly fast LLM inference.
This output is from a profiling run:

```
$ ./target/profiling/mistralrs-bench -r 5 -c 1,2,4 gguf -t mistralai/Mistral-7B-Instruct-v0.1 -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
2024-04-28T05:58:00.751771Z INFO mistralrs_bench: avx: true, neon: false, simd128: false, f16c: true
2024-04-28T05:58:00.751790Z INFO mistralrs_bench:...
```
Now that we sample fully on the CPU, we should not merge the sampling timings into the completion timings. This will likely show an improvement on `mistralrs-bench`'s tg test. Notice `llama-bench` selects...
Speculative decoding: https://arxiv.org/pdf/2211.17192

This will refactor the pipeline structure to make the sampling process more abstracted. It will also abstract the scheduling and KV cache management.

# Restriction
-...
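As a rough illustration of the speculative-decoding idea referenced above (draft several tokens with a cheap model, then keep only the prefix the expensive model verifies), here is a toy sketch. The `draft_model` and `target_model` functions are hypothetical stand-ins, not part of mistral.rs; real implementations accept rejected tokens probabilistically per the paper rather than with a hard check.

```python
# Toy stand-in models (hypothetical): the draft model is cheap but
# sometimes wrong; the target model defines the "correct" next token.
def draft_model(prefix):
    # Cheap proposal: a fixed lookup table keyed on the last token.
    table = {0: 1, 1: 2, 2: 0, 3: 0}
    return table[prefix[-1] % 4]

def target_model(prefix, token):
    # Expensive verification: would the target model emit `token` here?
    return token == (prefix[-1] + 1) % 4

def speculative_decode(prefix, gamma=4):
    """Draft `gamma` tokens cheaply, then keep the longest verified run."""
    # Drafting phase: autoregressively propose gamma tokens.
    draft, ctx = [], list(prefix)
    for _ in range(gamma):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    # Verification phase: accept tokens until the first disagreement.
    accepted, ctx = [], list(prefix)
    for t in draft:
        if not target_model(ctx, t):
            break
        accepted.append(t)
        ctx.append(t)
    return accepted

print(speculative_decode([0]))  # first two drafts verify, third is rejected
```

The payoff in a real system is that one target-model forward pass can validate several drafted tokens at once, which is why the pipeline's sampling and KV-cache handling need to be abstracted as the issue describes.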
Also enable logging for pyo3 bindings.
**Describe the bug**

Running a docker build seems to fail with the error `failed to read /mistralrs/mistralrs-bench/Cargo.toml`:

```
[+] Building 2.0s (18/20)  docker:default
 => CACHED [mistralrs internal] load git source...
```
**Describe the bug**

This affects models which use sliding window attention, but only when the sequence length is long enough (seq_len > sliding_window) to need the sliding window. This will...
Fixes #247

Since we now depend on `pyo3` in `core`, we need to include `libpython` in our runtime container. Maybe we could put this `pyo3` dependency behind a feature flag...
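One way to realize the feature-flag idea floated above is Cargo's optional-dependency mechanism. This is a hypothetical sketch, not the crate's actual manifest; the feature name and version are illustrative.

```toml
# Hypothetical Cargo.toml fragment: gate pyo3 behind a feature so the
# runtime container only needs libpython when Python bindings are built.
[dependencies]
pyo3 = { version = "0.21", optional = true }

[features]
python-bindings = ["dep:pyo3"]
```

With this, `cargo build` without `--features python-bindings` would skip the `pyo3` dependency entirely, avoiding the `libpython` requirement in the default container image.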