mistral.rs

Blazingly fast LLM inference.

Results 186 mistral.rs issues

Fixed a link typo

**Describe the bug** Trying to follow the getting started commands on Mac, but they don't seem to work out of the box: ```sh % cp ./target/release/mistralrs-server . cp: cannot overwrite directory...

documentation
resolved

**Describe the bug** When running the server after a fresh clone and build **Latest commit** commit 092deeec5ed9c45b36df280d6eba2b0632d4f415 **How to reproduce** ```shell git clone https://github.com/EricLBuehler/mistral.rs.git cd mistral.rs cargo run -- -i...

bug

We should distinguish between 2 cases in `api_get_file!`:

- 404: read from local
- Anything else: propagate error

Currently, if the "error" is not 404, we will still attempt reading...
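The two cases above could be separated roughly as follows. This is a minimal sketch under stated assumptions, not the actual `api_get_file!` macro: `FetchError`, `resolve_file`, and the two closure parameters are hypothetical names, and the remote and local reads are passed in as closures so the example has no dependencies.

```rust
/// Hypothetical error type standing in for whatever the remote API returns.
#[derive(Debug, PartialEq)]
enum FetchError {
    /// The remote answered with HTTP 404.
    NotFound,
    /// Any other failure (network, auth, server error, ...).
    Other(String),
}

/// Try the remote first and fall back to the local copy only on a 404.
/// Every other remote error is returned to the caller unchanged.
fn resolve_file<R, L>(remote: R, local: L) -> Result<Vec<u8>, FetchError>
where
    R: FnOnce() -> Result<Vec<u8>, FetchError>,
    L: FnOnce() -> Result<Vec<u8>, FetchError>,
{
    match remote() {
        // Remote fetch succeeded: use it.
        Ok(bytes) => Ok(bytes),
        // 404: the file is not on the remote, so read from local.
        Err(FetchError::NotFound) => local(),
        // Anything else: propagate instead of silently reading local files.
        Err(other) => Err(other),
    }
}
```

With this shape, the local read is only attempted when the remote reports that the file does not exist, so an authentication or network failure surfaces as an error rather than being masked by a stale local file.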

```
./target/profiling/mistralrs-bench -p 0 -g 64 -r 1 -c 8 gguf -t mistralai/Mistral-7B-Instruct-v0.1 -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
```

Master

![image](https://github.com/EricLBuehler/mistral.rs/assets/12750442/dc6468fe-40f3-498d-ad53-feda993d4761)

This PR

![image](https://github.com/EricLBuehler/mistral.rs/assets/12750442/d72ad4f5-7dca-41d9-8789-58fffed9efd5)

This is currently pending on some way to do topk in Candle.

new feature
models

This will allow loading very large models onto the CPU and then applying ISQ onto the device.

new feature
backend
models

new feature
optimization
models

Similar to what was described here https://github.com/huggingface/candle/issues/2108 "When prompts get longer than trivial sizes, the memory usage spikes as the prompt is thrown into one Tensor and sent off to...
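One common way to bound that kind of memory spike is to feed the prompt through the model in fixed-size chunks instead of as a single tensor. The quoted text is truncated, so this is an assumption about the intended direction rather than something the issue states. The sketch below shows only the splitting step on plain token ids; `chunk_prompt` and `chunk_size` are hypothetical names, and no Candle or mistral.rs API is used.

```rust
/// Split prompt token ids into consecutive chunks of at most `chunk_size`
/// tokens. The last chunk may be shorter. Hypothetical helper for illustration.
fn chunk_prompt(tokens: &[u32], chunk_size: usize) -> Vec<&[u32]> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    // `chunks` borrows from `tokens`, so no token data is copied here.
    tokens.chunks(chunk_size).collect()
}
```

Each chunk would then be run through the model in order, so that peak activation memory scales with `chunk_size` rather than with the full prompt length.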

new feature
models

Please let us know what model architectures you would like to be added! **Up to date todo list below. Please feel free to contribute any model, a PR without device...

models