mistral.rs
Blazingly fast LLM inference.
Fixed a link typo
**Describe the bug** Trying to follow the getting started commands on Mac, but they don't seem to work out of the box: ```sh % cp ./target/release/mistralrs-server . cp: cannot overwrite directory...
**Describe the bug** When running the server after a fresh clone and build **Latest commit** commit 092deeec5ed9c45b36df280d6eba2b0632d4f415 **How to reproduce** ```shell git clone https://github.com/EricLBuehler/mistral.rs.git cd mistral.rs cargo run -- -i...
We should distinguish between 2 cases in `api_get_file!`:
- 404: read from local
- Anything else: propagate error

Currently, if the "error" is not 404, we will still attempt reading...
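The two cases can be sketched roughly as follows. This is only an illustration of the proposed behavior, not the actual `api_get_file!` macro: `FetchError`, `get_file`, the `remote` closure, and the in-memory `local` map are all hypothetical stand-ins for the real hub client and filesystem lookup.

```rust
use std::collections::HashMap;

/// Hypothetical error type; the real client's error type will differ.
#[derive(Debug, PartialEq)]
enum FetchError {
    /// The remote answered with this HTTP status code.
    Http(u16),
    /// The remote returned 404 and no local copy exists either.
    MissingLocally(String),
}

/// Hypothetical helper: try the remote first, fall back to the local
/// copy only on a 404, and propagate every other error unchanged.
fn get_file(
    name: &str,
    remote: impl Fn(&str) -> Result<String, FetchError>,
    local: &HashMap<String, String>,
) -> Result<String, FetchError> {
    match remote(name) {
        Ok(contents) => Ok(contents),
        // Case 1: 404 means the file is not on the remote, so read from local.
        Err(FetchError::Http(404)) => local
            .get(name)
            .cloned()
            .ok_or_else(|| FetchError::MissingLocally(name.to_string())),
        // Case 2: anything else (auth failure, server error, ...) is propagated.
        Err(other) => Err(other),
    }
}
```

With this shape, a 401 or 500 from the remote surfaces to the caller instead of being masked by a local read that may fail with a less helpful message.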
```
./target/profiling/mistralrs-bench -p 0 -g 64 -r 1 -c 8 gguf -t mistralai/Mistral-7B-Instruct-v0.1 -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
```
Master | This PR
This is currently pending a way to do top-k in Candle.
This will allow loading very large models onto the CPU and then applying ISQ onto the device.
Similar to what was described here https://github.com/huggingface/candle/issues/2108 "When prompts get longer than trivial sizes, the memory usage spikes as the prompt is thrown into one Tensor and sent off to...
Please let us know what model architectures you would like to be added! **Up-to-date todo list below. Please feel free to contribute any model, a PR without device...