mistral.rs
Blazingly fast LLM inference.
## Describe the bug Running Llama 3.2 on my MacBook Pro M3 Max (128 GB) with ``` cargo run --release --features metal -- --port 1234 vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a...
## Describe the bug I got an error in directory `/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs/examples`. My machine environment: ``` ProductName: macOS ProductVersion: 14.4.1 Hardware Overview: Model Name: MacBook Pro Model Identifier: MacBookPro18,4 Model Number: Z15H0016ZJ/A Chip: Apple...
This occurs when using two GPUs, but not when I use just one. I made sure to update to the Docker image used in the Dockerfile....
How to deploy mistralrs on Android for large model inference?
No pre-built `mistralrs-server` binaries under assets in the [0.3.0 GitHub release](https://github.com/EricLBuehler/mistral.rs/releases/tag/v0.3.0), very sad 😢
I use stop words/sequences to determine the next step after a response: if the LLM returns stop word A, we perform action X. I also do the same with other...
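A minimal client-side sketch of this dispatch pattern, assuming a locally running `mistralrs-server` (e.g. started with `--port 1234` as in the excerpt above) exposing its OpenAI-compatible chat completions endpoint; the stop words `STOP_A`/`STOP_B`, the `default` model name, and the actions are hypothetical placeholders:

```rust
// Assumed Cargo.toml dependencies:
//   reqwest = { version = "0.12", features = ["blocking", "json"] }
//   serde_json = "1"
use serde_json::{json, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical stop words; each one maps to a different follow-up action.
    let stops = ["STOP_A", "STOP_B"];

    // Assumes mistralrs-server is running locally and serving an
    // OpenAI-compatible chat completions endpoint on port 1234.
    let resp: Value = reqwest::blocking::Client::new()
        .post("http://localhost:1234/v1/chat/completions")
        .json(&json!({
            "model": "default",
            "messages": [{"role": "user", "content": "..."}],
        }))
        .send()?
        .json()?;

    let text = resp["choices"][0]["message"]["content"]
        .as_str()
        .unwrap_or_default();

    // An OpenAI-style server strips a matched `stop` sequence from the
    // returned text, so this sketch requests the full completion and scans
    // for the stop words client-side to decide which action to perform.
    let first_hit = stops
        .iter()
        .filter_map(|s| text.find(*s).map(|i| (i, *s)))
        .min_by_key(|(i, _)| *i);

    match first_hit {
        Some((_, "STOP_A")) => println!("stop word A seen: perform action X"),
        Some((_, "STOP_B")) => println!("stop word B seen: perform action Y"),
        _ => println!("no stop word in response"),
    }

    Ok(())
}
```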
Hello, llama.cpp recently added support for an AArch64-specific type of GGUF and AArch64-specific matmul kernels. Here is the merged PR: https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660 Namely Q4_0_8_8, Q4_0_4_8, and the more generic Q4_0_4_4...