mistral.rs
Blazingly fast LLM inference.
I'll work through adding it to quantized llama first, since that's the architecture I know best. Link to the paper: https://arxiv.org/abs/2310.01889
## Describe the bug

### My environment

Windows 11 Pro, Docker Desktop, WSL2 Ubuntu engine, latest NVIDIA driver

### CUDA test

I made sure the Docker WSL2 CUDA implementation works...
## Use the Docker image

```
docker pull ghcr.io/ericlbuehler/mistral.rs:cuda-80-sha-9b898ee
```

Run script in WSL2:

```
docker run --gpus all -v E:/workspace/modelscope:/root/modelscope ghcr.io/ericlbuehler/mistral.rs:latest -i plain -m /root/modelscope/Qwen2-7B-Instruct -a qwen2
```

Got error: Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol...
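One thing worth noting in the report above: the tag that was pulled (`cuda-80-sha-9b898ee`) is not the tag that was run (`:latest`). A CUDA "named symbol not found" error can come from running a binary built for a different compute capability, so a first step might be to run the exact image that was pulled. This is a hedged sketch of that assumption, not a confirmed fix:

```shell
# Run the exact image tag that was pulled, so the binary inside matches
# the CUDA compute capability (sm_80) it was built for. Whether this
# resolves the "named symbol" error is an assumption to verify.
IMAGE=ghcr.io/ericlbuehler/mistral.rs:cuda-80-sha-9b898ee
docker run --gpus all \
  -v E:/workspace/modelscope:/root/modelscope \
  "$IMAGE" \
  -i plain -m /root/modelscope/Qwen2-7B-Instruct -a qwen2
```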
With the recent advent of large models (take Llama 3.1 405b, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a...
Hello, I'm not sure if multi-GPU is supported yet. I didn't find parameters for tensor parallelism, and the `num_device_layers` parameter doesn't seem to work. Please let me know if it...
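For reference, the flag in question appears as `--num-device-layers` in other reports in this list (see the Athene-70B one below). A minimal sketch of how it is invoked there, with the model path and filename as placeholders (the exact multi-GPU semantics are an assumption; check `mistralrs-server --help`):

```shell
# Hedged sketch: offload only the first N layers to the device, leaving
# the rest on the CPU / other devices. Paths and filename are placeholders.
RUST_BACKTRACE=full ./mistralrs-server \
  --interactive-mode \
  --num-device-layers 13 \
  gguf -m /path/to/model-dir -f model-Q8_0.gguf
```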
Hi! The newest mamba2 coding model, Codestral Mamba, has been out for several days, but I haven't found any inference tool that supports it except mistral-inference and...
## Describe the bug

With `Mistral-7B-Instruct-v0.3-Q4_K_M.gguf` from https://huggingface.co/bartowski/Mistral-7B-Instruct-v0.3-GGUF I'm seeing this behavior:

```
$ mistralrs-server -i gguf -m . -f Mistral-7B-Instruct-v0.3-Q4_K_M.gguf
2024-08-03T06:46:30.204741Z INFO mistralrs_server: avx: true, neon: false, simd128: false,...
```
Hi, I really appreciate the work on this project (the llama-cpp-rs crates are not great). I would like to load a text embedding model such as https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF. I am open to...
## Describe the bug

The pre-built binary for Linux fails to launch with:

```
error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory
```
...
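`libssl.so.1.1` comes from OpenSSL 1.1, which newer distributions (Ubuntu 22.04+, Debian 12+) no longer package, so a binary linked against it fails to start there. Two common workarounds, sketched under stated assumptions (the .deb filename is the usual Ubuntu 20.04 archive name and may have changed; verify before downloading):

```shell
# Option 1: install the legacy libssl1.1 package from an older Ubuntu
# release's archive. The exact filename is an assumption; check the
# pool directory for the current version first.
wget http://security.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb

# Option 2: build locally instead of using the pre-built binary, so the
# server links against the OpenSSL your system actually has.
cargo install --git https://github.com/EricLBuehler/mistral.rs mistralrs-server
```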
## Describe the bug

Compiled with: `cargo install --path mistralrs-server --features "cuda flash-attn cudnn mkl"`

`RUST_BACKTRACE=full ./mistralrs_server --interactive-mode --num-device-layers 13 --pa-ctxt-len 8192 gguf -m [path] -f Athene-70B-Q8_0.gguf`

```
> Whooo...
```