mistral.rs
Blazingly fast LLM inference.
I'll work through adding it to quantized llama first, since that's the architecture I know best. Link to the paper: https://arxiv.org/abs/2310.01889
## Describe the bug

### My environment

Windows 11 Pro, Docker Desktop, WSL2 Ubuntu engine, latest NVIDIA driver

### CUDA test

I made sure the Docker WSL2 CUDA implementation works...
## Use the Docker image

```
docker pull ghcr.io/ericlbuehler/mistral.rs:cuda-80-sha-9b898ee
```

Run script in WSL2:

```
docker run --gpus all -v E:/workspace/modelscope:/root/modelscope ghcr.io/ericlbuehler/mistral.rs:latest -i plain -m /root/modelscope/Qwen2-7B-Instruct -a qwen2
```

Got error: Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol...
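One thing worth noting in the report above: the tag that was pulled (`cuda-80-sha-9b898ee`) is not the tag that was run (`:latest`). A CUDA "named symbol not found" error can come from running a binary built for a different compute capability, so a first step might be to run the exact image that was pulled. This is a hedged sketch of that assumption, not a confirmed fix:

```shell
# Run the exact image tag that was pulled, so the binary inside matches
# the CUDA compute capability (sm_80) it was built for. Whether this
# resolves the "named symbol" error is an assumption to verify.
IMAGE=ghcr.io/ericlbuehler/mistral.rs:cuda-80-sha-9b898ee
docker run --gpus all \
  -v E:/workspace/modelscope:/root/modelscope \
  "$IMAGE" \
  -i plain -m /root/modelscope/Qwen2-7B-Instruct -a qwen2
```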
With the recent advent of large models (take Llama 3.1 405b, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a...
Hello, I'm not sure if multi-GPU is supported yet. I didn't find parameters for tensor parallelism, and the `num_device_layers` parameter doesn't seem to work. Please let me know if it...
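For reference, the flag in question appears as `--num-device-layers` in other reports in this list (see the Athene-70B one below). A minimal sketch of how it is invoked there, with the model path and filename as placeholders (the exact multi-GPU semantics are an assumption; check `mistralrs-server --help`):

```shell
# Hedged sketch: offload only the first N layers to the device, leaving
# the rest on the CPU / other devices. Paths and filename are placeholders.
RUST_BACKTRACE=full ./mistralrs-server \
  --interactive-mode \
  --num-device-layers 13 \
  gguf -m /path/to/model-dir -f model-Q8_0.gguf
```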
Hi! The newest mamba2 coding model, Codestral Mamba, has been out for several days, but I haven't found any inference tool that supports it except mistral-inference and...
## Describe the bug

With `Mistral-7B-Instruct-v0.3-Q4_K_M.gguf` from https://huggingface.co/bartowski/Mistral-7B-Instruct-v0.3-GGUF I'm seeing this behavior:

```
$ mistralrs-server -i gguf -m . -f Mistral-7B-Instruct-v0.3-Q4_K_M.gguf
2024-08-03T06:46:30.204741Z INFO mistralrs_server: avx: true, neon: false, simd128: false,...
```
Hi, I really appreciate the work on this project (the llama-cpp-rs crates are not great). I would like to load a text embedding model such as https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF. I am open to...
## Describe the bug

The pre-built binary for Linux fails to launch with:

```
error while loading shared libraries: libssl.so.1.1: cannot open shared object file: No such file or directory
```
...
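`libssl.so.1.1` comes from OpenSSL 1.1, which newer distributions (Ubuntu 22.04+, Debian 12+) no longer package, so a binary linked against it fails to start there. Two common workarounds, sketched under stated assumptions (the .deb filename is the usual Ubuntu 20.04 archive name and may have changed; verify before downloading):

```shell
# Option 1: install the legacy libssl1.1 package from an older Ubuntu
# release's archive. The exact filename is an assumption; check the
# pool directory for the current version first.
wget http://security.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2_amd64.deb
sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2_amd64.deb

# Option 2: build locally instead of using the pre-built binary, so the
# server links against the OpenSSL your system actually has.
cargo install --git https://github.com/EricLBuehler/mistral.rs mistralrs-server
```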
## Describe the bug

Compiled with: `cargo install --path mistralrs-server --features "cuda flash-attn cudnn mkl"`

`RUST_BACKTRACE=full ./mistralrs_server --interactive-mode --num-device-layers 13 --pa-ctxt-len 8192 gguf -m [path] -f Athene-70B-Q8_0.gguf`

```
> Whooo...
```