mistral.rs
WSL2 Docker error loading llama-3.1 gguf
Describe the bug
My environment
Windows 11 Pro, Docker Desktop, WSL2 Ubuntu engine, latest NVIDIA driver
CUDA test
I made sure the Docker WSL2 CUDA integration works correctly by executing:
docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
as stated in the documentation. So CUDA works inside Docker with WSL2.
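As an additional sanity check, nvidia-smi can also be run directly inside a container (the nvidia/cuda image tag below is just an example and may need adjusting to an available one):
# example tag; substitute any available nvidia/cuda base image
docker run --rm --gpus=all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi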
Model loading error
docker run --gpus all --rm -v C:\Users\xxx\.cache\lm-studio\models\duyntnet\Meta-Llama-3.1-8B-Instruct-imatrix-GGUF:/model -p 8080:8080 ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05 gguf -m /model -f Meta-Llama-3.1-8B-Instruct-IQ4_NL.gguf
leads to
...
2024-08-12T20:56:20.241100Z INFO mistralrs_core::pipeline::paths: Loading `Meta-Llama-3.1-8B-Instruct-IQ4_NL.gguf` locally at `/model/Meta-Llama-3.1-8B-Instruct-IQ4_NL.gguf`
2024-08-12T20:56:20.244485Z INFO mistralrs_core::pipeline::gguf: Loading model `/model` on cuda[0].
Error: path: "/model/Meta-Llama-3.1-8B-Instruct-IQ4_NL.gguf" unknown dtype for tensor 20
Maybe imatrix quants are not supported?
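For reference, the tensor quantization types inside the file can be listed with the dump script that ships with the gguf Python package (a minimal sketch, assuming Python and pip are available; the script name and output format may vary between gguf versions):
# hypothetical check, run wherever the file is accessible
pip install gguf
gguf-dump /model/Meta-Llama-3.1-8B-Instruct-IQ4_NL.gguf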
Trying a normal GGUF quant also doesn't seem to work:
docker run --gpus all --rm -v C:\Users\xxx\.cache\lm-studio\models\bartowski\Meta-Llama-3.1-8B-Instruct-GGUF:/model -p 8080:8080 ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05 gguf -m /model -f Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf
leading to:
...
2024-08-12T20:55:28.177396Z INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `gpt2`, kind: `Bpe`, num tokens: 128256, num added tokens: 0, num merges: 280147, num scores: 0
2024-08-12T20:55:28.185104Z INFO mistralrs_core::gguf::chat_template: Discovered and using GGUF chat template: `...
Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading dequantize_block_q8_0_f32
This is a newer quant, created after the RoPE frequency issue was fixed in llama.cpp.
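If the cuda-90 image tag refers to compute capability 9.0 (an assumption on my part), the PTX JIT failure could be a mismatch between the capability the kernels were compiled for and the GPU's actual one, which can be checked with a reasonably recent driver:
# the compute_cap query requires a recent driver (roughly R510+)
nvidia-smi --query-gpu=name,compute_cap --format=csv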
Port argument conflict
Also: I can use the Docker argument -p 8080:1234 to map ports. The mistral.rs argument --serve-ip 0.0.0.0 works, but --port 1234 doesn't:
docker run --gpus all --rm -v C:\Users\xxx\.cache\lm-studio\models\bartowski\Meta-Llama-3.1-8B-Instruct-GGUF:/model -p 8080:1234 ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05 --serve-ip 0.0.0.0 --port 1234 gguf -m /model -f Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf
leads to
error: the argument '--port <PORT>' cannot be used multiple times
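A possible workaround (an untested sketch: it assumes the image's entrypoint already passes a default --port, and that the server binary inside the image is named mistralrs-server and is on PATH) is to override the entrypoint so --port is only supplied once:
# assumes the binary inside the image is called mistralrs-server
docker run --gpus all --rm -v C:\Users\xxx\.cache\lm-studio\models\bartowski\Meta-Llama-3.1-8B-Instruct-GGUF:/model -p 8080:1234 --entrypoint mistralrs-server ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05 --serve-ip 0.0.0.0 --port 1234 gguf -m /model -f Meta-Llama-3.1-8B-Instruct-Q6_K_L.gguf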
Latest commit or version
Using the Docker image ghcr.io/ericlbuehler/mistral.rs:cuda-90-sha-8a84d05