mistral.rs
Docker builds fail with "failed to read `/mistralrs/mistralrs-bench/Cargo.toml`"
Describe the bug
Running a docker build fails with the error "failed to read `/mistralrs/mistralrs-bench/Cargo.toml`".
[+] Building 2.0s (18/20) docker:default
=> CACHED [mistralrs internal] load git source https://github.com/EricLBuehler/mistral.rs.git#master 0.7s
=> [mistralrs internal] load metadata for docker.io/nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04 1.1s
=> [mistralrs internal] load metadata for docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 1.1s
=> [mistralrs base 1/2] FROM docker.io/nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04@sha256:fa44193567d1908f7ca1f3abf8623ce9c63bc8cba7bcfdb3270 0.0s
=> [mistralrs builder 1/13] FROM docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04@sha256:fb1ad20f2552f5b3aafb2c9c478ed57da95e2bb027d15218 0.0s
=> CACHED [mistralrs base 2/2] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends libomp-dev 0.0s
=> CACHED [mistralrs builder 2/13] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends curl 0.0s
=> CACHED [mistralrs builder 3/13] RUN curl https://sh.rustup.rs -sSf | bash -s -- -y 0.0s
=> CACHED [mistralrs builder 4/13] RUN rustup update nightly 0.0s
=> CACHED [mistralrs builder 5/13] RUN rustup default nightly 0.0s
=> CACHED [mistralrs builder 6/13] WORKDIR /mistralrs 0.0s
=> CACHED [mistralrs builder 7/13] COPY mistralrs mistralrs 0.0s
=> CACHED [mistralrs builder 8/13] COPY mistralrs-core mistralrs-core 0.0s
=> CACHED [mistralrs builder 9/13] COPY mistralrs-lora mistralrs-lora 0.0s
=> CACHED [mistralrs builder 10/13] COPY mistralrs-pyo3 mistralrs-pyo3 0.0s
=> CACHED [mistralrs builder 11/13] COPY mistralrs-server mistralrs-server 0.0s
=> CACHED [mistralrs builder 12/13] COPY Cargo.toml ./ 0.0s
=> ERROR [mistralrs builder 13/13] RUN RUSTFLAGS="-Z threads=4" cargo build --release --workspace --exclude mistralrs-pyo3 --features "cuda cud 0.2s
------
> [mistralrs builder 13/13] RUN RUSTFLAGS="-Z threads=4" cargo build --release --workspace --exclude mistralrs-pyo3 --features "cuda cudnn":
0.150 error: failed to load manifest for workspace member `/mistralrs/mistralrs-bench`
0.150 referenced by workspace at `/mistralrs/Cargo.toml`
0.150
0.150 Caused by:
0.150 failed to read `/mistralrs/mistralrs-bench/Cargo.toml`
0.150
0.150 Caused by:
0.150 No such file or directory (os error 2)
------
failed to solve: process "/bin/sh -c RUSTFLAGS=\"-Z threads=4\" cargo build --release --workspace --exclude mistralrs-pyo3 --features \"${FEATURES}\"" did not complete successfully: exit code: 101
services:
  &name mistralrs:
    <<: [*ai-common, *restart, *secopts, *gpu]
    build:
      context: https://github.com/EricLBuehler/mistral.rs.git#master
      dockerfile: Dockerfile-cuda-all
    container_name: *name
    hostname: *name
    profiles:
      - *name
    ports:
      - 80
    volumes:
      - /mnt/llm/mistralrs/data:/data
Latest commit
4505a5e4f5e53d924d3caa2f9182639e8967a7bb
I'll look into it. mistralrs-core now also seems to depend on pyo3, so I also have to add python to the builder containers.
@LLukas22, do you think you could open a PR to add this? We do depend on pyo3 now in mistralrs-core.
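For context, the build log above copies every workspace member except mistralrs-bench, and the new pyo3 dependency needs a Python toolchain in the builder stage. A fix along these lines (an illustrative sketch for Dockerfile-cuda-all, not necessarily the exact change that landed; the apt package names are assumed) would be:

# Copy the workspace member the root Cargo.toml now references
COPY mistralrs-bench mistralrs-bench

# pyo3 builds against a Python interpreter, so install one in the builder stage
# (python3/python3-dev are assumed package names for Ubuntu 22.04)
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    python3 python3-dev && \
    rm -rf /var/lib/apt/lists/*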
Thanks, confirmed that fixed the builds.
Just a note that the default entrypoint for the container does not work though:
error: 'mistralrs-server' requires a subcommand but one was not provided
  [subcommands: plain, x-lora, lora, gguf, x-lora-gguf, lora-gguf, ggml, x-lora-ggml, lora-ggml, help]

Usage: mistralrs-server [OPTIONS] <COMMAND>

For more information, try '--help'.
@sammcj
Yeah, the default entrypoint currently only sets the port and hf_token. Since there are a lot of options for loading a model into the server, the containers expect a command that defines what you actually want to host.
For Phi-3, a compose file could look something like this:
services:
  text-generation:
    image: ghcr.io/llukas22/mistral.rs:cuda-89-sha-46a9df2
    ports:
      - 12005:80
    volumes:
      - /data/hf-cache:/data:z
    command: plain -m microsoft/Phi-3-mini-128k-instruct -a phi3
    environment:
      - HUGGING_FACE_HUB_TOKEN=[YOUR TOKEN]
      - KEEP_ALIVE_INTERVAL=100
    healthcheck:
      test: curl --fail http://localhost/health || exit 1
      interval: 30s
      retries: 5
      start_period: 300s
      timeout: 10s
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
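If you are not using compose, the same setup as a one-off docker run (reusing the image, mount, and command from the compose example above, so purely illustrative) would be roughly:

docker run --rm --gpus all \
  -p 12005:80 \
  -v /data/hf-cache:/data:z \
  -e HUGGING_FACE_HUB_TOKEN=[YOUR TOKEN] \
  -e KEEP_ALIVE_INTERVAL=100 \
  ghcr.io/llukas22/mistral.rs:cuda-89-sha-46a9df2 \
  plain -m microsoft/Phi-3-mini-128k-instruct -a phi3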
We should probably improve the server/docker documentation 🤔
Ah thanks, that worked straight away 😄:
curl https://mistralrs.internal/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "tell me 10 jokes about llamas"}],
  "model": "microsoft/Phi-3-mini-128k-instruct",
  "temperature": 0.9
}'
{"id":"3","choices":[{"finish_reason":"stop","index":0,"message":{"content":"1. Why don't llamas make good secret keepers? Because they spit the beans!\n\n2. What did one llama say to the other? You're a natural!\n\n3. Why don't llamas like going to parties? Because they always spill the hay!\n\n4. How do llamas take a break from work? They take a spit break!\n\n5. Why was the llama good at swimming? Because it could spit a splash!\n\n6. What kind of singer is a llama? A spit singer!\n\n7. Why don't llamas make good comedians? Because their jokes usually leave you spitting!\n\n8. How do llamas say goodbye? Not a problem, we'll meet on the other side! (spitting side!)\n\n9. Why don't llamas have very good hearing? Because they can't hear a fart from a llama 100 ft away!\n\n10. What did one llama say to the other, but they couldn't understand? Sorry, my spit ring was on!","role":"assistant"},"logprobs":null}],"created":1714459011,"model":"microsoft/Phi-3-mini-128k-instruct","system_fingerprint":"local","object":"chat.completion","usage":{"completion_tokens":250,"prompt_tokens":21,"total_tokens":271,"avg_tok_per_sec":82.446,"avg_prompt_tok_per_sec":1050.0,"avg_compl_tok_per_sec":76.522804,"total_time_sec":3.287,"total_prompt_time_sec":0.02,"total_completion_time_sec":3.267}}
- "avg_tok_per_sec":82.446
- 1x RTX 3090