mistral.rs
Docker builds fail with "failed to read `/mistralrs/mistralrs-bench/Cargo.toml`"
Describe the bug
Running a docker build fails with the error "failed to read `/mistralrs/mistralrs-bench/Cargo.toml`".
[+] Building 2.0s (18/20) docker:default
=> CACHED [mistralrs internal] load git source https://github.com/EricLBuehler/mistral.rs.git#master 0.7s
=> [mistralrs internal] load metadata for docker.io/nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04 1.1s
=> [mistralrs internal] load metadata for docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 1.1s
=> [mistralrs base 1/2] FROM docker.io/nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04@sha256:fa44193567d1908f7ca1f3abf8623ce9c63bc8cba7bcfdb3270 0.0s
=> [mistralrs builder 1/13] FROM docker.io/nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04@sha256:fb1ad20f2552f5b3aafb2c9c478ed57da95e2bb027d15218 0.0s
=> CACHED [mistralrs base 2/2] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends libomp-dev 0.0s
=> CACHED [mistralrs builder 2/13] RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends curl 0.0s
=> CACHED [mistralrs builder 3/13] RUN curl https://sh.rustup.rs -sSf | bash -s -- -y 0.0s
=> CACHED [mistralrs builder 4/13] RUN rustup update nightly 0.0s
=> CACHED [mistralrs builder 5/13] RUN rustup default nightly 0.0s
=> CACHED [mistralrs builder 6/13] WORKDIR /mistralrs 0.0s
=> CACHED [mistralrs builder 7/13] COPY mistralrs mistralrs 0.0s
=> CACHED [mistralrs builder 8/13] COPY mistralrs-core mistralrs-core 0.0s
=> CACHED [mistralrs builder 9/13] COPY mistralrs-lora mistralrs-lora 0.0s
=> CACHED [mistralrs builder 10/13] COPY mistralrs-pyo3 mistralrs-pyo3 0.0s
=> CACHED [mistralrs builder 11/13] COPY mistralrs-server mistralrs-server 0.0s
=> CACHED [mistralrs builder 12/13] COPY Cargo.toml ./ 0.0s
=> ERROR [mistralrs builder 13/13] RUN RUSTFLAGS="-Z threads=4" cargo build --release --workspace --exclude mistralrs-pyo3 --features "cuda cud 0.2s
------
> [mistralrs builder 13/13] RUN RUSTFLAGS="-Z threads=4" cargo build --release --workspace --exclude mistralrs-pyo3 --features "cuda cudnn":
0.150 error: failed to load manifest for workspace member `/mistralrs/mistralrs-bench`
0.150 referenced by workspace at `/mistralrs/Cargo.toml`
0.150
0.150 Caused by:
0.150 failed to read `/mistralrs/mistralrs-bench/Cargo.toml`
0.150
0.150 Caused by:
0.150 No such file or directory (os error 2)
------
failed to solve: process "/bin/sh -c RUSTFLAGS=\"-Z threads=4\" cargo build --release --workspace --exclude mistralrs-pyo3 --features \"${FEATURES}\"" did not complete successfully: exit code: 101
services:
  &name mistralrs:
    <<: [*ai-common, *restart, *secopts, *gpu]
    build:
      context: https://github.com/EricLBuehler/mistral.rs.git#master
      dockerfile: Dockerfile-cuda-all
    container_name: *name
    hostname: *name
    profiles:
      - *name
    ports:
      - 80
    volumes:
      - /mnt/llm/mistralrs/data:/data
Latest commit
4505a5e4f5e53d924d3caa2f9182639e8967a7bb
I'll look into it. mistralrs-core now also seems to depend on pyo3, so I also have to add python to the builder containers.
@LLukas22, do you think you could open a PR to add this? We do depend on pyo3 now in mistralrs-core.
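For context, the build log above copies every workspace member except mistralrs-bench, and the new pyo3 dependency needs a Python toolchain in the builder stage. A fix along these lines (an illustrative sketch for Dockerfile-cuda-all, not necessarily the exact change that landed; the apt package names are assumed) would be:

# Copy the workspace member the root Cargo.toml now references
COPY mistralrs-bench mistralrs-bench

# pyo3 builds against a Python interpreter, so install one in the builder stage
# (python3/python3-dev are assumed package names for Ubuntu 22.04)
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    python3 python3-dev && \
    rm -rf /var/lib/apt/lists/*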
Thanks, confirmed that fixed the builds.
Just a note that the default entrypoint for the container does not work though:
error: 'mistralrs-server' requires a subcommand but one was not provided
  [subcommands: plain, x-lora, lora, gguf, x-lora-gguf, lora-gguf, ggml, x-lora-ggml, lora-ggml, help]

Usage: mistralrs-server [OPTIONS] <COMMAND>

For more information, try '--help'.
@sammcj
Yeah, the default entrypoint currently only sets the port and hf_token. Since there are a lot of options for loading a model into the server, the containers expect a command that defines what you actually want to host.
For Phi-3, a compose file could look something like this:
services:
  text-generation:
    image: ghcr.io/llukas22/mistral.rs:cuda-89-sha-46a9df2
    ports:
      - 12005:80
    volumes:
      - /data/hf-cache:/data:z
    command: plain -m microsoft/Phi-3-mini-128k-instruct -a phi3
    environment:
      - HUGGING_FACE_HUB_TOKEN=[YOUR TOKEN]
      - KEEP_ALIVE_INTERVAL=100
    healthcheck:
      test: curl --fail http://localhost/health || exit 1
      interval: 30s
      retries: 5
      start_period: 300s
      timeout: 10s
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
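If you are not using compose, the same setup as a one-off docker run (reusing the image, mount, and command from the compose example above, so purely illustrative) would be roughly:

docker run --rm --gpus all \
  -p 12005:80 \
  -v /data/hf-cache:/data:z \
  -e HUGGING_FACE_HUB_TOKEN=[YOUR TOKEN] \
  -e KEEP_ALIVE_INTERVAL=100 \
  ghcr.io/llukas22/mistral.rs:cuda-89-sha-46a9df2 \
  plain -m microsoft/Phi-3-mini-128k-instruct -a phi3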
We should probably improve the server/docker documentation 🤔
Ah thanks, that worked straight away 😄:
curl https://mistralrs.internal/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "tell me 10 jokes about llamas"}],
  "model": "microsoft/Phi-3-mini-128k-instruct",
  "temperature": 0.9
}'
{"id":"3","choices":[{"finish_reason":"stop","index":0,"message":{"content":"1. Why don't llamas make good secret keepers? Because they spit the beans!\n\n2. What did one llama say to the other? You're a natural!\n\n3. Why don't llamas like going to parties? Because they always spill the hay!\n\n4. How do llamas take a break from work? They take a spit break!\n\n5. Why was the llama good at swimming? Because it could spit a splash!\n\n6. What kind of singer is a llama? A spit singer!\n\n7. Why don't llamas make good comedians? Because their jokes usually leave you spitting!\n\n8. How do llamas say goodbye? Not a problem, we'll meet on the other side! (spitting side!)\n\n9. Why don't llamas have very good hearing? Because they can't hear a fart from a llama 100 ft away!\n\n10. What did one llama say to the other, but they couldn't understand? Sorry, my spit ring was on!","role":"assistant"},"logprobs":null}],"created":1714459011,"model":"microsoft/Phi-3-mini-128k-instruct","system_fingerprint":"local","object":"chat.completion","usage":{"completion_tokens":250,"prompt_tokens":21,"total_tokens":271,"avg_tok_per_sec":82.446,"avg_prompt_tok_per_sec":1050.0,"avg_compl_tok_per_sec":76.522804,"total_time_sec":3.287,"total_prompt_time_sec":0.02,"total_completion_time_sec":3.267}}
- "avg_tok_per_sec":82.446
- 1x RTX 3090