lmql
Docker for serving local model using lmql serve-model
Hi,
Many thanks for the library. I'd like to run lmql serve-model in Docker to serve a local model (e.g. Llama 2), but I'm running into an issue when building the image.
According to the documentation, the image should be built with https://github.com/eth-sri/lmql/blob/main/scripts/Dockerfile. However, its entrypoint is lmql playground.
My understanding is that we should use lmql serve-model with local models if we want reasonable inference speed. I found this file, https://github.com/eth-sri/lmql/blob/main/scripts/Dockerfile.serve, which seems to fit this purpose, but the build fails.
It's not yet documented whether this approach is valid, but any pointers would be appreciated.
Note: the command was run from the cloned repo, on the latest branch.
docker build --build-arg GPU_ENABLED=true -f Dockerfile.serve -t lmql-docker-server:cuda11.8 .
[+] Building 2.2s (16/20) docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile.serve 0.0s
=> => transferring dockerfile: 1.93kB 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 1.1s
=> [ 1/16] FROM docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04@sha256:7f34d0a2eeacd942 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 743B 0.0s
=> CACHED [ 2/16] RUN apt-get update --yes --quiet && DEBIAN_FRONTEND=noninteractive a 0.0s
=> CACHED [ 3/16] RUN add-apt-repository --yes ppa:deadsnakes/ppa && apt-get update -- 0.0s
=> CACHED [ 4/16] RUN DEBIAN_FRONTEND=noninteractive apt-get install --yes --quiet --n 0.0s
=> CACHED [ 5/16] RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/ 0.0s
=> CACHED [ 6/16] RUN pip install --upgrade pip 0.0s
=> CACHED [ 7/16] WORKDIR lmql 0.0s
=> CACHED [ 8/16] RUN apt-get update 0.0s
=> CACHED [ 9/16] RUN pip install "lmql[hf]" 0.0s
=> CACHED [10/16] COPY . /lmql 0.0s
=> CACHED [11/16] WORKDIR /lmql 0.0s
=> ERROR [12/16] RUN pip install -e ".[hf]" 1.1s
------
> [12/16] RUN pip install -e ".[hf]":
0.885 Obtaining file:///lmql
0.885 ERROR: file:///lmql does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
------
Dockerfile.serve:44
--------------------
42 | # re-install lmql from source
43 | WORKDIR /lmql
44 | >>> RUN pip install -e ".[hf]"
45 |
46 | VOLUME /transformers
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install -e \".[hf]\"" did not complete successfully: exit code: 1
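The pip install -e failure means the directory copied to /lmql contained neither setup.py nor pyproject.toml. The build context in this run was only 743B, which suggests the command was run from inside scripts/. In the second attempt below, the helper script uses the repo root as context (28.00kB) and step 12 succeeds. Here is a minimal sketch, assuming the fix is simply to use the repository root as the build context. The hypothetical helpers find_project_root and docker_build_cmd walk upward to the first directory containing one of those two files and build the docker build command from there. The tag lmql-serve and the path scripts/Dockerfile.serve come from the helper script's log. Everything else is illustrative.

```python
from pathlib import Path

# Files pip looks for when installing a project (named in the error above)
PROJECT_MARKERS = ("pyproject.toml", "setup.py")


def find_project_root(start: Path) -> Path:
    """Hypothetical helper: walk upward from `start` to the first directory
    that contains pyproject.toml or setup.py.

    Raises FileNotFoundError if no such directory exists.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).is_file() for marker in PROJECT_MARKERS):
            return candidate
    raise FileNotFoundError(f"no pyproject.toml or setup.py above {start}")


def docker_build_cmd(root: Path, tag: str = "lmql-serve") -> list[str]:
    """Hypothetical helper: build a `docker build` command whose context is `root`."""
    dockerfile = root / "scripts" / "Dockerfile.serve"
    # The last argument is the build context; using the repo root means
    # `COPY . /lmql` brings the project files into the image.
    return ["docker", "build", "-t", tag, "-f", str(dockerfile), str(root)]
```

For example, docker_build_cmd(find_project_root(Path.cwd())) run from inside scripts/ should produce a command whose context is the repo root rather than scripts/.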
Running the Python helper script instead also fails, but at a later step:
python3 scripts/lmql-serve-docker.py
> docker image ls --format '{{.Tag}}' lmql-serve
> docker build -t lmql-serve -f scripts/Dockerfile.serve .
[+] Building 0.9s (17/20) docker:default
=> [internal] load build definition from Dockerfile.serve 0.0s
=> => transferring dockerfile: 1.93kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2.92kB 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04 0.4s
=> [ 1/16] FROM docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04@sha256:7f34d0a2eeacd942 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 28.00kB 0.0s
=> CACHED [ 2/16] RUN apt-get update --yes --quiet && DEBIAN_FRONTEND=noninteractive a 0.0s
=> CACHED [ 3/16] RUN add-apt-repository --yes ppa:deadsnakes/ppa && apt-get update -- 0.0s
=> CACHED [ 4/16] RUN DEBIAN_FRONTEND=noninteractive apt-get install --yes --quiet --n 0.0s
=> CACHED [ 5/16] RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/ 0.0s
=> CACHED [ 6/16] RUN pip install --upgrade pip 0.0s
=> CACHED [ 7/16] WORKDIR lmql 0.0s
=> CACHED [ 8/16] RUN apt-get update 0.0s
=> CACHED [ 9/16] RUN pip install "lmql[hf]" 0.0s
=> CACHED [10/16] COPY . /lmql 0.0s
=> CACHED [11/16] WORKDIR /lmql 0.0s
=> CACHED [12/16] RUN pip install -e ".[hf]" 0.0s
=> ERROR [13/16] RUN ls /transformers 0.4s
------
> [13/16] RUN ls /transformers:
0.314 ls: cannot access '/transformers': No such file or directory
------
Dockerfile.serve:47
--------------------
45 |
46 | VOLUME /transformers
47 | >>> RUN ls /transformers
48 |
49 | ENV LMQL_VERSION="latest"
--------------------
ERROR: failed to solve: process "/bin/sh -c ls /transformers" did not complete successfully: exit code: 2
Traceback (most recent call last):
  File "/home/jp/dev/lmql/scripts/lmql-serve-docker.py", line 40, in <module>
    build_docker_image()
  File "/home/jp/dev/lmql/scripts/lmql-serve-docker.py", line 16, in build_docker_image
    subprocess.run(cmd, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['docker', 'build', '-t', 'lmql-serve', '-f', 'scripts/Dockerfile.serve', '.']' returned non-zero exit status 1.
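The traceback shows that lmql-serve-docker.py calls subprocess.run(cmd, check=True), so a failed docker build surfaces as an uncaught CalledProcessError. The log shows the script first running docker image ls --format '{{.Tag}}' lmql-serve and then docker build, presumably to build only when no image exists yet. Below is a sketch of those two steps that reports the failing command and exit code instead of a traceback. The function names are hypothetical, and this is not the script's actual code.

```python
import subprocess
import sys


def image_exists(tag_listing: str) -> bool:
    """Interpret the output of `docker image ls --format '{{.Tag}}' <repo>`:
    one tag per line, so blank output means no local image."""
    return any(line.strip() for line in tag_listing.splitlines())


def run_checked(cmd: list[str]) -> int:
    """Run `cmd`; on failure print a one-line report to stderr and return the exit code."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        print(f"command failed with exit code {err.returncode}: {' '.join(err.cmd)}",
              file=sys.stderr)
        return err.returncode
    return 0


def ensure_image(repo: str = "lmql-serve") -> int:
    """Hypothetical: build the image only if `docker image ls` lists no tag for it."""
    listing = subprocess.run(
        ["docker", "image", "ls", "--format", "{{.Tag}}", repo],
        capture_output=True, text=True, check=True,
    ).stdout
    if image_exists(listing):
        return 0
    # Same command the helper script ran, with the repo root as build context
    return run_checked(["docker", "build", "-t", repo,
                        "-f", "scripts/Dockerfile.serve", "."])
```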
Hi there, I haven't gotten around to documenting that yet, sorry about that. There is also a script that makes running Dockerfile.serve behave just like a local lmql serve-model command (https://github.com/eth-sri/lmql/blob/main/scripts/lmql-serve-docker.py).
If you want to build and run the image yourself, you can also check the scripts for the environment variables the image expects. Basically, TRANSFORMERS_CACHE points to a mounted volume, so cached models don't have to be re-downloaded in each container instance.
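Following that description, a manual run mounts a host directory at the container's cache path and points TRANSFORMERS_CACHE at it. This is a minimal sketch under several assumptions. The container path is /transformers, matching the VOLUME declared in Dockerfile.serve. The host cache is ~/.cache/huggingface. The server listens on port 8080. How the trailing arguments reach lmql serve-model depends on the image's ENTRYPOINT/CMD, which is not shown here. The helper name and all defaults are hypothetical; lmql-serve-docker.py is the place to check which variables are actually set.

```python
from pathlib import Path

# Assumption: matches `VOLUME /transformers` in Dockerfile.serve
CONTAINER_CACHE = "/transformers"


def docker_run_cmd(model_args: list[str],
                   host_cache: Path = Path.home() / ".cache" / "huggingface",
                   image: str = "lmql-serve",
                   port: int = 8080,
                   gpus: bool = True) -> list[str]:
    """Hypothetical helper: build a `docker run` command for the serve image."""
    cmd = ["docker", "run", "--rm",
           # Mount the host cache so models are downloaded only once
           "-v", f"{host_cache}:{CONTAINER_CACHE}",
           # Point transformers at the mounted cache directory
           "-e", f"TRANSFORMERS_CACHE={CONTAINER_CACHE}",
           # Assumed server port, published on the host
           "-p", f"{port}:{port}"]
    if gpus:
        # Needs the NVIDIA Container Toolkit on the host
        cmd += ["--gpus", "all"]
    # Whether these arguments reach `lmql serve-model` depends on the image's ENTRYPOINT/CMD
    return cmd + [image, *model_args]
```

Mounting the cache as a volume rather than baking models into the image keeps the image small, and the downloaded weights survive when the container is removed.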