Docker for serving local model using lmql serve-model

Hi,

Many thanks for the library. I'd like to run lmql serve-model in Docker to serve a local model (e.g. llama2), but I'm running into an issue when building the image.

Per the documentation, the image should be built with https://github.com/eth-sri/lmql/blob/main/scripts/Dockerfile. However, its entrypoint is lmql playground.

My understanding is that we should use lmql serve-model with local models if we want reasonable inference speed. I found https://github.com/eth-sri/lmql/blob/main/scripts/Dockerfile.serve, which seems to fit this purpose, but the build fails.

It isn't documented yet whether this approach is valid, but any pointers would be appreciated.

Note: the command was run from the cloned repo, latest branch.

docker build --build-arg GPU_ENABLED=true -f Dockerfile.serve -t lmql-docker-server:cuda11.8 .
[+] Building 2.2s (16/20)                                                      docker:default
 => [internal] load .dockerignore                                                        0.0s
 => => transferring context: 2B                                                          0.0s
 => [internal] load build definition from Dockerfile.serve                               0.0s
 => => transferring dockerfile: 1.93kB                                                   0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04          1.1s
 => [ 1/16] FROM docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04@sha256:7f34d0a2eeacd942  0.0s
 => [internal] load build context                                                        0.0s
 => => transferring context: 743B                                                        0.0s
 => CACHED [ 2/16] RUN apt-get update --yes --quiet && DEBIAN_FRONTEND=noninteractive a  0.0s
 => CACHED [ 3/16] RUN add-apt-repository --yes ppa:deadsnakes/ppa && apt-get update --  0.0s
 => CACHED [ 4/16] RUN DEBIAN_FRONTEND=noninteractive apt-get install --yes --quiet --n  0.0s
 => CACHED [ 5/16] RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/  0.0s
 => CACHED [ 6/16] RUN pip install --upgrade pip                                         0.0s
 => CACHED [ 7/16] WORKDIR lmql                                                          0.0s
 => CACHED [ 8/16] RUN apt-get update                                                    0.0s
 => CACHED [ 9/16] RUN pip install "lmql[hf]"                                            0.0s
 => CACHED [10/16] COPY . /lmql                                                          0.0s
 => CACHED [11/16] WORKDIR /lmql                                                         0.0s
 => ERROR [12/16] RUN pip install -e ".[hf]"                                             1.1s
------                                                                                        
 > [12/16] RUN pip install -e ".[hf]":                                                        
0.885 Obtaining file:///lmql                                                                  
0.885 ERROR: file:///lmql does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
------
Dockerfile.serve:44
--------------------
  42 |     # re-install lmql from source 
  43 |     WORKDIR /lmql
  44 | >>> RUN pip install -e ".[hf]"
  45 |     
  46 |     VOLUME /transformers
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install -e \".[hf]\"" did not complete successfully: exit code: 1

Aug 25 '23 05:08 jpabbuehl

Calling the Python script instead also results in an error, but at a later build step:

python3 scripts/lmql-serve-docker.py
> docker image ls --format '{{.Tag}}' lmql-serve
> docker build -t lmql-serve -f scripts/Dockerfile.serve .
[+] Building 0.9s (17/20)                                                      docker:default
 => [internal] load build definition from Dockerfile.serve                               0.0s
 => => transferring dockerfile: 1.93kB                                                   0.0s
 => [internal] load .dockerignore                                                        0.0s
 => => transferring context: 2.92kB                                                      0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04          0.4s
 => [ 1/16] FROM docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04@sha256:7f34d0a2eeacd942  0.0s
 => [internal] load build context                                                        0.0s
 => => transferring context: 28.00kB                                                     0.0s
 => CACHED [ 2/16] RUN apt-get update --yes --quiet && DEBIAN_FRONTEND=noninteractive a  0.0s
 => CACHED [ 3/16] RUN add-apt-repository --yes ppa:deadsnakes/ppa && apt-get update --  0.0s
 => CACHED [ 4/16] RUN DEBIAN_FRONTEND=noninteractive apt-get install --yes --quiet --n  0.0s
 => CACHED [ 5/16] RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/  0.0s
 => CACHED [ 6/16] RUN pip install --upgrade pip                                         0.0s
 => CACHED [ 7/16] WORKDIR lmql                                                          0.0s
 => CACHED [ 8/16] RUN apt-get update                                                    0.0s
 => CACHED [ 9/16] RUN pip install "lmql[hf]"                                            0.0s
 => CACHED [10/16] COPY . /lmql                                                          0.0s
 => CACHED [11/16] WORKDIR /lmql                                                         0.0s
 => CACHED [12/16] RUN pip install -e ".[hf]"                                            0.0s
 => ERROR [13/16] RUN ls /transformers                                                   0.4s
------
 > [13/16] RUN ls /transformers:
0.314 ls: cannot access '/transformers': No such file or directory
------
Dockerfile.serve:47
--------------------
  45 |     
  46 |     VOLUME /transformers
  47 | >>> RUN ls /transformers
  48 |     
  49 |     ENV LMQL_VERSION="latest"
--------------------
ERROR: failed to solve: process "/bin/sh -c ls /transformers" did not complete successfully: exit code: 2
Traceback (most recent call last):
  File "/home/jp/dev/lmql/scripts/lmql-serve-docker.py", line 40, in <module>
    build_docker_image()
  File "/home/jp/dev/lmql/scripts/lmql-serve-docker.py", line 16, in build_docker_image
    subprocess.run(cmd, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['docker', 'build', '-t', 'lmql-serve', '-f', 'scripts/Dockerfile.serve', '.']' returned non-zero exit status 1.

Aug 25 '23 06:08 jpabbuehl

Hi there, I haven't gotten around to documenting that yet, sorry about that. There is also a script that makes running Dockerfile.serve behave just like a local lmql serve-model command (https://github.com/eth-sri/lmql/blob/main/scripts/lmql-serve-docker.py).

If you want to build/run the image yourself, you can also check the script for the environment variables the image expects. Basically, TRANSFORMERS_CACHE points to a mounted volume, which avoids re-downloading cached models in each container instance.
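
As a rough illustration of the volume idea, here is a minimal sketch of how a docker run command could be assembled in Python. It is not taken from lmql-serve-docker.py. The image name lmql-serve and the container path /transformers come from the build output above; the helper name docker_run_command, the host cache directory, and the arguments passed to the container are assumptions for illustration only. GPU and port options are left out on purpose, so check the script for what the image really needs.

```python
import shlex
from pathlib import Path


def docker_run_command(image, host_cache, container_cache="/transformers", serve_args=()):
    """Build (but do not run) a docker run command that mounts a model cache.

    Hypothetical helper: host_cache is a directory on the host that persists
    between containers, container_cache is where it appears in the container.
    """
    return [
        "docker", "run", "--rm",
        # Bind-mount the host cache so downloaded models survive container restarts.
        "-v", f"{host_cache}:{container_cache}",
        # Point the container at the mounted directory.
        "-e", f"TRANSFORMERS_CACHE={container_cache}",
        image,
        # Anything after the image name is passed to the container (assumed here).
        *serve_args,
    ]


if __name__ == "__main__":
    # Assumed host location for the cache; adjust to your setup.
    cache_dir = Path.home() / ".cache" / "huggingface"
    cmd = docker_run_command("lmql-serve", cache_dir)
    # Print the command for inspection instead of executing it.
    print(shlex.join(cmd))
```

The sketch only prints the command; once the printed line looks right for your setup, it can be passed to subprocess.run(cmd, check=True), which is the same call that appears in the traceback above.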

Aug 25 '23 15:08 lbeurerkellner
