llama-cpp-python Add Dockerfile + build workflow

Fixes #70

This PR adds a Dockerfile and updates the release workflow to build the latest Docker image too. Both amd64 and arm64 arches are built.

Apr 12 '23 09:04 Niek

@Niek do you mind moving this to the build release workflow?

Apr 12 '23 14:04 abetlen

@abetlen are you referring to build-and-release.yml? If we move the Docker step to that workflow, it can't use pip install; it would have to download the build artifacts and install from those. Not sure if this is what you intend.

Apr 12 '23 16:04 Niek

Maybe we should add OpenBLAS support directly? That would need these two lines:

RUN apt update && apt install -y libopenblas-dev
RUN LLAMA_OPENBLAS=1 pip install llama-cpp-python[server]
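
For context, here is a minimal sketch of how those two lines might sit in a complete Dockerfile. The base image, the EXPOSE/ENV settings, and the ENTRYPOINT are assumptions (the latter three mirror the cuBLAS example further down the thread), not necessarily what the PR ships:

```dockerfile
# Assumed base image: a Debian-based Python image that provides apt and a C/C++ toolchain.
FROM python:3-bullseye

# Assumed settings, mirroring the cuBLAS example in this thread:
# listen on all interfaces so the API is reachable from outside the container.
EXPOSE 8000
ENV HOST=0.0.0.0

# The two lines proposed above: install OpenBLAS headers, then build
# llama-cpp-python (with the server extra) against OpenBLAS.
RUN apt update && apt install -y libopenblas-dev
RUN LLAMA_OPENBLAS=1 pip install llama-cpp-python[server]

# The model path is not baked in here; it would be supplied at run time
# through the MODEL environment variable with the model file mounted in.
ENTRYPOINT [ "python3", "-m", "llama_cpp.server" ]
```

Keeping the apt step and the pip step as separate RUN instructions lets Docker cache the system packages when only the Python package version changes.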

Apr 15 '23 14:04 jmtatsch

Good idea @jmtatsch - added now

Apr 15 '23 18:04 Niek

Here is a Dockerfile for a cuBLAS-capable container that should bring huge speedups for CUDA GPU owners after the next sync with upstream:

FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

EXPOSE 8000
ENV MODEL=/models/ggml-vicuna-13b-1.1-q4_0.bin
# allow non local connections to api
ENV HOST=0.0.0.0

RUN apt update && apt install -y python3 python3-pip && LLAMA_CUBLAS=1 pip install llama-cpp-python[server]

ENTRYPOINT [ "python3", "-m", "llama_cpp.server" ]

Apr 21 '23 21:04 jmtatsch

> Here is a docker file for a cublas capable container that should bring huge speed ups for cuda gpu owners after the next sync with upstream:

@jmtatsch where is requirements.txt coming from?

Apr 22 '23 08:04 gjmulder

> @jmtatsch where is requirements.txt coming from?

Good catch, it isn't necessary at all. I cleaned it up above. In 0.1.36 cuBLAS is broken for me anyhow; waiting for https://github.com/ggerganov/llama.cpp/pull/1128

Apr 22 '23 21:04 jmtatsch

@abetlen do you need any other changes?

Apr 24 '23 07:04 Niek

@Niek if possible, can we include @jmtatsch's nvidia-docker container example in this PR as well? The ability to docker pull and run a GPU-accelerated container would be very helpful.

Apr 24 '23 17:04 abetlen

@abetlen We should make these two different containers then, because the NVIDIA container with cuBLAS is quite large and not everyone has an NVIDIA card. I will make a pull request once this one is merged. Sorry for hijacking your pull request @Niek

Apr 24 '23 18:04 jmtatsch

@Niek finally got a chance to merge this, great work! We now have a docker image.

@jmtatsch if you're still interested it would be awesome to get that cuBLAS-based image, happy to help there also.

May 02 '23 05:05 abetlen
