docker file and compose
Why not build the image in docker-compose directly?
Pushed both changes.
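For reference, building straight from the compose file could look roughly like this (a sketch only; the service name and build context are placeholders, not the actual files in this PR):
services:
  privategpt:
    build:
      context: .
      dockerfile: Dockerfile
    image: privategpt:local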
Thanks for sharing. Could you allow the use of a .env file to avoid modifying the repo's yaml file?
For instance, use:
${MODELS:-./models}
to set the models directory so that it can be set in the .env file.
I'm not sure I understand correctly, but setting load_dotenv(override=True) will override the docker-compose environment variables with the .env file, and there is no .env file at the moment.
There is no .env file in the repo, but we can set one locally.
By setting .env as follows, I successfully used my E: drive for the models; a user who does not have a local .env falls back to ./models instead:
MODELS=E:/
This avoids changing any git-controlled file to adapt to the local setup. I already had some models on my E: drive.
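For illustration, a sketch of how the substitution could look on the compose side (the container path here mirrors the one used later in this thread and is an assumption, not the actual file):
# .env (local, not committed)
# MODELS=E:/

services:
  privategpt:
    volumes:
      - ${MODELS:-./models}:/home/privategpt/models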
@mdeweerd reviewed in b4aad15
I was able to use "MODEL_MOUNT".
I suggest converting the line endings of these files to LF.
As I was applying a local pre-commit configuration, it detected that the line endings of the yaml files (and the Dockerfile) are CRLF. yamllint suggests LF line endings, and yamlfix helps format the files automatically.
I am still struggling to get an answer to my question: the container stops at some point. Maybe this has to do with memory; the container limit is 7.448GiB.
FYI, I've set the memory for WSL2 to 12GB, which allowed me to get an answer to a question.
My .wslconfig now looks like:
[wsl2]
memory=12GB
During compilation I noticed some references to nvidia, so I wondered if the image should be based on a cuda image.
I tried FROM wallies/python-cuda:3.10-cuda11.6-runtime but did not see an impact on performance; it may be helpful in the future.
The two docker-compose*.yaml files share elements; the duplication could be avoided by merging both into a single docker-compose.yaml file and using 'extends:'.
It also avoids having to specify the docker-compose*.yaml file on the command line.
You can have a look at https://github.com/mdeweerd/MetersToHA/blob/meters-to-ha/docker-compose.yml for some hints.
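For example, something along these lines (a rough sketch, not tested against this repo; note that extends: is supported by the Compose Specification but was not available in the v3 file format, so it depends on the compose version in use):
services:
  privategpt:
    build: .
    volumes:
      - ./models:/home/privategpt/models

  privategpt-ingest:
    extends:
      service: privategpt
    command: [ python, src/ingest.py ]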
FYI, I tried to enable 'cuda' and got some kind of success, in that I now get a cuda-related error message:
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.7, please update your driver to a newer version, or use an earlier cuda container: unknown
In the Dockerfile I used:
FROM wallies/python-cuda:3.10-cuda11.7-runtime
and in the docker-compose-ingest.yaml file, I added:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
I may be wrong, but the requirements use llama.cpp, so even if you add cuda-related stuff, won't it go unused, since the cpp version only uses the CPU?
When I run the app and use "docker stats", the CPU use exceeds 100%, so it is using more than one core (but only one CPU).
- The program complains about the cuda version mismatch, so if it is not used then why would it complain?
- I only got this error regarding cuda with ingest.
- See: https://www.reddit.com/r/LocalLLaMA/comments/13gok03/llamacpp_now_officially_supports_gpu_acceleration/
- See: https://github.com/ggerganov/llama.cpp#:~:text=acceleration%20using%20the-,CUDA,-cores%20of%20your
So the latest release has support for cuda.
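If llama.cpp's GPU support is to be enabled at build time, one option might be to pass the flags through a compose build argument. This is only a sketch, reusing the LLAMA_CMAKE build argument and the CMAKE_ARGS/FORCE_CMAKE flags that show up further down in this thread:
services:
  privategpt:
    build:
      context: .
      args:
        LLAMA_CMAKE: 'CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1'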
I am making progress with CUDA and moved everything to a single docker-compose.yaml.
I proposed a PR for https://github.com/mdeweerd/privateGPT/tree/cuda in your fork.
Rebased to fix conflict
I had added the source_documents mount to the privateGPT service because I did not want to repeat it on every ingest service - I try to be DRY. I now remembered the name of the mechanism I was looking for: anchors and aliases.
- Example, with volumes (the volumes are not reused individually, but I think they can be): https://gist.github.com/joebeeson/6efc5c0d7851b767d83947177ea17e0b
- Some articles:
- https://medium.com/@kinghuang/docker-compose-anchors-aliases-extensions-a1e4105d70bd
- https://nickjanetakis.com/blog/docker-tip-82-using-yaml-anchors-and-x-properties-in-docker-compose
This is essentially a suggestion - maybe I'll look into it, but I have to attend to some other stuff...
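As a rough sketch of the anchor/alias idea applied to the source_documents mount (service names are assumptions; the container path is the one from the ingest logs below):
x-source-documents: &source-documents ./source_documents:/home/privategpt/source_documents

services:
  privategpt-ingest:
    volumes:
      - *source-documents
  privategpt-ingest-cuda:
    volumes:
      - *source-documents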
Since source_documents is only needed at ingest, I try to avoid mounting it when not needed. Like this (d4cfac2), you only have it in the ingest service and the cuda-only override image. Is that OK?
Yes, that's perfect.
You might want to consider reworking this as a cog.yml. Cog is a machine learning domain-specific tool for creating and running containers: https://github.com/replicate/cog/
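For reference, a cog.yml is roughly of this shape (a sketch only; the package and predictor names are placeholders, not taken from this repo):
build:
  gpu: true
  python_version: "3.10"
  python_packages:
    # would mirror src/requirements.txt
    - "llama-cpp-python"
predict: "predict.py:Predictor"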
Just dropping a comment here, this doesn't work out of the box on Apple M1 due to pypandoc-binary not resolving. See https://github.com/imartinez/privateGPT/issues/226.
Short term solution appears to be this: https://github.com/imartinez/privateGPT/issues/226#issuecomment-1553179978
After changing the permissions and running the ingest, I get a missing model file:
$ chmod 777 models cache db
$ docker-compose run --rm privategpt-ingest
Creating privategpt_privategpt-ingest_run ... done
Loading documents from /home/privategpt/source_documents
Loading document: /home/privategpt/source_documents/state_of_the_union.txt
Loaded 1 documents from /home/privategpt/source_documents
Split into 90 chunks of text (max. 500 characters each)
Using embedded DuckDB with persistence: data will be stored in: /home/privategpt/db
$ docker-compose run --rm privategpt
Creating privategpt_privategpt_run ... done
Using embedded DuckDB with persistence: data will be stored in: /home/privategpt/db
Traceback (most recent call last):
  File "/home/privategpt/src/privateGPT.py", line 57, in <module>
    main()
  File "/home/privategpt/src/privateGPT.py", line 30, in main
    llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1102, in pydantic.main.validate_model
  File "/home/privategpt/.local/lib/python3.10/site-packages/langchain/llms/gpt4all.py", line 169, in validate_environment
    values["client"] = GPT4AllModel(
  File "/home/privategpt/.local/lib/python3.10/site-packages/pygpt4all/models/gpt4all_j.py", line 47, in __init__
    super(GPT4All_J, self).__init__(model_path=model_path,
  File "/home/privategpt/.local/lib/python3.10/site-packages/pygptj/model.py", line 58, in __init__
    raise Exception(f"File {model_path} not found!")
Exception: File /home/privategpt/models/ggml-gpt4all-j-v1.3-groovy.bin not found!
ERROR: 1
The model is not downloaded automatically; you need to download it from
https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
or run:
wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin -O models/ggml-gpt4all-j-v1.3-groovy.bin
docker-compose.yml
---
version: '3.9'
x-ingest: &ingest
  environment:
    - COMMAND=python src/ingest.py  # Specify the command
  ...
services:
  privategpt:
    ...
    #command: [ python, src/privateGPT.py ]
    environment:
      - COMMAND=python src/privateGPT.py  # Specify the command
    ...
I changed some code to automatically check for the model.
Dockerfile:
#FROM python:3.10.11
#FROM wallies/python-cuda:3.10-cuda11.6-runtime
# Using argument for base image to avoid multiplying Dockerfiles
ARG BASEIMAGE
FROM $BASEIMAGE
# Copy the entrypoint script
COPY entrypoint.sh /entrypoint.sh
RUN groupadd -g 10009 -o privategpt && useradd -m -u 10009 -g 10009 -o -s /bin/bash privategpt \
    && chown privategpt:privategpt /entrypoint.sh && chmod +x /entrypoint.sh
USER privategpt
WORKDIR /home/privategpt
COPY ./src/requirements.txt src/requirements.txt
ARG LLAMA_CMAKE
#RUN CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install $(grep llama-cpp-python src/requirements.txt)
# Add the line to modify the PATH environment variable
ENV PATH="$PATH:/home/privategpt/.local/bin"
RUN pip install --upgrade pip \
    && ( /bin/bash -c "${LLAMA_CMAKE} pip install \$(grep llama-cpp-python src/requirements.txt)" 2>&1 | tee llama-build.log ) \
    && ( pip install --no-cache-dir -r src/requirements.txt 2>&1 | tee pip-install.log ) \
    && pip cache purge
COPY ./src src
# Set the entrypoint command
ENTRYPOINT ["/entrypoint.sh"]
entrypoint.sh:
#!/bin/bash
MODEL_FILE="models/ggml-gpt4all-j-v1.3-groovy.bin"
MODEL_URL="https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin"
# Check if the model file exists
if [ ! -f "$MODEL_FILE" ]; then
    echo "Model file not found. Downloading..."
    wget "$MODEL_URL" -O "$MODEL_FILE"
    echo "Model downloaded."
fi
# Check if the command is provided through environment variables
if [ -z "$COMMAND" ]; then
    # No command specified, fallback to default
    COMMAND=("python" "src/privateGPT.py")
else
    # Split the command string into an array
    IFS=' ' read -ra COMMAND <<< "$COMMAND"
fi
# Execute the command
"${COMMAND[@]}"
LGTM
Came looking for an updated Dockerfile that doesn't have the old --chown on the COPY lines and found this PR. What's the thought on merging @denis-ev's approach?
I wanted to chime in regarding a CUDA container for running PrivateGPT locally in Docker with the NVIDIA Container Toolkit.
I combined elements from:
- https://github.com/imartinez/privateGPT/issues/60#issuecomment-1678587331
- ggerganov/llama.cpp/.devops/main-cuda.Dockerfile
- imartinez/privateGPT/Dockerfile.local
An official NVIDIA CUDA image is used as the base. The drawback of this is that ubuntu22.04 is the highest available version for the container, so python3.11 has to be installed from an external repository. CUDA version 11.8.0 was chosen as the default since it is the newest version that does not require a driver version >=525.60.13 according to NVIDIA. The worker user was included since it is also present in the Dockerfile of @pabloogc which is currently in main.
The resulting image has a size of 8.5 GB. It expects two mounted volumes, one at /home/worker/app/local_data and one at /home/worker/app/models. Both should be owned by uid 101. The name of the model file, which should be located directly in the mounted models folder, can be specified with the PGPT_HF_MODEL_FILE environment variable. The name of the Hugging Face repository of the embedding model, which should be cloned to a folder named embedding inside the models folder, can be specified with the PGPT_EMBEDDING_HF_MODEL_NAME environment variable.
At least this is what I think these two environment variables are used for after looking at imartinez/privateGPT/scripts/setup and imartinez/privateGPT/settings-docker.yaml. Specifying the model name with PGPT_HF_MODEL_FILE works, but although the repository of the embedding model is present in models/embedding, the embedding files seem to be downloaded again on first start.
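To tie this back to the compose discussion above, running such an image could look roughly like this (the image tag, host paths and the two example values are placeholders; the environment variable names, container paths and port are the ones described above):
services:
  private-gpt-gpu:
    image: private-gpt-cuda:local        # placeholder tag for an image built from the Dockerfile below
    ports:
      - "8080:8080"
    environment:
      PGPT_HF_MODEL_FILE: "<name of the model file in ./models>"
      PGPT_EMBEDDING_HF_MODEL_NAME: "<HF repo cloned to ./models/embedding>"
    volumes:
      - ./local_data:/home/worker/app/local_data   # owned by uid 101
      - ./models:/home/worker/app/models           # owned by uid 101
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]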
This is the Dockerfile I came up with:
ARG UBUNTU_VERSION=22.04
ARG CUDA_VERSION=11.8.0
ARG CUDA_DOCKER_ARCH=all
ARG APP_DIR=/home/worker/app
### Build Image ###
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION} as builder
ARG CUDA_DOCKER_ARCH
ARG APP_DIR
ENV DEBIAN_FRONTEND=noninteractive \
    CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH} \
    LLAMA_CUBLAS=1 \
    CMAKE_ARGS="-DLLAMA_CUBLAS=on" \
    FORCE_CMAKE=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=true
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        python3.11 \
        python3.11-dev \
        python3.11-venv \
        build-essential \
        git && \
    python3.11 -m ensurepip && \
    python3.11 -m pip install pipx && \
    python3.11 -m pipx ensurepath && \
    pipx install poetry
ENV PATH="/root/.local/bin:$PATH"
WORKDIR $APP_DIR
RUN git clone https://github.com/imartinez/privateGPT.git . --depth 1
RUN poetry install --with local && \
    poetry install --with ui
RUN mkdir build_artifacts && \
    cp -r .venv private_gpt docs *.yaml *.md build_artifacts/
### Runtime Image ###
FROM nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION} as runtime
ARG APP_DIR
ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PGPT_PROFILES=docker,local
EXPOSE 8080
RUN adduser --system worker
WORKDIR $APP_DIR
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        python3.11 \
        python3.11-venv \
        curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir local_data models && \
    chown worker local_data models
COPY --chown=worker --from=builder $APP_DIR/build_artifacts ./
USER worker
HEALTHCHECK --start-period=1m --interval=5m --timeout=3s \
    CMD curl --head --silent --fail --show-error http://localhost:8080 || exit 1
ENTRYPOINT [".venv/bin/python", "-m", "private_gpt"]