docker file and compose
Why not build the image in docker-compose directly?
Pushed both changes.
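For reference, building straight from the compose file could look roughly like this (a sketch only; the service name and build context are placeholders, not the actual files in this PR):
services:
  privategpt:
    build:
      context: .
      dockerfile: Dockerfile
    image: privategpt:local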
Thanks for sharing. Could you allow the use of a .env file to avoid modifying the repo's yaml file?
For instance, use:
${MODELS:-./models}
to set the models directory so that it can be set in the .env file.
I'm not sure I understand correctly, but setting load_dotenv(override=True) will override the docker-compose environment variables with the .env file, and there is no .env file at the moment.
There is no .env file in the repo, but we can set one locally.
By setting .env as follows, I successfully used my E: drive for the models; a user who does not have a local .env falls back to ./models instead:
MODELS=E:/
This avoids changing any git-controlled file to adapt to the local setup. I already had some models on my E: drive.
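For illustration, a sketch of how the substitution could look on the compose side (the container path here mirrors the one used later in this thread and is an assumption, not the actual file):
# .env (local, not committed)
# MODELS=E:/

services:
  privategpt:
    volumes:
      - ${MODELS:-./models}:/home/privategpt/models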
@mdeweerd reviewed in b4aad15
I was able to use "MODEL_MOUNT".
I suggest converting the line endings of these files to LF.
As I was applying a local pre-commit configuration, it detected that the line endings of the yaml files (and the Dockerfile) are CRLF. yamllint suggests LF line endings, and yamlfix helps format the files automatically.
I am still struggling to get an answer to my question: the container stops at some point. Maybe this has to do with memory; the container limit is 7.448GiB.
FYI, I've set the memory for WSL2 to 12GB, which allowed me to get an answer to a question.
My .wslconfig now looks like:
[wsl2]
memory=12GB
During compilation I noticed some references to nvidia, so I wondered if the image should be based on a cuda image.
I tried FROM wallies/python-cuda:3.10-cuda11.6-runtime but did not see an impact on performance; it may be helpful in the future.
The two docker-compose*.yaml files share elements; the duplication could be avoided by merging both into a single docker-compose.yaml file and using 'extends:'.
It also avoids having to specify the docker-compose*.yaml file on the command line.
You can have a look at https://github.com/mdeweerd/MetersToHA/blob/meters-to-ha/docker-compose.yml for some hints.
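For example, something along these lines (a rough sketch, not tested against this repo; note that extends: is supported by the Compose Specification but was not available in the v3 file format, so it depends on the compose version in use):
services:
  privategpt:
    build: .
    volumes:
      - ./models:/home/privategpt/models

  privategpt-ingest:
    extends:
      service: privategpt
    command: [ python, src/ingest.py ]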
FYI, I tried to enable 'cuda' and got some kind of success, in that I now get a cuda-related error message:
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.7, please update your driver to a newer version, or use an earlier cuda container: unknown
In the Dockerfile I used:
FROM wallies/python-cuda:3.10-cuda11.7-runtime
and in the docker-compose-ingest.yaml file, I added:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
I may be wrong, but the requirements use llama.cpp, so even if you add cuda-related stuff, won't it go unused, since the cpp version only uses the CPU?
When I run the app and use "docker stats", the CPU use exceeds 100%, so it is using more than one core (but only one CPU).
- The program complains about the cuda version mismatch, so if it is not used then why would it complain?
- I only got this error regarding cuda with ingest.
- See: https://www.reddit.com/r/LocalLLaMA/comments/13gok03/llamacpp_now_officially_supports_gpu_acceleration/
- See: https://github.com/ggerganov/llama.cpp#:~:text=acceleration%20using%20the-,CUDA,-cores%20of%20your
So the latest release has support for cuda.
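If llama.cpp's GPU support is to be enabled at build time, one option might be to pass the flags through a compose build argument. This is only a sketch, reusing the LLAMA_CMAKE build argument and the CMAKE_ARGS/FORCE_CMAKE flags that show up further down in this thread:
services:
  privategpt:
    build:
      context: .
      args:
        LLAMA_CMAKE: 'CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1'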
I am making progress with CUDA and moved everything to a single docker-compose.yaml.
I proposed a PR for https://github.com/mdeweerd/privateGPT/tree/cuda in your fork.
Rebased to fix conflict
I had added the source_documents mount to the privateGPT service because I did not want to repeat it on every ingest service - I try to be DRY. I now remembered the name of the mechanism I was looking for: anchors and aliases.
- Example, with volumes (the volumes are not reused individually, but I think they can be): https://gist.github.com/joebeeson/6efc5c0d7851b767d83947177ea17e0b
- Some articles:
- https://medium.com/@kinghuang/docker-compose-anchors-aliases-extensions-a1e4105d70bd
- https://nickjanetakis.com/blog/docker-tip-82-using-yaml-anchors-and-x-properties-in-docker-compose
This is essentially a suggestion - maybe I'll look into it, but I have to attend to some other stuff...
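As a rough sketch of the anchor/alias idea applied to the source_documents mount (service names are assumptions; the container path is the one from the ingest logs below):
x-source-documents: &source-documents ./source_documents:/home/privategpt/source_documents

services:
  privategpt-ingest:
    volumes:
      - *source-documents
  privategpt-ingest-cuda:
    volumes:
      - *source-documents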
Since source_documents is only needed at ingest, I try to avoid mounting it when not needed. Like this (d4cfac2), you only have it in the ingest service and the cuda-only override image. Is that OK?
Yes, that's perfect.
You might want to consider reworking this as a cog.yml. Cog is a machine learning domain-specific tool for creating and running containers: https://github.com/replicate/cog/
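For reference, a cog.yml is roughly of this shape (a sketch only; the package and predictor names are placeholders, not taken from this repo):
build:
  gpu: true
  python_version: "3.10"
  python_packages:
    # would mirror src/requirements.txt
    - "llama-cpp-python"
predict: "predict.py:Predictor"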
Just dropping a comment here, this doesn't work out of the box on Apple M1 due to pypandoc-binary not resolving. See https://github.com/imartinez/privateGPT/issues/226.
Short term solution appears to be this: https://github.com/imartinez/privateGPT/issues/226#issuecomment-1553179978
After changing the permissions and running the ingest, I get a missing model file:
$ chmod 777 models cache db
$ docker-compose run --rm privategpt-ingest
Creating privategpt_privategpt-ingest_run ... done
Loading documents from /home/privategpt/source_documents
Loading document: /home/privategpt/source_documents/state_of_the_union.txt
Loaded 1 documents from /home/privategpt/source_documents
Split into 90 chunks of text (max. 500 characters each)
Using embedded DuckDB with persistence: data will be stored in: /home/privategpt/db
$ docker-compose run --rm privategpt
Creating privategpt_privategpt_run ... done
Using embedded DuckDB with persistence: data will be stored in: /home/privategpt/db
Traceback (most recent call last):
  File "/home/privategpt/src/privateGPT.py", line 57, in <module>
    main()
  File "/home/privategpt/src/privateGPT.py", line 30, in main
    llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj', callbacks=callbacks, verbose=False)
  File "pydantic/main.py", line 339, in pydantic.main.BaseModel.__init__
  File "pydantic/main.py", line 1102, in pydantic.main.validate_model
  File "/home/privategpt/.local/lib/python3.10/site-packages/langchain/llms/gpt4all.py", line 169, in validate_environment
    values["client"] = GPT4AllModel(
  File "/home/privategpt/.local/lib/python3.10/site-packages/pygpt4all/models/gpt4all_j.py", line 47, in __init__
    super(GPT4All_J, self).__init__(model_path=model_path,
  File "/home/privategpt/.local/lib/python3.10/site-packages/pygptj/model.py", line 58, in __init__
    raise Exception(f"File {model_path} not found!")
Exception: File /home/privategpt/models/ggml-gpt4all-j-v1.3-groovy.bin not found!
ERROR: 1
The model is not downloaded automatically; you need to download it from
https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
or run:
wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin -O models/ggml-gpt4all-j-v1.3-groovy.bin
docker-compose.yml
---
version: '3.9'
x-ingest: &ingest
  environment:
    - COMMAND=python src/ingest.py  # Specify the command
  ...
services:
  privategpt:
    ...
    #command: [ python, src/privateGPT.py ]
    environment:
      - COMMAND=python src/privateGPT.py  # Specify the command
    ...
I changed some code to automatically check for the model.
Dockerfile:
#FROM python:3.10.11
#FROM wallies/python-cuda:3.10-cuda11.6-runtime
# Using argument for base image to avoid multiplying Dockerfiles
ARG BASEIMAGE
FROM $BASEIMAGE
# Copy the entrypoint script
COPY entrypoint.sh /entrypoint.sh
RUN groupadd -g 10009 -o privategpt && useradd -m -u 10009 -g 10009 -o -s /bin/bash privategpt \
    && chown privategpt:privategpt /entrypoint.sh && chmod +x /entrypoint.sh
USER privategpt
WORKDIR /home/privategpt
COPY ./src/requirements.txt src/requirements.txt
ARG LLAMA_CMAKE
#RUN CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install $(grep llama-cpp-python src/requirements.txt)
# Add the line to modify the PATH environment variable
ENV PATH="$PATH:/home/privategpt/.local/bin"
RUN pip install --upgrade pip \
    && ( /bin/bash -c "${LLAMA_CMAKE} pip install \$(grep llama-cpp-python src/requirements.txt)" 2>&1 | tee llama-build.log ) \
    && ( pip install --no-cache-dir -r src/requirements.txt 2>&1 | tee pip-install.log ) \
    && pip cache purge
COPY ./src src
# Set the entrypoint command
ENTRYPOINT ["/entrypoint.sh"]
entrypoint.sh:
#!/bin/bash
MODEL_FILE="models/ggml-gpt4all-j-v1.3-groovy.bin"
MODEL_URL="https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin"
# Check if the model file exists
if [ ! -f "$MODEL_FILE" ]; then
    echo "Model file not found. Downloading..."
    wget "$MODEL_URL" -O "$MODEL_FILE"
    echo "Model downloaded."
fi
# Check if the command is provided through environment variables
if [ -z "$COMMAND" ]; then
    # No command specified, fallback to default
    COMMAND=("python" "src/privateGPT.py")
else
    # Split the command string into an array
    IFS=' ' read -ra COMMAND <<< "$COMMAND"
fi
# Execute the command
"${COMMAND[@]}"
LGTM
Came looking for an updated Dockerfile that doesn't have the old --chown on the COPY lines and found this PR. What's the thought on merging @denis-ev's approach?
I wanted to chime in regarding a CUDA container for running PrivateGPT locally in Docker with the NVIDIA Container Toolkit.
I combined elements from:
- https://github.com/imartinez/privateGPT/issues/60#issuecomment-1678587331
- ggerganov/llama.cpp/.devops/main-cuda.Dockerfile
- imartinez/privateGPT/Dockerfile.local
An official NVIDIA CUDA image is used as the base. The drawback of this is that ubuntu22.04 is the highest available version for the container, so python3.11 has to be installed from an external repository. CUDA version 11.8.0 was chosen as the default since it is the newest version that does not require a driver version >=525.60.13 according to NVIDIA. The worker user was included since it is also present in the Dockerfile of @pabloogc which is currently in main.
The resulting image has a size of 8.5 GB. It expects two mounted volumes, one at /home/worker/app/local_data and one at /home/worker/app/models. Both should be owned by uid 101. The name of the model file, which should be located directly in the mounted models folder, can be specified with the PGPT_HF_MODEL_FILE environment variable. The name of the Hugging Face repository of the embedding model, which should be cloned to a folder named embedding inside the models folder, can be specified with the PGPT_EMBEDDING_HF_MODEL_NAME environment variable.
At least this is what I think these two environment variables are used for after looking at imartinez/privateGPT/scripts/setup and imartinez/privateGPT/settings-docker.yaml. Specifying the model name with PGPT_HF_MODEL_FILE works, but although the repository of the embedding model is present in models/embedding, the embedding files seem to be downloaded again on first start.
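To tie this back to the compose discussion above, running such an image could look roughly like this (the image tag, host paths and the two example values are placeholders; the environment variable names, container paths and port are the ones described above):
services:
  private-gpt-gpu:
    image: private-gpt-cuda:local        # placeholder tag for an image built from the Dockerfile below
    ports:
      - "8080:8080"
    environment:
      PGPT_HF_MODEL_FILE: "<name of the model file in ./models>"
      PGPT_EMBEDDING_HF_MODEL_NAME: "<HF repo cloned to ./models/embedding>"
    volumes:
      - ./local_data:/home/worker/app/local_data   # owned by uid 101
      - ./models:/home/worker/app/models           # owned by uid 101
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]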
This is the Dockerfile I came up with:
ARG UBUNTU_VERSION=22.04
ARG CUDA_VERSION=11.8.0
ARG CUDA_DOCKER_ARCH=all
ARG APP_DIR=/home/worker/app
### Build Image ###
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION} as builder
ARG CUDA_DOCKER_ARCH
ARG APP_DIR
ENV DEBIAN_FRONTEND=noninteractive \
    CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH} \
    LLAMA_CUBLAS=1 \
    CMAKE_ARGS="-DLLAMA_CUBLAS=on" \
    FORCE_CMAKE=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=true
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        python3.11 \
        python3.11-dev \
        python3.11-venv \
        build-essential \
        git && \
    python3.11 -m ensurepip && \
    python3.11 -m pip install pipx && \
    python3.11 -m pipx ensurepath && \
    pipx install poetry
ENV PATH="/root/.local/bin:$PATH"
WORKDIR $APP_DIR
RUN git clone https://github.com/imartinez/privateGPT.git . --depth 1
RUN poetry install --with local && \
    poetry install --with ui
RUN mkdir build_artifacts && \
    cp -r .venv private_gpt docs *.yaml *.md build_artifacts/
### Runtime Image ###
FROM nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION} as runtime
ARG APP_DIR
ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PGPT_PROFILES=docker,local
EXPOSE 8080
RUN adduser --system worker
WORKDIR $APP_DIR
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        python3.11 \
        python3.11-venv \
        curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir local_data models && \
    chown worker local_data models
COPY --chown=worker --from=builder $APP_DIR/build_artifacts ./
USER worker
HEALTHCHECK --start-period=1m --interval=5m --timeout=3s \
    CMD curl --head --silent --fail --show-error http://localhost:8080 || exit 1
ENTRYPOINT [".venv/bin/python", "-m", "private_gpt"]