Dockerfile and automatic generation of images via GitHub Actions
TL;DR
Building on the great work of ZacharyACoon, I'm providing a Dockerfile that is a bit less complex, more aligned with common best practices, and makes builds more cacheable. Additionally, this PR contains a GitHub Actions workflow that builds the image in five different PyTorch flavours (cu118, cu121, rocm5.6, rocm5.7-nightly, cpu) and pushes them to the GitHub Container Registry (works out of the box) and optionally to Docker Hub (requires a username + token configured in the action secrets).
You can see the action in action (no pun intended) in my fork: https://github.com/oxc/ComfyUI/actions/workflows/docker.yml
Dockerfile
The first commit adds the Dockerfile. Since you mentioned that you are not too familiar with those, and are unsure about maintaining it, I will go through the commands in the file and give some additional explanations. Hopefully that makes the decision easier.
The Dockerfile is used by docker to build the docker image. It basically is the recipe used to cook a final image. The image itself can then be uploaded to a remote registry, and from there everybody can pull it to their local registry to simply run it in a container.
The docker-compose.yml file can make it easier to run the docker image using docker compose, but is not strictly a requirement.
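For illustration, the two ways of starting the image would look roughly like this (the image name, tag, and port mapping below are placeholders, not something this PR pins down):

```shell
# with the docker-compose.yml from this PR checked out in the current directory:
docker compose up -d

# or directly with plain docker, no compose file needed
# (image name/tag are placeholders):
docker run -d -p 8188:8188 ghcr.io/OWNER/comfyui:latest
```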
Expand here to see the commented Dockerfile
```dockerfile
# syntax=docker/dockerfile:1.4
```
Specifying this syntax directive allows us to use certain features like RUN --mount=type=cache.
```dockerfile
ARG BASE_IMAGE="python:3.11-slim-bookworm"
```
ARG at the top of the file defines build-args that can be set when building the docker image and used in FROM statements.
```dockerfile
FROM ${BASE_IMAGE}
```
FROM specifies the base image that we are building our image on. This Dockerfile uses a Python image based on Debian by default, selecting a Python version that is supported by PyTorch. I personally have not seen the base image made configurable as a build-arg before, but it does not really hurt, so I decided to keep it. It might be useful for people who build their own images and just want a different base image that provides a few more preinstalled tools, for example.
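For example, someone could swap in the full (non-slim) Debian-based Python image like this (the tag choice is just an illustration):

```shell
# build with a different base image via the BASE_IMAGE build-arg
# (tag name "comfyui:custom-base" is a placeholder)
docker build --build-arg BASE_IMAGE="python:3.11-bookworm" -t comfyui:custom-base .
```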
```dockerfile
ARG PYTORCH_INSTALL_ARGS=""
ARG EXTRA_ARGS=""
ARG USERNAME="comfyui"
ARG USER_UID=1000
ARG USER_GID=${USER_UID}
```
ARGs after a FROM declare that this build stage uses the mentioned args. These args also have default values, which can be overridden on the command line.
```dockerfile
RUN \
    --mount=target=/var/lib/apt/lists,type=cache,sharing=locked \
    --mount=target=/var/cache/apt,type=cache,sharing=locked \
    set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        git \
        git-lfs

RUN set -eux; \
    groupadd --gid ${USER_GID} ${USERNAME}; \
    useradd --uid ${USER_UID} --gid ${USER_GID} -m ${USERNAME}
```
RUN executes the given shell command. These commands install git and git-lfs in the container, and create the user/group which will be running comfyui.
```dockerfile
# run instructions as user
USER ${USER_UID}:${USER_GID}
```
USER defines the user/group as which all following commands (and the container at runtime) are run.
```dockerfile
WORKDIR /app
```
WORKDIR defines that all following commands will run in the context of the given directory in the container. You can think of it a bit like cd.
```dockerfile
ENV PIP_CACHE_DIR="/cache/pip"
ENV VIRTUAL_ENV=/app/venv
ENV TRANSFORMERS_CACHE="/app/.cache/transformers"
```
ENV defines environment variables that we will be using later in the file.
```dockerfile
# create cache directory
RUN mkdir -p ${TRANSFORMERS_CACHE}

# create virtual environment to manage packages
RUN python -m venv ${VIRTUAL_ENV}
```
Create the transformers cache directory that we configured via env above, and run Python with the venv module to create a virtual environment.
```dockerfile
# run python from venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
```
Set the PATH so that pip and python will be run in the context of our venv.
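The effect can be sketched in plain shell: the lookup scans PATH left to right, so prepending the venv's bin directory means its python and pip win over the system ones.

```shell
# sketch: prepending to PATH makes the venv's binaries shadow the system ones
VIRTUAL_ENV=/app/venv
PATH="${VIRTUAL_ENV}/bin:${PATH}"

# the first PATH entry is now the venv's bin directory
echo "${PATH%%:*}"   # → /app/venv/bin
```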
```dockerfile
RUN --mount=type=cache,target=/cache/,uid=${USER_UID},gid=${USER_GID} \
    pip install torch torchvision torchaudio ${PYTORCH_INSTALL_ARGS}
```
Now install the PyTorch dependencies, passing the PYTORCH_INSTALL_ARGS so that users can select which PyTorch index/platform they want to use. This will be passed by the GitHub Action.
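As a sketch of how that build-arg selects the flavour: the index URL below is the one PyTorch documents for its CUDA 12.1 wheels, and the tag name is just illustrative.

```shell
# the build-arg ends up appended to the pip command inside the RUN instruction
PYTORCH_INSTALL_ARGS="--extra-index-url https://download.pytorch.org/whl/cu121"

# this is the pip command line the RUN instruction expands to:
echo pip install torch torchvision torchaudio ${PYTORCH_INSTALL_ARGS}
# → pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121

# to build that flavour locally (tag name is a placeholder):
#   docker build --build-arg PYTORCH_INSTALL_ARGS="${PYTORCH_INSTALL_ARGS}" -t comfyui:cu121 .
```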
```dockerfile
# copy requirements files first so packages can be cached separately
COPY --chown=${USER_UID}:${USER_GID} requirements.txt .
RUN --mount=type=cache,target=/cache/,uid=${USER_UID},gid=${USER_GID} \
    pip install -r requirements.txt
```
Now we COPY the requirements.txt to the container, and run pip install on it. We are using RUN --mount=type=cache to mount a cache directory for our pip cache that is managed by the docker runtime and shared between builds.
Why are we only copying the requirements file? Docker caches the result of each command (layer) when building an image, but as soon as one layer's cache is invalidated, all subsequent layers have to be rebuilt as well. For this reason, it helps to put the more stable steps earlier in the image, and the more frequently changing things later. Since the program code of ComfyUI changes with every commit, copying it here would invalidate all caches for the requirements etc. We do need the requirements file, however, so we copy only that. As long as it does not change, the result of pip install can be cached. One can always run the build command with --no-cache to force a re-run.
You can read more about the cache here: https://docs.docker.com/build/cache/
```dockerfile
COPY --chown=${USER_UID}:${USER_GID} . .
```
Now, as a last step, we copy the whole program code into the container image.
```dockerfile
# default environment variables
ENV COMFYUI_ADDRESS=0.0.0.0
ENV COMFYUI_PORT=8188
ENV COMFYUI_EXTRA_BUILD_ARGS="${EXTRA_ARGS}"
ENV COMFYUI_EXTRA_ARGS=""

# default start command
CMD python -u main.py --listen ${COMFYUI_ADDRESS} --port ${COMFYUI_PORT} ${COMFYUI_EXTRA_BUILD_ARGS} ${COMFYUI_EXTRA_ARGS}
```
Finally, we define some more environment variables that can be overridden, and define the CMD that will be run when the container is launched.
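To make the expansion concrete, here is how the CMD line resolves when a user overrides COMFYUI_EXTRA_ARGS at `docker run` time (the `--cpu` value is just an example):

```shell
# defaults baked into the image
COMFYUI_ADDRESS=0.0.0.0
COMFYUI_PORT=8188
COMFYUI_EXTRA_BUILD_ARGS=""
# e.g. overridden at run time: docker run -e COMFYUI_EXTRA_ARGS=--cpu ...
COMFYUI_EXTRA_ARGS="--cpu"

# this is what the shell-form CMD expands to (empty vars collapse away):
echo python -u main.py --listen ${COMFYUI_ADDRESS} --port ${COMFYUI_PORT} ${COMFYUI_EXTRA_BUILD_ARGS} ${COMFYUI_EXTRA_ARGS}
# → python -u main.py --listen 0.0.0.0 --port 8188 --cpu
```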
GitHub Action
The GitHub action spawns several jobs in a matrix, one for each supported PyTorch flavor (cu118, cu121, rocm5.6, rocm5.7-nightly, cpu).
For each flavor, it builds the container image, and pushes it to the GitHub Container Registry. This works out of the box with the information provided by GitHub Actions.
If a DOCKERHUB_USERNAME and a DOCKERHUB_TOKEN secret are configured on the GitHub repository, the action will also log in and push to Docker Hub. Here is the image the action on my fork created: https://hub.docker.com/r/obeliks/comfyui
The workflow uses several actions provided by Docker: one for generating image metadata from the GitHub context, one for logging in, and one for building and pushing. Ultimately, the workflow has been pieced together from the following examples (and adjusted accordingly):
- https://docs.docker.com/build/ci/github-actions/manage-tags-labels/
- https://docs.docker.com/build/ci/github-actions/push-multi-registries/
- https://docs.docker.com/build/ci/github-actions/cache/
- https://github.com/docker/metadata-action#latest-tag
I have opted not to add extra complexity to persist the pip cache in the GitHub Actions cache. In most cases, the persisted layer cache will suffice. Take a look at this run that uses cached layers, compared to this one that creates most of them from scratch. Downloading the large layers that contain the PyTorch wheels still takes some time, but is much faster than doing the full build.
Note that it still makes sense to have those pip caches in a cache mount, because it is useful for local building, and should prevent the cache from ending up in the image without having to erase it explicitly.
AFAIK there is no auto-detection for CPU and currently the --cpu option has to be set explicitly.
> AFAIK there is no auto-detection for CPU and currently the `--cpu` option has to be set explicitly.
You're right, if I try to start the cpu image, I get
```
AssertionError: Torch not compiled with CUDA enabled
```
I guess --cpu should be set by default in the cpu image.
Alternatively, the documentation could mention that you have to run the cpu version with --cpu in COMFYUI_EXTRA_ARGS. That seems a bit silly, but is running ComfyUI on CPU only a thing anyway?
The cpu image now passes the --cpu flag as arg.
I've introduced a separate COMFYUI_EXTRA_BUILD_ARGS env that gets initialized from the EXTRA_ARGS build arg, which gets set to --cpu in the GitHub Action matrix.
For completeness' sake, I've added all image variants and the information about the EXTRA_ARGS=--cpu build arg to the README.
I've slightly modified the Dockerfile to install git and git-lfs, so that ComfyUI Manager works out of the box. I've updated the collapsed documentation in the original post.
Updates:
I've updated the requirements files to the latest versions (and created a separate PR for it).
I've changed the base image to 3.11-slim-bookworm to make sure that Python 3.11 is used (not 3.12).
Update: I've removed the requirements-*.txt files because I also wanted to build a cu118 version, and since xformers is gone (for now), it just seems easier and more flexible not to have to commit everything to a file.
It also might make this PR more acceptable to be merged.
I've updated the description in the first post accordingly.
What is blocking this PR?
Hey, I'm using your docker and sometimes with custom nodes I get this error:
```
comfyui-comfyui-1 | import cv2
comfyui-comfyui-1 | ImportError: libGL.so.1: cannot open shared object file: No such file or directory
```
It seems that running apt-get update && apt-get install libgl1 in the Dockerfile solves this issue with minimal package overhead.
Maybe you should include this directly in the Dockerfile.
> Hey, I'm using your docker and sometimes with custom nodes I get this error
Thanks for reaching out. Can you tell me which custom node this is? I'll try to figure out which packages are used by commonly used custom nodes...
I don't know exactly which one but this occurs with https://github.com/ltdrdata/ComfyUI-Impact-Pack with mmdet_skip = True
It looks like this might be solved by installing opencv-python-headless (with pip) instead of the GUI version. Could you try this?
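Something along these lines, run inside the container (untested sketch; package names are the standard PyPI ones):

```shell
# untested sketch: swap the GUI build of OpenCV for the headless build,
# which does not require libGL at import time
pip uninstall -y opencv-python
pip install opencv-python-headless
```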
I'm wondering whether and how/where comfyui manager installs dependencies, and if that survives container restarts.
I will not be able to try before tomorrow, I will let you know
When will this be merged? :-)
You can now mount a folder to /app/custom_venv and the image will copy its venv there and launch from there. This will make installed custom_node dependencies persist across restarts.
> You can now mount a folder to /app/custom_venv and the image will copy its venv there and launch from there. This will make installed custom_node dependencies persist across restarts.
Can you elaborate more?
Yes. If you mount a folder to /app/custom_venv inside your container, the container will copy all the Python libraries it was built with into that folder on startup, and then run from there. That allows addons like ComfyUI Manager to install custom Python dependencies that will be preserved across restarts.
EDIT: The folder can/should be empty when you first mount it. It will then be filled by the container image and possibly the ComfyUI Manager.
So for example, to install a custom node, I'll just put it inside /app/custom_venv?
No, custom_nodes need to be mounted separately to /app/custom_nodes. The custom_venv is for dependencies that need to be installed for your custom nodes. This should best be done from within the container.
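Putting both mounts together, a run command could look like this (image name, tag, and host paths are placeholders, not something this PR defines):

```shell
# persist custom nodes and their python dependencies across restarts
docker run -d -p 8188:8188 \
  -v "$PWD/custom_nodes:/app/custom_nodes" \
  -v "$PWD/custom_venv:/app/custom_venv" \
  ghcr.io/OWNER/comfyui:latest-cu121
```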
Thanks 👍🏻