Add docker support
Description
Add support for running the UI using Docker. Since the previous PRs (https://github.com/vladmandic/automatic/pull/403, https://github.com/vladmandic/automatic/pull/844) stalled, I merged their approaches and fixed the remaining issues.
Notes
To improve security, the process runs as a non-root user inside the container. Since bind mounts are owned by root inside the container, the entrypoint.sh script changes their ownership to the non-root user to make them writable.
Environment and Testing
- Ubuntu 22.04
- Docker 24.0.2
- Nvidia Container Toolkit 1.13.1
thanks for picking this up!
tcmalloc is amazing, but i don't want to go down the path of me installing it. can you remove all mentions of it?
and yes, tcmalloc should make its way into faq, but that's besides the point
don't modify README.md - better create a Wiki page for Docker and then it can be as short or as long as you want
i can create a link on README.md that points to Wiki page
do we need default ./data at all?
i totally agree that --data-dir should be specified, but why default to ./data?
@vladmandic I have now removed tcmalloc and the changes to the README.md.
Regarding --data-dir, I thought that using a subdir of the workdir of the container would be a sane default. But using /data or something else as default also works.
just an idea - having data inside container is really against the concept of containers.
how about making --data-dir mandatory instead?
for example:
RUN [ -z "$--data-dir" ] && echo "Must specify data directory" && exit 1 || true
Good idea, I have made it mandatory now
looks good to me, but please tell me you've actually tested it? :)
I built it again from scratch and noticed an error ^^
The requirements.txt file was ignored due to the /*.txt entry in the ignore file. Now it works.
Can you guys talk about the security benefits/pros and cons of using this?
Can you guys talk about the security benefits/pros and cons of using this?
talk about benefits of using docker in general? not really, that's really outside of the scope of this pr, this is to provide simple-to-use template.
@vladmandic I think it's ready to be merged
I merged master into this and have the following findings regarding docker compose up:
- --skip-update appears no longer valid and should be removed
- On recreate, installation is again attempted, including downloading of all packages including torch torchvision - probably the venv or wherever those get put should be a volume also
I would also suggest making the first argument to the entrypoint webui and setting it by default with RUN ["webui"]; if the first argument is different from webui, exec the arguments directly.
Very unfortunate that the --skip-update flag was removed, thanks for bringing it to my attention @staff0rd. I think solving this indirectly by storing the packages and repositories in a bind-mounted directory is suboptimal, since they're not application state and should be stored within the container. @vladmandic is there a plan to bring --skip-update back or is there an equivalent feature?
@Kubuxu thanks for the suggestions, I've fixed the env vars.
I would also suggest making the first argument to the entrypoint webui and setting it by default with RUN ["webui"]; if the first argument is different from webui, exec the arguments directly.
Could you clarify what you meant by this? Essentially running webui.sh instead of python launch.py in entrypoint.sh per default (with the possibility of specifying other commands)?
Could you clarify what you meant by this? Essentially running webui.sh instead of python launch.py in entrypoint.sh per default (with the possibility of specifying other commands)?
python launch.py is fine (even better as webui.sh is not needed). I didn't notice that you didn't use webui.sh.
Correction, not RUN but CMD.
But in essence, have the default run command in CMD, either as "python", "launch.py" or as a webui "alias" which is handled by entrypoint.sh, which then allows one to override it.
So for example
ENTRYPOINT ["/bin/bash", "-c", "${INSTALLDIR}/entrypoint.sh \"$0\" \"$@\""] # same as today
CMD ["webui"]
Then the entrypoint.sh should detect webui at $1 and activate the env, and call python launch.py, otherwise it launches the command.
See the postgres entrypoint as an example:
#!/usr/bin/env bash
set -e

if [ "$1" = 'postgres' ]; then
    chown -R postgres "$PGDATA"

    if [ -z "$(ls -A "$PGDATA")" ]; then
        gosu postgres initdb
    fi

    shift
    exec gosu postgres "$@"
fi

exec "$@"
This will allow the user to both pass params to the launch.py like this: docker run image webui --api --backend diffusers and to run custom commands to test the image docker run --rm image nvidia-smi
@Kubuxu I think what you want to do here is already possible using the --entrypoint flag of docker run. So, for your example, you can do docker run --rm --entrypoint nvidia-smi image to override the entrypoint.
Yeah, this is another way of doing this. We can go down the --entrypoint path instead.
I had issues building this from within Ubuntu (20.04). I'm going to document my experience so that you can see the troubles I had along the way to hopefully help me fix them, but ultimately fix it for others who might use it once this has been accepted as a merge. Please don't take this as negative criticism at all, cos I really do appreciate all the hard work you guys are putting into this! I hope my experiences can help to get this accepted. I just wish I knew more to help move things along.
I kept getting the error:
$ docker-compose up
ERROR: The Compose file './docker-compose.yml' is invalid because:
'name' does not match any of the regexes: '^x-'
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
which I fixed by changing the docker-compose.yml file to not include name: sd-automatic since version 3.9 is defined on line 1 and name: is not supported. Please see: https://docs.docker.com/compose/compose-file/compose-file-v3/
You can also confirm it using the command docker-compose config which will tell you if the compose file is formatted correctly.
After I got past that error by removing the name variable, this was the error I got:
$ docker-compose up
Building nvidia
Sending build context to Docker daemon 38.6MB
Step 1/17 : ARG UBUNTU_VERSION=22.04 CUDA_VERSION=11.8.0 BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}
Step 2/17 : FROM ${BASE_CUDA_CONTAINER}
invalid reference format
ERROR: Service 'nvidia' failed to build : Build failed
For some reason BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION} isn't being evaluated properly. I had to fix this by hardcoding it into the file so the line was:
ARG UBUNTU_VERSION=22.04 \
CUDA_VERSION=11.8.0 \
BASE_CUDA_CONTAINER=nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04
And I changed it to:
ARG UBUNTU_VERSION=20.04 \
CUDA_VERSION=12.1.0 \
BASE_CUDA_CONTAINER=nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04
Although I changed it to 20.04 and 12.1.0 (which I confirmed by going to: https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=12.1.0-cudnn8-runtime-ubuntu), I'm pretty sure changing it to:
ARG UBUNTU_VERSION=22.04 \
CUDA_VERSION=11.8.0 \
BASE_CUDA_CONTAINER=nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
Would work fine since that does exist too: https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=11.8.0-cudnn8-runtime-ubuntu
The main issue seems to be with BASE_CUDA_CONTAINER not accepting the variables ${CUDA_VERSION} and ${UBUNTU_VERSION} even though in my mind that looks sane. I tried putting quotes in so that the full line was BASE_CUDA_CONTAINER="nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}" but that didn't work.
The next issue is tzdata. It would be good to set a default during installation with an ENV so that you can set your own, since just doing docker-compose up without any extra commands presents you with this dialogue after installing all the apt packages:
Configuring tzdata
------------------
Please select the geographic area in which you live. Subsequent configuration
questions will narrow this down by presenting a list of cities, representing
the time zones in which they are located.
1. Africa 4. Australia 7. Atlantic 10. Pacific 13. Etc
2. America 5. Arctic 8. Europe 11. SystemV
3. Antarctica 6. Asia 9. Indian 12. US
Geographic area:
But when you type in 8 and hit enter, nothing happens. I had to stop the instance in Portainer and recreate it with the -it flags so that I could interact with it in an attached tty window to the instance. That then allowed me to enter the required continent, followed by the required city.
But once those were in and it finished setting up, it just stopped running. Trying to re-run it, it obviously continues where it left off because all the packages are installed and tzdata is already set up, and then stops straight away. Trying to diagnose what the last message was, Docker says there are no logs it can access for it.
Re-running it in the terminal again to make sure I didn't miss anything and I get:
$ docker run 5f270feee059
==========
== CUDA ==
==========
CUDA Version 12.1.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md
Oh! So must be the GPU permission, but still:
$ docker run 5f270feee059 --gpus=all
==========
== CUDA ==
==========
CUDA Version 12.1.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md
/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]
Slightly more information, tried it with the runtime=nvidia parameter as per the nvidia documentation for CUDA:
$ docker run 5f270feee059 --gpus all --runtime=nvidia
==========
== CUDA ==
==========
CUDA Version 12.1.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md
/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]
Hmmm... tried the nvidia test using the same base cuda I used for the installation of nvidia/cuda:12.1.0-base-ubuntu20.04:
$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.1.0-base-ubuntu20.04 nvidia-smi
[sudo] password for hazrpg:
Unable to find image 'nvidia/cuda:12.1.0-base-ubuntu20.04' locally
12.1.0-base-ubuntu20.04: Pulling from nvidia/cuda
56e0351b9876: Already exists
b0f696c0aebb: Pull complete
e627444df06f: Pull complete
dcf21018e934: Pull complete
a2855a2ef2e0: Pull complete
Digest: sha256:d0bf043a20ecc11940c5a452f67f239f9dec34a01d8f5583d2af93cf0da0f072
Status: Downloaded newer image for nvidia/cuda:12.1.0-base-ubuntu20.04
Sun Jul 30 02:40:24 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 On | N/A |
| 0% 49C P5 16W / 170W | 1572MiB / 12288MiB | 13% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
So everything is set up fine for docker, but the image still isn't working. Not sure where I am going wrong, but I feel like I'm close!
Note that I pulled this from the master branch on nopperl:master to test this out.
Edit: I realised after submitting that I hadn't tried the proper image for the nvidia test of nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04 so I changed it but still got the same result as above. I also realised that I had used sudo for the pre-build compose image (like I had for the nvidia test image) so re-ran sudo docker run 5f270feee059 --gpus all --runtime=nvidia to make sure the issue wasn't a permissions problem trying to access the hardware, but that still also gave me the same results as before. So not overly sure what's going on.
Didn't want to give up, so tried one more time - scrapped and purged everything, reset the repo back to how it was, and did docker-compose up again. The BASE_CUDA_CONTAINER was still an issue, so instead of setting it to my ubuntu version and the cuda I have installed, I just used the 22.04 and 11.8.0 from the original file, and changed BASE_CUDA_CONTAINER to be hardcoded to those versions instead (figure maybe that was why there was an issue).
This time I got a lot further! It installed correctly and ran through everything. But this time in the terminal it looked like it had stopped doing anything after Available models: ./data/models/Stable-diffusion 0.
Started up another terminal and attached to the running image, and saw a different message saying Download the default model? (y/N) so I typed in y and hit enter. It started downloading the sd 1.5 model - perfect!
Then afterwards I got:
nvidia_1 | 03:39:52-863637 ERROR Module load: /webui/extensions-builtin/sd-webui-controlnet/scripts/api.py: AttributeError
Followed by a long traceback log, but it looked like it was still going and did...
nvidia_1 | Image Browser: ImageReward is not installed, cannot be used.
nvidia_1 | 03:40:15-057529 INFO Loading UI theme: name=black-orange style=Auto
nvidia_1 | Image Browser: Creating database
nvidia_1 | Image Browser: Database created
nvidia_1 | 03:40:16-030004 ERROR Failed reading extension data from Git repository: a1111-sd-webui-lycoris: HEAD is a detached symbolic reference as it points to
nvidia_1 | 'b0d24ca645b6a5cb9752169691a1c6385c6fe6ae'
nvidia_1 | 03:40:16-036250 ERROR Failed reading extension data from Git repository: clip-interrogator-ext: HEAD is a detached symbolic reference as it points to
nvidia_1 | '9e6bbd9b8931bbe869a8e28e7005b0e13c2efff0'
nvidia_1 | 03:40:16-045836 ERROR Failed reading extension data from Git repository: multidiffusion-upscaler-for-automatic1111: HEAD is a detached symbolic reference as it
nvidia_1 | points to '70b3c5ea3c9f684d04e7ff59167565974415735c'
nvidia_1 | 03:40:16-053253 ERROR Failed reading extension data from Git repository: sd-dynamic-thresholding: HEAD is a detached symbolic reference as it points to
nvidia_1 | 'f02cacfc923e8bbf73f25327d722d50c458d66bb'
nvidia_1 | 03:40:16-066565 ERROR Failed reading extension data from Git repository: sd-extension-system-info: HEAD is a detached symbolic reference as it points to
nvidia_1 | '8046b1544513cea06d1c41748c22727c930323ab'
nvidia_1 | 03:40:16-075336 ERROR Failed reading extension data from Git repository: sd-webui-controlnet: HEAD is a detached symbolic reference as it points to
nvidia_1 | '7b707dc1f03c3070f8a506ff70a2b68173d57bb5'
nvidia_1 | 03:40:16-085855 ERROR Failed reading extension data from Git repository: sd-webui-model-converter: HEAD is a detached symbolic reference as it points to
nvidia_1 | 'f6e0fa5386fb82ef44feac74d66958af951fcc48'
nvidia_1 | 03:40:16-097230 ERROR Failed reading extension data from Git repository: stable-diffusion-webui-images-browser: HEAD is a detached symbolic reference as it
nvidia_1 | points to '75af6d0c32b72350b2f140f186cd8ce0e24dda10'
nvidia_1 | 03:40:16-111035 ERROR Failed reading extension data from Git repository: stable-diffusion-webui-rembg: HEAD is a detached symbolic reference as it points to
nvidia_1 | '657ae9f5486019a94dbe11d3560b28cccf35a0fd'
nvidia_1 | 03:40:16-147008 INFO Setting Torch parameters: dtype=torch.float16 vae=torch.float16 unet=torch.float16
Loading weights: /webui/data/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/4.3 GB -:--:--
nvidia_1 | LatentDiffusion: Running in eps-prediction mode
nvidia_1 | DiffusionWrapper has 859.52 M params.
Downloading (…)olve/main/vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 961k/961k [00:00<00:00, 2.82MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 1.84MB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 389/389 [00:00<00:00, 2.08MB/s]
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 905/905 [00:00<00:00, 5.89MB/s]
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████| 4.52k/4.52k [00:00<00:00, 23.9MB/s]
nvidia_1 | 03:40:19-248309 INFO Model created from config: /webui/configs/v1-inference.yaml
Calculating model hash: /webui/data/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 GB 0:00:00
nvidia_1 | 03:40:39-639737 INFO Applying scaled dot product cross attention optimization
nvidia_1 | 03:40:39-649293 INFO Embeddings loaded: 0 []
nvidia_1 | 03:40:39-661568 INFO Model loaded in 23.5s (load=0.6s create=2.5s hash=2.2s apply=17.4s vae=0.5s move=0.3s)
nvidia_1 | 03:40:40-197750 INFO Model load finished: {'ram': {'used': 9.04, 'total': 62.59}, 'gpu': {'used': 3.36, 'total': 11.75}, 'retries': 0, 'oom': 0}
nvidia_1 | Running on local URL: http://0.0.0.0:7860
nvidia_1 |
nvidia_1 | To create a public link, set `share=True` in `launch()`.
nvidia_1 | 03:40:40-532231 INFO Local URL: http://localhost:7860/
nvidia_1 | 03:40:40-533238 INFO API Docs: http://localhost:7860/docs
nvidia_1 | 03:40:40-533900 INFO Initializing middleware
nvidia_1 | ╭─────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
nvidia_1 | │ /webui/launch.py:149 in <module> │
nvidia_1 | │ │
nvidia_1 | │ 148 │
nvidia_1 | │ ❱ 149 instance = start_server(immediate=True, server=None) │
nvidia_1 | │ 150 while True: │
nvidia_1 | │ │
nvidia_1 | │ /webui/launch.py:129 in start_server │
nvidia_1 | │ │
nvidia_1 | │ 128 else: │
nvidia_1 | │ ❱ 129 server = server.webui() │
nvidia_1 | │ 130 if args.profile: │
nvidia_1 | │ │
nvidia_1 | │ /webui/webui.py:274 in webui │
nvidia_1 | │ │
nvidia_1 | │ 273 start_common() │
nvidia_1 | │ ❱ 274 start_ui() │
nvidia_1 | │ 275 load_model() │
nvidia_1 | │ │
nvidia_1 | │ /webui/webui.py:265 in start_ui │
nvidia_1 | │ │
nvidia_1 | │ 264 modules.progress.setup_progress_api(app) │
nvidia_1 | │ ❱ 265 create_api(app) │
nvidia_1 | │ 266 ui_extra_networks.add_pages_to_demo(app) │
nvidia_1 | │ │
nvidia_1 | │ /webui/webui.py:166 in create_api │
nvidia_1 | │ │
nvidia_1 | │ 165 log.debug('Creating API') │
nvidia_1 | │ ❱ 166 from modules.api.api import Api │
nvidia_1 | │ 167 api = Api(app, queue_lock) │
nvidia_1 | │ │
nvidia_1 | │ /webui/modules/api/api.py:17 in <module> │
nvidia_1 | │ │
nvidia_1 | │ 16 from modules import errors, shared, sd_samplers, deepbooru, sd_hijack, images, scripts, │
nvidia_1 | │ ❱ 17 from modules.api.models import * # pylint: disable=unused-wildcard-import, wildcard-impo │
nvidia_1 | │ 18 from modules.processing import StableDiffusionProcessingTxt2Img, StableDiffusionProcessi │
nvidia_1 | │ │
nvidia_1 | │ /webui/modules/api/models.py:106 in <module> │
nvidia_1 | │ │
nvidia_1 | │ 105 ] │
nvidia_1 | │ ❱ 106 ).generate_model() │
nvidia_1 | │ 107 │
nvidia_1 | │ │
nvidia_1 | │ /webui/modules/api/models.py:91 in generate_model │
nvidia_1 | │ │
nvidia_1 | │ 90 DynamicModel = create_model(self._model_name, **model_fields) │
nvidia_1 | │ ❱ 91 DynamicModel.__config__.allow_population_by_field_name = True │
nvidia_1 | │ 92 DynamicModel.__config__.allow_mutation = True │
nvidia_1 | │ │
nvidia_1 | │ /webui/venv/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:205 in __getattr__ │
nvidia_1 | │ │
nvidia_1 | │ 204 return getattr(self, '__pydantic_core_schema__') │
nvidia_1 | │ ❱ 205 raise AttributeError(item) │
nvidia_1 | │ 206 │
nvidia_1 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
nvidia_1 | AttributeError: __config__
nvidia_1 | stable-diffusion-automatic-xl-docker_nvidia_1 exited with code 1
And that's when it exited.
Re-running docker-compose up or even just running the image directly, gives me all the same errors (except this time it isn't downloading anything, it looks like its just trying to use what it had).
So, still not working, but at least it was a derp moment on my part for putting in a lower ubuntu version and a higher cuda version. There does appear to be an issue getting some of the needed dependencies, such as the extensions (although not technically required to get it working), and loading the /webui/extensions-builtin/sd-webui-controlnet/scripts/api.py script. And also running the middleware, which is the thing that crashes it.
I think docker-compose has been deprecated in favour of "docker compose". IIRC that ought to solve the top-level name tag error.
@JohanAR Sure, you're not wrong that "docker compose" is the preferred method and "docker-compose" is deprecated, remaining only as a legacy stub for "docker compose" in the latest versions of Docker.
However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x (ref: nvidia install guide), which meant I had to downgrade to 20.10 a long while back to get anything CUDA working without some hacky workaround.
And the docker command does not support docker compose on version 20.10.x:
$ docker compose
docker: 'compose' is not a docker command.
See 'docker --help'
Which means most people should be running on docker 20.10.x if they want to have the CUDA toolkit working properly on Linux, or even in the cloud for that matter. And I believe those on Windows will likely experience similar issues since the install guide recommends going through the WSL2 route.
There are workarounds to this obviously on the latest version of docker, which as far as I understand crashes on the latest-latest (which means you have to always be running a slightly older version of 23.x.x or 24.x.x) but that would mean this repo would need to support said workarounds or other people will post issue after issue that it isn't working for them.
I'm going through the process of upgrading back to the latest version - cos I would love to be proved wrong - and will report back my findings, but I suspect I will end up having to figure a bunch of workarounds to get it to work properly.
However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x
I used it with Docker 23 as well as now 24, with Ubuntu 22.04 and now 23.04, using the apt package source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64.
It worked flawlessly out-of-the-box and I did not experience problems. IMHO there is no reason to keep using an old Docker version.
Running into the same issue as @hazrpg, it fails when "Initializing middleware". I'm not sure what the Python code is doing, but it seems to be missing some configuration attributes, maybe?
Configuration
Ubuntu 22.04.2 LTS
Docker version 24.0.5
Docker Compose version v2.20.2
NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0
Also, is it possible to pass a flag to avoid the prompt "Download the default model? (y/N)" ? The reason I'm asking is that it's quite uncommon to have to attach to the running container to answer setup parameters. It works but it's not usual with Docker builds.
sd-automatic-nvidia-1 | Running on local URL: http://0.0.0.0:7860
sd-automatic-nvidia-1 |
sd-automatic-nvidia-1 | To create a public link, set `share=True` in `launch()`.
sd-automatic-nvidia-1 | 14:55:30-633627 INFO Local URL: http://localhost:7860/
sd-automatic-nvidia-1 | 14:55:30-637451 INFO API Docs: http://localhost:7860/docs
sd-automatic-nvidia-1 | 14:55:30-640605 INFO Initializing middleware
sd-automatic-nvidia-1 | ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
sd-automatic-nvidia-1 | │ /webui/launch.py:149 in <module> │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 148 │
sd-automatic-nvidia-1 | │ ❱ 149 instance = start_server(immediate=True, server=None) │
sd-automatic-nvidia-1 | │ 150 while True: │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/launch.py:129 in start_server │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 128 else: │
sd-automatic-nvidia-1 | │ ❱ 129 server = server.webui() │
sd-automatic-nvidia-1 | │ 130 if args.profile: │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/webui.py:274 in webui │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 273 start_common() │
sd-automatic-nvidia-1 | │ ❱ 274 start_ui() │
sd-automatic-nvidia-1 | │ 275 load_model() │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/webui.py:265 in start_ui │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 264 modules.progress.setup_progress_api(app) │
sd-automatic-nvidia-1 | │ ❱ 265 create_api(app) │
sd-automatic-nvidia-1 | │ 266 ui_extra_networks.add_pages_to_demo(app) │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/webui.py:166 in create_api │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 165 log.debug('Creating API') │
sd-automatic-nvidia-1 | │ ❱ 166 from modules.api.api import Api │
sd-automatic-nvidia-1 | │ 167 api = Api(app, queue_lock) │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/modules/api/api.py:17 in <module> │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 16 from modules import errors, shared, sd_samplers, deepbooru, sd_hijack, │
sd-automatic-nvidia-1 | │ ❱ 17 from modules.api.models import * # pylint: disable=unused-wildcard-imp │
sd-automatic-nvidia-1 | │ 18 from modules.processing import StableDiffusionProcessingTxt2Img, Stabl │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/modules/api/models.py:106 in <module> │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 105 ] │
sd-automatic-nvidia-1 | │ ❱ 106 ).generate_model() │
sd-automatic-nvidia-1 | │ 107 │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/modules/api/models.py:91 in generate_model │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 90 DynamicModel = create_model(self._model_name, **model_fields) │
sd-automatic-nvidia-1 | │ ❱ 91 DynamicModel.__config__.allow_population_by_field_name = True │
sd-automatic-nvidia-1 | │ 92 DynamicModel.__config__.allow_mutation = True │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/venv/lib/python3.10/site-packages/pydantic/_internal/_model_construct │
sd-automatic-nvidia-1 | │ ion.py:205 in __getattr__ │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 204 return getattr(self, '__pydantic_core_schema__ │
sd-automatic-nvidia-1 | │ ❱ 205 raise AttributeError(item) │
sd-automatic-nvidia-1 | │ 206 │
sd-automatic-nvidia-1 | ╰──────────────────────────────────────────────────────────────────────────────╯
sd-automatic-nvidia-1 | AttributeError: __config__
sd-automatic-nvidia-1 |
Also, is it possible to pass a flag to avoid the prompt "Download the default model? (y/N)" ?
@hleroy --no-download
Firstly, thanks to everyone for the great work put into vladmandic/automatic! I'm recording my experiences trying to use the Dockerfile with vast.ai in case it is useful for others. My apologies if the approach I took was not best practices or just plain wrong - I'm fairly new to docker so please take the following as the experiences of a naive end-user trying to get this to work on a GPU cloud provider.
My use case is that I have a Macbook Pro but I would like to build and use a docker image of vladmandic/automatic that can be used on a GPU cloud provider like vast.ai or runpod.io.
My config:
OS: MacOS Monterey 12.6
Docker engine: 24.0.2
Docker Compose: 2.19.1
Steps:
- clone nopperl/automatic to my MBP
- modify the Dockerfile FROM instruction: FROM --platform=linux/amd64 ${BASE_CUDA_CONTAINER}
- run docker compose build -t alexeberts/stable-diffusion:sdnext-test-2 .
- wait 30 mins
- run docker push alexeberts/stable-diffusion:sdnext-test-2
- setup template on vast.ai using alexeberts/stable-diffusion:sdnext-test-2
- create instance on vast.ai using the ssh login option
- ssh into the instance and run entrypoint.sh
Results:
- The container args INSTALLDIR etc. are not automatically added to the new environment
- After setting up the args manually and running entrypoint.sh, the server starts but with the same errors @hleroy and @hazrpg ran into.
- I was not able to get a running instance of automatic.
- I considered trying to build the image using docker compose build to see if I was missing configuration info from docker-compose.yml, but I could not figure out how to ensure that docker compose build would build a linux/amd64 container (adding platform: linux/amd64 to the docker-compose.yml resulted in an error).
I'm happy to continue testing on vast.ai if someone can provide a linux image or instructions for how to successfully build a linux image from this repo on a MBP.
However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x
I used it with Docker 23 as well as now 24, with Ubuntu 22.04 and now 23.04, using the apt package source
https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64. It worked flawlessly out-of-the-box and I did not experience problems. IMHO there is no reason to keep using an old Docker version.
I did eventually try the upstream docker apt packages, instead of the Canonical/Debian ones. Looks like although the Nvidia toolkit says it doesn't support newer versions, the lovely docker peeps must have gotten around that and made sure it still works. So I stand corrected, thank you for pointing it out.
However I'm still stuck at the middleware stage sadly even with the newer docker and using docker compose.
Why did the PR stall? Was there a technical difficulty?
moving status to draft until comments are incorporated and maintainer is found.
What is the status of this PR? Using SD.next with docker install would be a huge win IMHO.
there are plenty of users using sdnext inside a docker container, but having an official dockerfile is tricky as everyone has their own idea what docker config should be like and it also varies on platform.
On that note for anyone looking for a "one-click" docker deploy -- I have contributed to and am using grokuku/stable-diffusion on a linux host with nvidia gpu. It "just works" and stays up-to-date with master branch automatically. Read the readme ofc but an example run command:
docker run -d -p 9000:9000 -e "PUID=1000" -e "PGID=1000" -e "WEBUI_VERSION=04" -v /path/on/host/data:/config --runtime=nvidia --gpus all holaflenain/stable-diffusion