Add docker support
Description
Add support for running the UI using Docker. Since the previous PRs (https://github.com/vladmandic/automatic/pull/403, https://github.com/vladmandic/automatic/pull/844) stalled, I merged their approaches and fixed the remaining issues.
Notes
To improve security, the process runs as a non-root user inside the container. Since bind mounts are owned by root inside the container, the entrypoint.sh script changes their ownership to the non-root user to make them writable.
Environment and Testing
- Ubuntu 22.04
- Docker 24.0.2
- Nvidia Container Toolkit 1.13.1
thanks for picking this up!
tcmalloc is amazing, but i don't want to go down the path of me installing it. can you remove all mentions of it?
and yes, tcmalloc should make its way into faq, but that's besides the point
don't modify README.md - better create a Wiki page for Docker and then it can be as short or as long as you want
i can create a link on README.md that points to Wiki page
do we need default ./data at all?
i totally agree that --data-dir should be specified, but why default to ./data?
@vladmandic I have now removed tcmalloc and the changes to the README.md.
Regarding --data-dir, I thought that using a subdir of the workdir of the container would be a sane default. But using /data or something else as default also works.
just an idea - having data inside container is really against the concept of containers.
how about making --data-dir mandatory instead?
for example:
RUN [ -z "$--data-dir" ] && echo "Must specify data directory" && exit 1 || true
Good idea, I have made it mandatory now
looks good to me, but please tell me you've actually tested it? :)
I built it again from scratch and noticed an error ^^
The requirements.txt file was ignored due to the /*.txt entry in the ignore file. Now it works.
Can you guys talk about the security benefits/pros and cons of using this?
Can you guys talk about the security benefits/pros and cons of using this?
talk about benefits of using docker in general? not really, that's really outside of the scope of this pr, this is to provide simple-to-use template.
@vladmandic I think it's ready to be merged
I merged master into this and have the following findings regarding docker compose up:
- --skip-update appears no longer valid and should be removed
- On recreate, installation is again attempted, including downloading of all packages including torch torchvision - probably the venv or wherever those get put should be a volume also
I would also suggest making the first argument to the entrypoint webui and setting it by default with RUN ["webui"]; if the first argument is different from webui, exec the arguments directly.
Very unfortunate that the --skip-update flag was removed, thanks for bringing it to my attention @staff0rd. I think solving this indirectly by storing the packages and repositories in a bind-mounted directory is suboptimal, since they're not application state and should be stored within the container. @vladmandic is there a plan to bring --skip-update back or is there an equivalent feature?
@Kubuxu thanks for the suggestions, I've fixed the env vars.
I would also suggest making the first argument to the entrypoint webui and setting it by default with RUN ["webui"]; if the first argument is different from webui, exec the arguments directly.
Could you clarify what you meant by this? Essentially running webui.sh instead of python launch.py in entrypoint.sh per default (with the possibility of specifying other commands)?
Could you clarify what you meant by this? Essentially running webui.sh instead of python launch.py in entrypoint.sh per default (with the possibility of specifying other commands)?
python launch.py is fine (even better as webui.sh is not needed). I didn't notice that you didn't use webui.sh.
Correction, not RUN but CMD.
But in essence, have the default run command in CMD, either as "python", "launch.py" or as a webui "alias" which is handled by entrypoint.sh, which then allows one to override it.
So for example
ENTRYPOINT ["/bin/bash", "-c", "${INSTALLDIR}/entrypoint.sh \"$0\" \"$@\""] # same as today
CMD ["webui"]
Then the entrypoint.sh should detect webui at $1 and activate the env, and call python launch.py, otherwise it launches the command.
See the postgres entrypoint as an example:
#!/usr/bin/env bash
set -e

if [ "$1" = 'postgres' ]; then
    chown -R postgres "$PGDATA"

    if [ -z "$(ls -A "$PGDATA")" ]; then
        gosu postgres initdb
    fi

    shift
    exec gosu postgres "$@"
fi

exec "$@"
This will allow the user to both pass params to the launch.py like this: docker run image webui --api --backend diffusers and to run custom commands to test the image docker run --rm image nvidia-smi
@Kubuxu I think what you want to do here is already possible using the --entrypoint flag of docker run. So, for your example, you can do docker run --rm --entrypoint nvidia-smi image to override the entrypoint.
Yeah, this is another way of doing this. We can go down the --entrypoint path instead.
I had issues building this from within Ubuntu (20.04). I'm going to document my experience so that you can see the troubles I had along the way to hopefully help me fix them, but ultimately fix it for others who might use it once this has been accepted as a merge. Please don't take this as negative criticism at all, cos I really do appreciate all the hard work you guys are putting into this! I hope my experiences can help to get this accepted. I just wish I knew more to help move things along.
I kept getting the error:
$ docker-compose up
ERROR: The Compose file './docker-compose.yml' is invalid because:
'name' does not match any of the regexes: '^x-'
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
which I fixed by changing the docker-compose.yml file to not include name: sd-automatic since version 3.9 is defined on line 1 and name: is not supported. Please see: https://docs.docker.com/compose/compose-file/compose-file-v3/
You can also confirm it using the command docker-compose config which will tell you if the compose file is formatted correctly.
After I got past that error by removing the name variable, this was the error I got:
$ docker-compose up
Building nvidia
Sending build context to Docker daemon 38.6MB
Step 1/17 : ARG UBUNTU_VERSION=22.04 CUDA_VERSION=11.8.0 BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}
Step 2/17 : FROM ${BASE_CUDA_CONTAINER}
invalid reference format
ERROR: Service 'nvidia' failed to build : Build failed
For some reason BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION} isn't being evaluated properly. I had to fix this by hardcoding it into the file so the line was:
ARG UBUNTU_VERSION=22.04 \
CUDA_VERSION=11.8.0 \
BASE_CUDA_CONTAINER=nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04
And I changed it to:
ARG UBUNTU_VERSION=20.04 \
CUDA_VERSION=12.1.0 \
BASE_CUDA_CONTAINER=nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04
Although I changed it to 20.04 and 12.1.0 (which I confirmed by going to: https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=12.1.0-cudnn8-runtime-ubuntu), I'm pretty sure changing it to:
ARG UBUNTU_VERSION=22.04 \
CUDA_VERSION=11.8.0 \
BASE_CUDA_CONTAINER=nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
Would work fine since that does exist too: https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=11.8.0-cudnn8-runtime-ubuntu
The main issue seems to be with BASE_CUDA_CONTAINER not accepting the variables ${CUDA_VERSION} and ${UBUNTU_VERSION} even though in my mind that looks sane. I tried putting quotes in so that the full line was BASE_CUDA_CONTAINER="nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}" but that didn't work.
The next issue is tzdata. It would be good to set a default during installation with an ENV so that you can set your own, since just doing docker-compose up without any extra commands presents you with this dialogue after installing all the apt packages:
Configuring tzdata
------------------
Please select the geographic area in which you live. Subsequent configuration
questions will narrow this down by presenting a list of cities, representing
the time zones in which they are located.
1. Africa 4. Australia 7. Atlantic 10. Pacific 13. Etc
2. America 5. Arctic 8. Europe 11. SystemV
3. Antarctica 6. Asia 9. Indian 12. US
Geographic area:
But when you type in 8 and hit enter, nothing happens. I had to stop the instance in Portainer and recreate it with the -it flags so that I could interact with it in an attached tty window to the instance. That then allowed me to enter the required continent, followed by the required city.
But once those were in and it finished setting up, it just stopped running. Trying to re-run it, it obviously continues where it left off because all the packages are installed and tzdata is already set up, and then stops straight away. Trying to diagnose what the last message was, Docker says there are no logs it can access for it.
Re-running it in the terminal again to make sure I didn't miss anything and I get:
$ docker run 5f270feee059
==========
== CUDA ==
==========
CUDA Version 12.1.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md
Oh! So must be the GPU permission, but still:
$ docker run 5f270feee059 --gpus=all
==========
== CUDA ==
==========
CUDA Version 12.1.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md
/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]
Slightly more information, tried it with the runtime=nvidia parameter as per the nvidia documentation for CUDA:
$ docker run 5f270feee059 --gpus all --runtime=nvidia
==========
== CUDA ==
==========
CUDA Version 12.1.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md
/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]
Hmmm... tried the nvidia test using the same base cuda I used for the installation of nvidia/cuda:12.1.0-base-ubuntu20.04:
$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.1.0-base-ubuntu20.04 nvidia-smi
[sudo] password for hazrpg:
Unable to find image 'nvidia/cuda:12.1.0-base-ubuntu20.04' locally
12.1.0-base-ubuntu20.04: Pulling from nvidia/cuda
56e0351b9876: Already exists
b0f696c0aebb: Pull complete
e627444df06f: Pull complete
dcf21018e934: Pull complete
a2855a2ef2e0: Pull complete
Digest: sha256:d0bf043a20ecc11940c5a452f67f239f9dec34a01d8f5583d2af93cf0da0f072
Status: Downloaded newer image for nvidia/cuda:12.1.0-base-ubuntu20.04
Sun Jul 30 02:40:24 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 Off | 00000000:01:00.0 On | N/A |
| 0% 49C P5 16W / 170W | 1572MiB / 12288MiB | 13% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
So everything is set up fine for docker, but the image still isn't working. Not sure where I am going wrong, but I feel like I'm close!
Note that I pulled this from the master branch on nopperl:master to test this out.
Edit: I realised after submitting that I hadn't tried the proper image for the nvidia test of nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04 so I changed it but still got the same result as above. I also realised that I had used sudo for the pre-build compose image (like I had for the nvidia test image) so re-ran sudo docker run 5f270feee059 --gpus all --runtime=nvidia to make sure the issue wasn't a permissions problem trying to access the hardware, but that still also gave me the same results as before. So not overly sure what's going on.
Didn't want to give up, so tried one more time - scrapped and purged everything, reset the repo back to how it was, and did docker-compose up again. The BASE_CUDA_CONTAINER was still an issue, so instead of setting it to my ubuntu version and the cuda I have installed, I just used the 22.04 and 11.8.0 from the original file, and changed BASE_CUDA_CONTAINER to be hardcoded to those versions instead (figure maybe that was why there was an issue).
This time I got a lot further! It installed correctly and ran through everything. But this time in the terminal it looked like it had stopped doing anything after Available models: ./data/models/Stable-diffusion 0.
Started up another terminal and attached to the running image, and saw a different message saying Download the default model? (y/N) so I typed in y and hit enter. It started downloading the sd 1.5 model - perfect!
Then afterwards I got:
nvidia_1 | 03:39:52-863637 ERROR Module load: /webui/extensions-builtin/sd-webui-controlnet/scripts/api.py: AttributeError
Followed by a long traceback log, but it looked like it was still going and did...
nvidia_1 | Image Browser: ImageReward is not installed, cannot be used.
nvidia_1 | 03:40:15-057529 INFO Loading UI theme: name=black-orange style=Auto
nvidia_1 | Image Browser: Creating database
nvidia_1 | Image Browser: Database created
nvidia_1 | 03:40:16-030004 ERROR Failed reading extension data from Git repository: a1111-sd-webui-lycoris: HEAD is a detached symbolic reference as it points to
nvidia_1 | 'b0d24ca645b6a5cb9752169691a1c6385c6fe6ae'
nvidia_1 | 03:40:16-036250 ERROR Failed reading extension data from Git repository: clip-interrogator-ext: HEAD is a detached symbolic reference as it points to
nvidia_1 | '9e6bbd9b8931bbe869a8e28e7005b0e13c2efff0'
nvidia_1 | 03:40:16-045836 ERROR Failed reading extension data from Git repository: multidiffusion-upscaler-for-automatic1111: HEAD is a detached symbolic reference as it
nvidia_1 | points to '70b3c5ea3c9f684d04e7ff59167565974415735c'
nvidia_1 | 03:40:16-053253 ERROR Failed reading extension data from Git repository: sd-dynamic-thresholding: HEAD is a detached symbolic reference as it points to
nvidia_1 | 'f02cacfc923e8bbf73f25327d722d50c458d66bb'
nvidia_1 | 03:40:16-066565 ERROR Failed reading extension data from Git repository: sd-extension-system-info: HEAD is a detached symbolic reference as it points to
nvidia_1 | '8046b1544513cea06d1c41748c22727c930323ab'
nvidia_1 | 03:40:16-075336 ERROR Failed reading extension data from Git repository: sd-webui-controlnet: HEAD is a detached symbolic reference as it points to
nvidia_1 | '7b707dc1f03c3070f8a506ff70a2b68173d57bb5'
nvidia_1 | 03:40:16-085855 ERROR Failed reading extension data from Git repository: sd-webui-model-converter: HEAD is a detached symbolic reference as it points to
nvidia_1 | 'f6e0fa5386fb82ef44feac74d66958af951fcc48'
nvidia_1 | 03:40:16-097230 ERROR Failed reading extension data from Git repository: stable-diffusion-webui-images-browser: HEAD is a detached symbolic reference as it
nvidia_1 | points to '75af6d0c32b72350b2f140f186cd8ce0e24dda10'
nvidia_1 | 03:40:16-111035 ERROR Failed reading extension data from Git repository: stable-diffusion-webui-rembg: HEAD is a detached symbolic reference as it points to
nvidia_1 | '657ae9f5486019a94dbe11d3560b28cccf35a0fd'
nvidia_1 | 03:40:16-147008 INFO Setting Torch parameters: dtype=torch.float16 vae=torch.float16 unet=torch.float16
Loading weights: /webui/data/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/4.3 GB -:--:--
nvidia_1 | LatentDiffusion: Running in eps-prediction mode
nvidia_1 | DiffusionWrapper has 859.52 M params.
Downloading (…)olve/main/vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 961k/961k [00:00<00:00, 2.82MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 1.84MB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 389/389 [00:00<00:00, 2.08MB/s]
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 905/905 [00:00<00:00, 5.89MB/s]
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████| 4.52k/4.52k [00:00<00:00, 23.9MB/s]
nvidia_1 | 03:40:19-248309 INFO Model created from config: /webui/configs/v1-inference.yaml
Calculating model hash: /webui/data/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 GB 0:00:00
nvidia_1 | 03:40:39-639737 INFO Applying scaled dot product cross attention optimization
nvidia_1 | 03:40:39-649293 INFO Embeddings loaded: 0 []
nvidia_1 | 03:40:39-661568 INFO Model loaded in 23.5s (load=0.6s create=2.5s hash=2.2s apply=17.4s vae=0.5s move=0.3s)
nvidia_1 | 03:40:40-197750 INFO Model load finished: {'ram': {'used': 9.04, 'total': 62.59}, 'gpu': {'used': 3.36, 'total': 11.75}, 'retries': 0, 'oom': 0}
nvidia_1 | Running on local URL: http://0.0.0.0:7860
nvidia_1 |
nvidia_1 | To create a public link, set `share=True` in `launch()`.
nvidia_1 | 03:40:40-532231 INFO Local URL: http://localhost:7860/
nvidia_1 | 03:40:40-533238 INFO API Docs: http://localhost:7860/docs
nvidia_1 | 03:40:40-533900 INFO Initializing middleware
nvidia_1 | ╭─────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
nvidia_1 | │ /webui/launch.py:149 in <module> │
nvidia_1 | │ │
nvidia_1 | │ 148 │
nvidia_1 | │ ❱ 149 instance = start_server(immediate=True, server=None) │
nvidia_1 | │ 150 while True: │
nvidia_1 | │ │
nvidia_1 | │ /webui/launch.py:129 in start_server │
nvidia_1 | │ │
nvidia_1 | │ 128 else: │
nvidia_1 | │ ❱ 129 server = server.webui() │
nvidia_1 | │ 130 if args.profile: │
nvidia_1 | │ │
nvidia_1 | │ /webui/webui.py:274 in webui │
nvidia_1 | │ │
nvidia_1 | │ 273 start_common() │
nvidia_1 | │ ❱ 274 start_ui() │
nvidia_1 | │ 275 load_model() │
nvidia_1 | │ │
nvidia_1 | │ /webui/webui.py:265 in start_ui │
nvidia_1 | │ │
nvidia_1 | │ 264 modules.progress.setup_progress_api(app) │
nvidia_1 | │ ❱ 265 create_api(app) │
nvidia_1 | │ 266 ui_extra_networks.add_pages_to_demo(app) │
nvidia_1 | │ │
nvidia_1 | │ /webui/webui.py:166 in create_api │
nvidia_1 | │ │
nvidia_1 | │ 165 log.debug('Creating API') │
nvidia_1 | │ ❱ 166 from modules.api.api import Api │
nvidia_1 | │ 167 api = Api(app, queue_lock) │
nvidia_1 | │ │
nvidia_1 | │ /webui/modules/api/api.py:17 in <module> │
nvidia_1 | │ │
nvidia_1 | │ 16 from modules import errors, shared, sd_samplers, deepbooru, sd_hijack, images, scripts, │
nvidia_1 | │ ❱ 17 from modules.api.models import * # pylint: disable=unused-wildcard-import, wildcard-impo │
nvidia_1 | │ 18 from modules.processing import StableDiffusionProcessingTxt2Img, StableDiffusionProcessi │
nvidia_1 | │ │
nvidia_1 | │ /webui/modules/api/models.py:106 in <module> │
nvidia_1 | │ │
nvidia_1 | │ 105 ] │
nvidia_1 | │ ❱ 106 ).generate_model() │
nvidia_1 | │ 107 │
nvidia_1 | │ │
nvidia_1 | │ /webui/modules/api/models.py:91 in generate_model │
nvidia_1 | │ │
nvidia_1 | │ 90 DynamicModel = create_model(self._model_name, **model_fields) │
nvidia_1 | │ ❱ 91 DynamicModel.__config__.allow_population_by_field_name = True │
nvidia_1 | │ 92 DynamicModel.__config__.allow_mutation = True │
nvidia_1 | │ │
nvidia_1 | │ /webui/venv/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:205 in __getattr__ │
nvidia_1 | │ │
nvidia_1 | │ 204 return getattr(self, '__pydantic_core_schema__') │
nvidia_1 | │ ❱ 205 raise AttributeError(item) │
nvidia_1 | │ 206 │
nvidia_1 | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
nvidia_1 | AttributeError: __config__
nvidia_1 | stable-diffusion-automatic-xl-docker_nvidia_1 exited with code 1
And that's when it exited.
Re-running docker-compose up or even just running the image directly, gives me all the same errors (except this time it isn't downloading anything, it looks like its just trying to use what it had).
So, still not working, but at least it was a derp moment on my part for putting in a lower ubuntu version and a higher cuda version. There does appear to be an issue getting some of the needed dependencies, such as the extensions (although not technically required to get it working), and loading the /webui/extensions-builtin/sd-webui-controlnet/scripts/api.py script. And also running the middleware, which is the thing that crashes it.
I think docker-compose has been deprecated in favour of "docker compose". IIRC that ought to solve the top-level name tag error.
@JohanAR Sure, you're not wrong that "docker compose" is the preferred method and "docker-compose" is deprecated, remaining only as a legacy stub for "docker compose" in the latest versions of Docker.
However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x (ref: nvidia install guide), which meant I had to downgrade to 20.10 a long while back to get anything CUDA working without some hacky workaround.
And the docker command does not support docker compose on version 20.10.x:
$ docker compose
docker: 'compose' is not a docker command.
See 'docker --help'
Which means most people should be running on docker 20.10.x if they want to have the CUDA toolkit working properly on Linux, or even in the cloud for that matter. And I believe those on Windows will likely experience similar issues since the install guide recommends going through the WSL2 route.
There are workarounds to this obviously on the latest version of docker, which as far as I understand crashes on the latest-latest (which means you have to always be running a slightly older version of 23.x.x or 24.x.x) but that would mean this repo would need to support said workarounds or other people will post issue after issue that it isn't working for them.
I'm going through the process of upgrading back to the latest version - cos I would love to be proved wrong - and will report back my findings, but I suspect I will end up having to figure a bunch of workarounds to get it to work properly.
However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x
I used it with Docker 23 as well as now 24, with Ubuntu 22.04 and now 23.04, using the apt package source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64.
It worked flawlessly out-of-the-box and I did not experience problems. IMHO there is no reason to keep using an old Docker version.
Running into the same issue as @hazrpg, it fails when "Initializing middleware". I'm not sure what the Python code is doing, but it seems to be missing some configuration attributes, maybe?
Configuration
Ubuntu 22.04.2 LTS
Docker version 24.0.5
Docker Compose version v2.20.2
NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0
Also, is it possible to pass a flag to avoid the prompt "Download the default model? (y/N)" ? The reason I'm asking is that it's quite uncommon to have to attach to the running container to answer setup parameters. It works but it's not usual with Docker builds.
sd-automatic-nvidia-1 | Running on local URL: http://0.0.0.0:7860
sd-automatic-nvidia-1 |
sd-automatic-nvidia-1 | To create a public link, set `share=True` in `launch()`.
sd-automatic-nvidia-1 | 14:55:30-633627 INFO Local URL: http://localhost:7860/
sd-automatic-nvidia-1 | 14:55:30-637451 INFO API Docs: http://localhost:7860/docs
sd-automatic-nvidia-1 | 14:55:30-640605 INFO Initializing middleware
sd-automatic-nvidia-1 | ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
sd-automatic-nvidia-1 | │ /webui/launch.py:149 in <module> │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 148 │
sd-automatic-nvidia-1 | │ ❱ 149 instance = start_server(immediate=True, server=None) │
sd-automatic-nvidia-1 | │ 150 while True: │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/launch.py:129 in start_server │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 128 else: │
sd-automatic-nvidia-1 | │ ❱ 129 server = server.webui() │
sd-automatic-nvidia-1 | │ 130 if args.profile: │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/webui.py:274 in webui │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 273 start_common() │
sd-automatic-nvidia-1 | │ ❱ 274 start_ui() │
sd-automatic-nvidia-1 | │ 275 load_model() │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/webui.py:265 in start_ui │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 264 modules.progress.setup_progress_api(app) │
sd-automatic-nvidia-1 | │ ❱ 265 create_api(app) │
sd-automatic-nvidia-1 | │ 266 ui_extra_networks.add_pages_to_demo(app) │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/webui.py:166 in create_api │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 165 log.debug('Creating API') │
sd-automatic-nvidia-1 | │ ❱ 166 from modules.api.api import Api │
sd-automatic-nvidia-1 | │ 167 api = Api(app, queue_lock) │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/modules/api/api.py:17 in <module> │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 16 from modules import errors, shared, sd_samplers, deepbooru, sd_hijack, │
sd-automatic-nvidia-1 | │ ❱ 17 from modules.api.models import * # pylint: disable=unused-wildcard-imp │
sd-automatic-nvidia-1 | │ 18 from modules.processing import StableDiffusionProcessingTxt2Img, Stabl │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/modules/api/models.py:106 in <module> │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 105 ] │
sd-automatic-nvidia-1 | │ ❱ 106 ).generate_model() │
sd-automatic-nvidia-1 | │ 107 │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/modules/api/models.py:91 in generate_model │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 90 DynamicModel = create_model(self._model_name, **model_fields) │
sd-automatic-nvidia-1 | │ ❱ 91 DynamicModel.__config__.allow_population_by_field_name = True │
sd-automatic-nvidia-1 | │ 92 DynamicModel.__config__.allow_mutation = True │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ /webui/venv/lib/python3.10/site-packages/pydantic/_internal/_model_construct │
sd-automatic-nvidia-1 | │ ion.py:205 in __getattr__ │
sd-automatic-nvidia-1 | │ │
sd-automatic-nvidia-1 | │ 204 return getattr(self, '__pydantic_core_schema__ │
sd-automatic-nvidia-1 | │ ❱ 205 raise AttributeError(item) │
sd-automatic-nvidia-1 | │ 206 │
sd-automatic-nvidia-1 | ╰──────────────────────────────────────────────────────────────────────────────╯
sd-automatic-nvidia-1 | AttributeError: __config__
sd-automatic-nvidia-1 |
Also, is it possible to pass a flag to avoid the prompt "Download the default model? (y/N)" ?
@hleroy --no-download
Firstly, thanks to everyone for the great work put into vladmandic/automatic! I'm recording my experiences trying to use the Dockerfile with vast.ai in case it is useful for others. My apologies if the approach I took was not best practices or just plain wrong - I'm fairly new to docker so please take the following as the experiences of a naive end-user trying to get this to work on a GPU cloud provider.
My use case is that I have a Macbook Pro but I would like to build and use a docker image of vladmandic/automatic that can be used on a GPU cloud provider like vast.ai or runpod.io.
My config:
OS: MacOS Monterey 12.6
Docker engine: 24.0.2
Docker Compose: 2.19.1
Steps:
- clone nopperl/automatic to my MBP
- modify the Dockerfile FROM instruction: FROM --platform=linux/amd64 ${BASE_CUDA_CONTAINER}
- run docker compose build -t alexeberts/stable-diffusion:sdnext-test-2 .
- wait 30 mins
- run docker push alexeberts/stable-diffusion:sdnext-test-2
- setup template on vast.ai using alexeberts/stable-diffusion:sdnext-test-2
- create instance on vast.ai using the ssh login option
- ssh into the instance and run entrypoint.sh
Results:
- The container args INSTALLDIR etc. are not automatically added to the new environment
- After setting up the args manually and running entrypoint.sh, the server starts but with the same errors @hleroy and @hazrpg ran into.
- I was not able to get a running instance of automatic.
- I considered trying to build the image using docker compose build to see if I was missing configuration info from docker-compose.yml, but I could not figure out how to ensure that docker compose build would build a linux/amd64 container (adding platform: linux/amd64 to the docker-compose.yml resulted in an error).
I'm happy to continue testing on vast.ai if someone can provide a linux image or instructions for how to successfully build a linux image from this repo on a MBP.
However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x
I used it with Docker 23 as well as now 24, with Ubuntu 22.04 and now 23.04, using the apt package source
https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64. It worked flawlessly out-of-the-box and I did not experience problems. IMHO there is no reason to keep using an old Docker version.
I did eventually try the upstream docker apt packages, instead of the Canonical/Debian ones. Looks like although the Nvidia toolkit says it doesn't support newer versions, the lovely docker peeps must have gotten around that and made sure it still works. So I stand corrected, thank you for pointing it out.
However I'm still stuck at the middleware stage sadly even with the newer docker and using docker compose.
Why did the PR stall? Was there a technical difficulty?
moving status to draft until comments are incorporated and maintainer is found.
What is the status of this PR? Using SD.next with docker install would be a huge win IMHO.
there are plenty of users using sdnext inside a docker container, but having an official dockerfile is tricky as everyone has their own idea what docker config should be like and it also varies on platform.
On that note for anyone looking for a "one-click" docker deploy -- I have contributed to and am using grokuku/stable-diffusion on a linux host with nvidia gpu. It "just works" and stays up-to-date with master branch automatically. Read the readme ofc but an example run command:
docker run -d -p 9000:9000 -e "PUID=1000" -e "PGID=1000" -e "WEBUI_VERSION=04" -v /path/on/host/data:/config --runtime=nvidia --gpus all holaflenain/stable-diffusion