stable-diffusion-webui
[Bug]: Multi-GPU launch with accelerate seems to work incorrectly; Dreambooth training stalls
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What happened?
I launched webui.sh with ACCELERATE="True" set on a machine with two GPUs. The script spawned two webui server processes listening on different ports (7860 and 7861 in the console log). The servers do not seem to communicate with each other and process requests independently. When I started Dreambooth training, it stalled.
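My guess at why two servers appear: a distributed launcher starts the whole script once per GPU, so every process reaches the code that starts the web server. Below is a minimal sketch of a rank guard, not the actual webui code. The helper names `is_main_process` and `maybe_start_server` are hypothetical, and I am assuming the launcher exports a `RANK` environment variable the way PyTorch's distributed launcher does.

```python
import os


def is_main_process(env=None):
    """Return True for rank 0, or when no rank is set (single-process run)."""
    env = os.environ if env is None else env
    # Assumption: the launcher exports RANK for each spawned process.
    return int(env.get("RANK", "0")) == 0


def maybe_start_server(start_server, env=None):
    """Call start_server() only on the main process; report whether it ran."""
    if is_main_process(env):
        start_server()
        return True
    return False
```

A guard like this would only prevent the second UI from starting. The other processes would still need to take part in the training step, so it is not a fix for the stall by itself.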
Steps to reproduce the problem
- Configure accelerate: `accelerate config`
- Uncomment `export ACCELERATE="True"` in webui-user.sh
- Launch the script as usual: `bash webui.sh`
What should have happened?
In theory, accelerate should split each batch between the two GPUs and let me train Dreambooth with a larger batch size, which is crucial for quality.
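To make the expectation concrete, here is the arithmetic I have in mind for data-parallel training, as a rough sketch. The function name and the example numbers are illustrative only and are not taken from the Dreambooth extension.

```python
def effective_batch_size(per_device_batch, num_processes, grad_accum_steps=1):
    """Samples contributing to one optimizer step in data-parallel training."""
    return per_device_batch * num_processes * grad_accum_steps


# Illustrative numbers: a per-GPU batch of 4 on two GPUs.
print(effective_batch_size(4, 2))  # 8
```

With two GPUs cooperating, the same per-GPU memory budget would give twice the batch per optimizer step. With two independent servers, each one trains on its own.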
Commit where the problem happens
0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8
What platforms do you use to access the UI ?
Linux
What browsers do you use to access the UI ?
Chrome
Command Line Arguments
--xformers --listen --enable-insecure-extension-access
List of extensions
dreambooth
Console logs
(venv) root@c1dc0cbb72ba:/home/root/stable-diffusion-webui# bash webui.sh
################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################
################################################################
Running on root user
################################################################
################################################################
Repo already cloned, using it as install directory
################################################################
################################################################
Create and activate python venv
################################################################
################################################################
Accelerating launch.py...
################################################################
Python 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0]
Commit hash: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8
Python 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0]
Commit hash: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8
Installing requirements for Web UI
Installing requirements for Web UI
#######################################################################################################
Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:
Python revision: 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0]
Dreambooth revision: bff61c6b92b79ece2e140ee240e0628dd9a4ebef
SD-WebUI revision: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8
Checking Dreambooth requirements...
Ignoring tensorflow-macos: markers 'sys_platform == "darwin" and platform_machine == "arm64"' don't match your environment
Ignoring mediapipe-silicon: markers 'sys_platform == "darwin"' don't match your environment
Collecting accelerate==0.16.0
Using cached accelerate-0.16.0-py3-none-any.whl (199 kB)
Collecting gitpython~=3.1.31
Using cached GitPython-3.1.31-py3-none-any.whl (184 kB)
Collecting transformers~=4.26.1
Using cached transformers-4.26.1-py3-none-any.whl (6.3 MB)
Collecting requests
Using cached requests-2.28.2-py3-none-any.whl (62 kB)
Installing collected packages: requests, gitpython, accelerate, transformers
Attempting uninstall: requests
Found existing installation: requests 2.25.1
Uninstalling requests-2.25.1:
Successfully uninstalled requests-2.25.1
Attempting uninstall: gitpython
Found existing installation: GitPython 3.1.27
Uninstalling GitPython-3.1.27:
Successfully uninstalled GitPython-3.1.27
Attempting uninstall: accelerate
Found existing installation: accelerate 0.12.0
Uninstalling accelerate-0.12.0:
Successfully uninstalled accelerate-0.12.0
Attempting uninstall: transformers
Found existing installation: transformers 4.25.1
Uninstalling transformers-4.25.1:
Successfully uninstalled transformers-4.25.1
Successfully installed accelerate-0.16.0 gitpython-3.1.31 requests-2.28.2 transformers-4.26.1
[+] torch version 1.13.1+cu117 installed.
[+] torchvision version 0.14.1+cu117 installed.
[+] accelerate version 0.16.0 installed.
[+] bitsandbytes version 0.35.4 installed.
[+] diffusers version 0.13.1 installed.
[+] transformers version 4.26.1 installed.
[+] xformers version 0.0.17.dev464 installed.
#######################################################################################################
Launching Web UI with arguments: --xformers --listen --enable-insecure-extension-access
#######################################################################################################
Initializing Dreambooth
If submitting an issue on github, please provide the below text for debugging purposes:
Python revision: 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0]
Dreambooth revision: bff61c6b92b79ece2e140ee240e0628dd9a4ebef
SD-WebUI revision: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8
Checking Dreambooth requirements...
Failed to install Dreambooth requirements.
[+] torch version 1.13.1+cu117 installed.
[+] torchvision version 0.14.1+cu117 installed.
[+] accelerate version 0.16.0 installed.
[+] bitsandbytes version 0.35.4 installed.
[+] diffusers version 0.13.1 installed.
No package for transformers
[!] transformers NOT installed.
[+] xformers version 0.0.17.dev464 installed.
Launch errors detected: ['transformers not installed.']
#######################################################################################################
#######################################################################################################
Launching Web UI with arguments: --xformers --listen --enable-insecure-extension-access
2023-03-11 12:43:25.860156: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-11 12:43:26.256961: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-11 12:43:26.971803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/root/stable-diffusion-webui/venv/lib/python3.9/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-11 12:43:26.971916: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/root/stable-diffusion-webui/venv/lib/python3.9/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-11 12:43:26.971930: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-11 12:43:27.214114: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/root/stable-diffusion-webui/venv/lib/python3.9/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-11 12:43:27.214227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/root/stable-diffusion-webui/venv/lib/python3.9/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-11 12:43:27.214243: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Script path is
Script path is
Loading weights [d81e26379c] from /home/root/stable-diffusion-webui/models/Stable-diffusion/dndpathfindercharacterrpg_lrx4/dndpathfindercharacterrpg_lrx4_2332938.ckpt
Loading weights [d81e26379c] from /home/root/stable-diffusion-webui/models/Stable-diffusion/dndpathfindercharacterrpg_lrx4/dndpathfindercharacterrpg_lrx4_2332938.ckpt
Creating model from config: /home/root/stable-diffusion-webui/models/Stable-diffusion/dndpathfindercharacterrpg_lrx4/dndpathfindercharacterrpg_lrx4_2332938.yaml
LatentDiffusion: Running in eps-prediction mode
Creating model from config: /home/root/stable-diffusion-webui/models/Stable-diffusion/dndpathfindercharacterrpg_lrx4/dndpathfindercharacterrpg_lrx4_2332938.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(0):
Model loaded in 5.2s (load weights from disk: 1.5s, create model: 0.6s, apply weights to model: 0.7s, apply half(): 0.4s, load VAE: 1.5s, move model to device: 0.5s).
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(0):
Model loaded in 5.1s (load weights from disk: 1.4s, create model: 0.6s, apply weights to model: 0.6s, apply half(): 0.5s, load VAE: 1.4s, move model to device: 0.5s).
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
Running on local URL: http://0.0.0.0:7861
To create a public link, set `share=True` in `launch()`.
Model dir set to: models/dreambooth/dndpathfindercharacterrpg_lrx4
Model dir set to: models/dreambooth/dndpathfindercharacterrpg_lrx4
Model dir set to: models/dreambooth/dndpathfindercharacterrpg_lrx4
Initializing dreambooth training...
The version of diffusers is less than or equal to 0.14.0. Performing monkey-patch...
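The duplicated lines in the log above come from the two processes. As a quick way to confirm that two separate servers were started, here is a small sketch that pulls the ports out of a saved copy of the log. The helper name `served_ports` is hypothetical, and the regular expression assumes the `Running on local URL: http://host:port` format shown above.

```python
import re


def served_ports(log_text):
    """Return the sorted, distinct ports from 'Running on local URL' lines."""
    pattern = re.compile(r"Running on local URL:\s+http://[^:\s]+:(\d+)")
    return sorted({int(m.group(1)) for m in pattern.finditer(log_text)})


sample = (
    "Running on local URL: http://0.0.0.0:7860\n"
    "To create a public link, set `share=True` in `launch()`.\n"
    "Running on local URL: http://0.0.0.0:7861\n"
)
print(served_ports(sample))  # [7860, 7861]
```

More than one port in the result means more than one server process bound its own socket, which matches what I see in the browser.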
Additional information
No response