stable-diffusion-webui [Bug]: Multi-gpu launch with accelerate seemingly works wrong, dreambooth stalls

Open SavvaI opened this issue 1 year ago • 0 comments

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What happened?

I launched webui.sh with ACCELERATE="True" set, on a machine with two GPUs. The script spawned two webui server processes listening on different ports (7860 and 7861 in the logs below). The servers do not seem to communicate with each other and handle requests independently. When I started Dreambooth training, the script stalled.
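
The two independent servers are consistent with how a multi-process launcher generally works: it starts one copy of the entry script per GPU, and each copy runs everything in that script, including web server start-up. Below is a minimal sketch of a rank check, assuming the launcher exports the torch-style `RANK` and `WORLD_SIZE` environment variables. `should_start_ui` is a hypothetical helper for illustration and is not part of webui or the Dreambooth extension.

```python
import os
from typing import Mapping, Optional


def process_rank(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the global rank of this process, or 0 if RANK is not set."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0"))


def world_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the number of launched processes, or 1 if WORLD_SIZE is not set."""
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1"))


def should_start_ui(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical guard: only the rank 0 process would start the web server."""
    return process_rank(env) == 0


if __name__ == "__main__":
    # Printed once per launched process, so two GPUs give two lines.
    print(f"rank {process_rank()} of {world_size()}: start UI = {should_start_ui()}")
```

Running this under the same launcher would show whether each process sees a distinct rank. A guard like this only avoids duplicate servers; it would not by itself make the training run distributed.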

Steps to reproduce the problem

1. Configure accelerate by running accelerate config
2. Uncomment the line export ACCELERATE="True" in webui-user.sh
3. Launch the script as usual with bash webui.sh

What should have happened?

In theory, accelerate should split each batch between the two GPUs, allowing me to train Dreambooth with a larger batch size, which is crucial for quality.
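
To make the expected gain concrete: under data-parallel training each process works on its own per-device batch and gradients are combined across processes, so the batch contributing to one optimizer step scales with the number of GPUs. The sketch below assumes the common setup where the configured batch size is per device; the function name and the example numbers are illustrative and do not come from my actual settings.

```python
def effective_batch_size(per_device_batch: int,
                         num_processes: int,
                         grad_accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step under data parallelism."""
    if min(per_device_batch, num_processes, grad_accum_steps) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_device_batch * num_processes * grad_accum_steps


if __name__ == "__main__":
    # Illustrative values only: per-device batch of 4 on one GPU vs. two GPUs.
    print(effective_batch_size(4, 1))
    print(effective_batch_size(4, 2))
```

With two GPUs and an unchanged per-device batch, the batch per optimizer step would double without increasing memory use on either card.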

Commit where the problem happens

0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8

What platforms do you use to access the UI ?

Linux

What browsers do you use to access the UI ?

Chrome

Command Line Arguments

--xformers --listen --enable-insecure-extension-access

List of extensions

dreambooth

Console logs

(venv) root@c1dc0cbb72ba:/home/root/stable-diffusion-webui# bash webui.sh 

################################################################                                                                                                           
Install script for stable-diffusion + Web UI                                                                                                                               
Tested on Debian 11 (Bullseye)                                                                                                                                             
################################################################                                                                                                           
                                                                                                                                                                           
################################################################                                                                                                           
Running on root user                                                                                                                                                       
################################################################                                                                                                           
                                                                                                                                                                           
################################################################                                                                                                           
Repo already cloned, using it as install directory                                                                                                                         
################################################################                                                                                                           
                                                                                                                                                                           
################################################################                                                                                                           
Create and activate python venv                                                                                                                                            
################################################################                                                                                                           
                                                                                                                                                                           
################################################################                                                                                                           
Accelerating launch.py...                                                                                                                                                  
################################################################                                                                                                           
Python 3.9.13 (main, Aug 25 2022, 23:26:10)                                                                                                                                
[GCC 11.2.0]                                                                                                                                                               
Commit hash: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8                                                                                                                      
Python 3.9.13 (main, Aug 25 2022, 23:26:10)                                                                                                                                
[GCC 11.2.0]                                                                                                                                                               
Commit hash: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8                                                                                                                      
Installing requirements for Web UI                                                                                                                                         
Installing requirements for Web UI      
#######################################################################################################                                                                    
Initializing Dreambooth                                                                                                                                                    
If submitting an issue on github, please provide the below text for debugging purposes:                                                                                    
                                                                                                                                                                           
Python revision: 3.9.13 (main, Aug 25 2022, 23:26:10)                                                                                                                      
[GCC 11.2.0]                                                                                                                                                               
Dreambooth revision: bff61c6b92b79ece2e140ee240e0628dd9a4ebef                                                                                                              
SD-WebUI revision: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8                                                                                                                
                                                                                                                                                                           
Checking Dreambooth requirements...                                                                                                                                        
Ignoring tensorflow-macos: markers 'sys_platform == "darwin" and platform_machine == "arm64"' don't match your environment                                                 
Ignoring mediapipe-silicon: markers 'sys_platform == "darwin"' don't match your environment                                                                                
Collecting accelerate==0.16.0                                                                                                                                              
  Using cached accelerate-0.16.0-py3-none-any.whl (199 kB)                                                                                                                 
Collecting gitpython~=3.1.31                                                                                                                                               
  Using cached GitPython-3.1.31-py3-none-any.whl (184 kB)                                                                                                                  
Collecting transformers~=4.26.1                                                                                                                                            
  Using cached transformers-4.26.1-py3-none-any.whl (6.3 MB)                                                                                                               
Collecting requests                                                                                                                                                        
  Using cached requests-2.28.2-py3-none-any.whl (62 kB)                                                                                                                    
Installing collected packages: requests, gitpython, accelerate, transformers                                                                                               
  Attempting uninstall: requests                                                                                                                                           
    Found existing installation: requests 2.25.1                                                                                                                           
    Uninstalling requests-2.25.1:                                                                                                                                          
      Successfully uninstalled requests-2.25.1                                                                                                                             
  Attempting uninstall: gitpython                                                                                                                                          
    Found existing installation: GitPython 3.1.27                                                                                                                          
    Uninstalling GitPython-3.1.27:                                                                                                                                         
      Successfully uninstalled GitPython-3.1.27                                                                                                                            
  Attempting uninstall: accelerate                                                                                                                                         
    Found existing installation: accelerate 0.12.0                                                                                                                         
    Uninstalling accelerate-0.12.0:                                                                                                                                        
      Successfully uninstalled accelerate-0.12.0                                                                                                                           
  Attempting uninstall: transformers                                                                                                                                       
    Found existing installation: transformers 4.25.1                                                                                                                       
    Uninstalling transformers-4.25.1:                                                                                                                                      
      Successfully uninstalled transformers-4.25.1                                                                                                                         
Successfully installed accelerate-0.16.0 gitpython-3.1.31 requests-2.28.2 transformers-4.26.1    

[+] torch version 1.13.1+cu117 installed.                                                                                                                                  
[+] torchvision version 0.14.1+cu117 installed.                                                                                                                            
[+] accelerate version 0.16.0 installed.                                                                                                                                   
[+] bitsandbytes version 0.35.4 installed.                                                                                                                                 
[+] diffusers version 0.13.1 installed.                                                                                                                                    
[+] transformers version 4.26.1 installed.                                                                                                                                 
[+] xformers version 0.0.17.dev464 installed.                                                                                                                              
                                                                                                                                                                           
#######################################################################################################                                                                    
                                                                                                                                                                           
Launching Web UI with arguments: --xformers --listen --enable-insecure-extension-access                                                                                    
                                                                                                                                                                           
#######################################################################################################                                                                    
Initializing Dreambooth                                                                                                                                                    
If submitting an issue on github, please provide the below text for debugging purposes:                                                                                    
                                                                                                                                                                           
Python revision: 3.9.13 (main, Aug 25 2022, 23:26:10)                                                                                                                      
[GCC 11.2.0]                                                                                                                                                               
Dreambooth revision: bff61c6b92b79ece2e140ee240e0628dd9a4ebef                                                                                                              
SD-WebUI revision: 0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8                                                                                                                
                                                                                                                                                                           
Checking Dreambooth requirements...                                                                                                                                        
Failed to install Dreambooth requirements.                                                                                                                                 
[+] torch version 1.13.1+cu117 installed.                                                                                                                                  
[+] torchvision version 0.14.1+cu117 installed.                                                                                                                            
[+] accelerate version 0.16.0 installed.                                                                                                                                   
[+] bitsandbytes version 0.35.4 installed.                                                                                                                                 
[+] diffusers version 0.13.1 installed.                                                                                                                                    
No package for transformers                                                                                                                                                
[!] transformers NOT installed.                                                                                                                                            
[+] xformers version 0.0.17.dev464 installed.                                                                                                                              
Launch errors detected: ['transformers not installed.']                                                                                                                    
                                                                                                                                                                           
#######################################################################################################  
#######################################################################################################

Launching Web UI with arguments: --xformers --listen --enable-insecure-extension-access
2023-03-11 12:43:25.860156: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-11 12:43:26.256961: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-11 12:43:26.971803: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/root/stable-diffusion-webui/venv/lib/python3.9/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-11 12:43:26.971916: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/root/stable-diffusion-webui/venv/lib/python3.9/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-11 12:43:26.971930: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-03-11 12:43:27.214114: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/root/stable-diffusion-webui/venv/lib/python3.9/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-11 12:43:27.214227: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/root/stable-diffusion-webui/venv/lib/python3.9/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-03-11 12:43:27.214243: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Script path is
Script path is
Loading weights [d81e26379c] from /home/root/stable-diffusion-webui/models/Stable-diffusion/dndpathfindercharacterrpg_lrx4/dndpathfindercharacterrpg_lrx4_2332938.ckpt
Loading weights [d81e26379c] from /home/root/stable-diffusion-webui/models/Stable-diffusion/dndpathfindercharacterrpg_lrx4/dndpathfindercharacterrpg_lrx4_2332938.ckpt
Creating model from config: /home/root/stable-diffusion-webui/models/Stable-diffusion/dndpathfindercharacterrpg_lrx4/dndpathfindercharacterrpg_lrx4_2332938.yaml
LatentDiffusion: Running in eps-prediction mode
Creating model from config: /home/root/stable-diffusion-webui/models/Stable-diffusion/dndpathfindercharacterrpg_lrx4/dndpathfindercharacterrpg_lrx4_2332938.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
DiffusionWrapper has 859.52 M params.
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(0):
Model loaded in 5.2s (load weights from disk: 1.5s, create model: 0.6s, apply weights to model: 0.7s, apply half(): 0.4s, load VAE: 1.5s, move model to device: 0.5s).
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(0):
Model loaded in 5.1s (load weights from disk: 1.4s, create model: 0.6s, apply weights to model: 0.6s, apply half(): 0.5s, load VAE: 1.4s, move model to device: 0.5s).
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Running on local URL:  http://0.0.0.0:7861

To create a public link, set `share=True` in `launch()`.
Model dir set to: models/dreambooth/dndpathfindercharacterrpg_lrx4
Model dir set to: models/dreambooth/dndpathfindercharacterrpg_lrx4
Model dir set to: models/dreambooth/dndpathfindercharacterrpg_lrx4
Initializing dreambooth training...
The version of diffusers is less than or equal to 0.14.0. Performing monkey-patch...

Additional information

No response

Mar 11 '23 15:03 SavvaI
