
[Bug]: Generation just hangs forever before the last step

Open · Mozoloa opened this issue 1 year ago • 28 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

Since update 1.1, when I do batches of images, very often one of them will hang at one of the last steps and never complete.

Clicking Interrupt does nothing, and neither does Skip; reloading the UI doesn't help either. The whole UI is stuck and no other functionality seems to work. The console shows the total progress like this (I'm generating 100 batches of one 512x512 image each):

100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.44it/s]
Total progress:   3%|█▉                                                              | 60/2000 [00:11<04:26,  7.27it/s]

I can't do anything but restart the whole thing.

Steps to reproduce the problem

  1. Go to TXT2IMG or IMG2IMG
  2. Do a large batch of images
  3. At some point the generation will hang and nothing will work anymore

What should have happened?

The generation should have continued as it did before the update.

Commit where the problem happens

c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd

What platforms do you use to access the UI?

Windows 11, RTX3090

What browsers do you use to access the UI?

Brave

Command Line Arguments

--ckpt-dir 'G:\AI\Models\Stable-diffusion\Checkpoints' --xformers --embeddings-dir 'G:\AI\Models\Stable-diffusion\Embeddings' --lora-dir 'G:\AI\Models\Stable-diffusion\Lora'

OR

--ckpt-dir 'G:\AI\Models\Stable-diffusion\Checkpoints' --opt-sdp-attention --embeddings-dir 'G:\AI\Models\Stable-diffusion\Embeddings' --lora-dir 'G:\AI\Models\Stable-diffusion\Lora'

List of extensions

  • ControlNet v1.1.134
  • Image browser

Console logs

venv "G:\AI\Image Gen\A1111\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd
Installing xformers
Collecting xformers==0.0.17
  Using cached xformers-0.0.17-cp310-cp310-win_amd64.whl (112.6 MB)
Installing collected packages: xformers
Successfully installed xformers-0.0.16
Installing requirements


Installing ImageReward requirement for image browser

Launching Web UI with arguments: --autolaunch --ckpt-dir G:\AI\Models\Stable-diffusion\Checkpoints --xformers --embeddings-dir G:\AI\Models\Stable-diffusion\Embeddings --lora-dir G:\AI\Models\Stable-diffusion\Lora --reinstall-xformers
ControlNet v1.1.134
ControlNet v1.1.134
Loading weights [3dcc66eccf] from G:\AI\Models\Stable-diffusion\Checkpoints\Men\Saruman.ckpt
Creating model from config: G:\AI\Image Gen\A1111\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: G:\AI\Image Gen\A1111\stable-diffusion-webui\models\VAE\NewVAE.vae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(15): bad-artist, bad-artist-anime, bad-hands-5, bad-image-v2-39000, bad-picture-chill-75v, bad_prompt, bad_prompt_version2, badhandv4, charturnerv2, easynegative, HyperStylizeV6, ng_deepnegative_v1_75t, pureerosface_v1, ulzzang-6500, ulzzang-6500-v1.1
Textual inversion embeddings skipped(4): 21charturnerv2, nartfixer, nfixer, nrealfixer
Model loaded in 7.2s (load weights from disk: 2.5s, create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.5s, move model to device: 0.6s, load textual inversion embeddings: 2.5s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 19.8s (import torch: 2.7s, import gradio: 2.2s, import ldm: 1.0s, other imports: 2.4s, list SD models: 0.4s, setup codeformer: 0.1s, load scripts: 1.8s, load SD checkpoint: 7.2s, create ui: 1.2s, gradio launch: 0.7s).
Loading weights [c6bbc15e32] from G:\AI\Models\Stable-diffusion\Checkpoints\0\1.5-inpainting.ckpt
Creating model from config: G:\AI\Image Gen\A1111\stable-diffusion-webui\configs\v1-inpainting-inference.yaml
LatentInpaintDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.54 M params.
Loading VAE weights specified in settings: G:\AI\Image Gen\A1111\stable-diffusion-webui\models\VAE\NewVAE.vae.pt
Applying xformers cross attention optimization.
Model loaded in 2.0s (create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.2s, move model to device: 0.6s).
Running DDIM Sampling with 19 timesteps
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:02<00:00,  9.21it/s]
Running DDIM Sampling with 19 timesteps                                              | 18/2000 [00:01<03:04, 10.77it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.87it/s]
Running DDIM Sampling with 19 timesteps                                              | 38/2000 [00:04<02:31, 12.94it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 12.92it/s]
Running DDIM Sampling with 19 timesteps                                              | 56/2000 [00:07<02:37, 12.31it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.33it/s]
Running DDIM Sampling with 19 timesteps                                              | 76/2000 [00:10<02:29, 12.88it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 12.03it/s]
Running DDIM Sampling with 19 timesteps                                              | 94/2000 [00:13<03:02, 10.43it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.91it/s]
Running DDIM Sampling with 19 timesteps                                             | 113/2000 [00:15<02:33, 12.31it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.84it/s]
Running DDIM Sampling with 19 timesteps                                             | 133/2000 [00:18<02:23, 13.03it/s]
Decoding image:  21%|██████████████                                                     | 4/19 [00:00<00:01, 11.32it/s]
Total progress:   7%|████▎                                                          | 137/2000 [00:21<04:56,  6.28it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.90it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.94it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.14it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.81it/s]
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]
Total progress:   5%|███▏                                                           | 101/2000 [00:23<07:14,  4.37it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.10it/s]
 75%|█████████████████████████████████████████████████████████████▌                    | 15/20 [00:02<00:00,  6.22it/s]
Total progress:   2%|█▏                                                              | 36/2000 [00:07<06:58,  4.69it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.17it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.89it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.07it/s]
 10%|████████▎                                                                          | 2/20 [00:00<00:03,  4.87it/s]
Total progress:   3%|██                                                              | 63/2000 [00:14<07:18,  4.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.57it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.44it/s]
Total progress:   3%|█▉                                                              | 60/2000 [00:11<04:26,  7.27it/s]

Additional information

I remember that at some point it hung but somehow got unstuck, and I got an error I don't remember exactly, except that it suggested using --no-half-vae. I haven't tested that and never needed it before on torch 1.13.1 across tens of thousands of gens. I'm exclusively using the new 840000-step MSE VAE.

Mozoloa avatar May 05 '23 07:05 Mozoloa

New information: I've tried --no-half-vae and it doesn't change anything. The hang also seems to happen when I try to interrupt some gens; still no information in the console.

Mozoloa avatar May 05 '23 07:05 Mozoloa

It started happening more and more often; rebooting the SD web UI and Chrome no longer helps. It gives 1-2 generations and then errors again.

Please help! (screenshot: 2023-05-05_12-18-42) P.S. Windows 11, RTX 3060 (latest drivers)

Sorry, I'm on vladmandic/automatic, so this bug report isn't for this repo, but the error is exactly the same and I haven't found a similar one reported anywhere.

begon123 avatar May 05 '23 09:05 begon123

I've also been having this issue since one of the recent updates.

VRArt1 avatar May 05 '23 14:05 VRArt1

Im also having the same issue.

Chem1ce avatar May 06 '23 10:05 Chem1ce

This has to do with cu118 or Torch 2.0; I reverted to 1.13.1+cu117 and I never get it anymore.

Mozoloa avatar May 06 '23 10:05 Mozoloa

This has to do with cu118 or Torch 2.0; I reverted to 1.13.1+cu117 and I never get it anymore.

So did you go back to version 1.0 of A1111? Or are you just using 1.13.1+cu117 with A1111 v1.1?

ChenNdG avatar May 06 '23 13:05 ChenNdG

This has to do with cu118 or Torch 2.0; I reverted to 1.13.1+cu117 and I never get it anymore.

So did you go back to version 1.0 of A1111? Or are you just using 1.13.1+cu117 with A1111 v1.1?

The UI still works with 1.13.1. I changed a line in launch.py that referenced torch to switch it back to 1.13.1+cu117 (I don't remember exactly which line since I'm on my phone) and added --reinstall-torch to the command-line arguments. But honestly, now I just renamed venv to venv2, so I have both versions of torch at the ready just by renaming the venv folder.

Mozoloa avatar May 06 '23 13:05 Mozoloa

This has to do with cu118 or Torch 2.0; I reverted to 1.13.1+cu117 and I never get it anymore.

So did you go back to version 1.0 of A1111? Or are you just using 1.13.1+cu117 with A1111 v1.1?

The UI still works with 1.13.1. I changed a line in launch.py that referenced torch to switch it back to 1.13.1+cu117 (I don't remember exactly which line since I'm on my phone) and added --reinstall-torch to the command-line arguments. But honestly, now I just renamed venv to venv2, so I have both versions of torch at the ready just by renaming the venv folder.

If I understand correctly, I have to change this line here: (screenshot)

ChenNdG avatar May 06 '23 13:05 ChenNdG

Found the commit that changed it: https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/d5063e07e8b4737621978feffd37b18077b9ea64. Just revert that change in launch.py.

Mozoloa avatar May 06 '23 13:05 Mozoloa

Found the commit that changed it: d5063e0. Just revert that change in launch.py.

Thanks !

ChenNdG avatar May 06 '23 14:05 ChenNdG

Having the same issue. I can't reliably reproduce it; it just happens when it happens, and there are no hints in the console for troubleshooting.

halr9000 avatar May 08 '23 11:05 halr9000

I get this too. I had checked out a commit from around this date (I can't remember which one, as today I noticed I had for some reason switched back to the latest commit, so I had to do the checkout again), and that one doesn't seem to stop when batch-generating images, at least it hasn't so far:

(screenshot)

quasiblob avatar May 08 '23 13:05 quasiblob

I also have this problem, and reverting to the master branch at commit 22bcc7be428c94e9408f589966c2040187245d81 does indeed solve it. But of course this is far from ideal as a solution, since there has been a lot of development in the last 5 weeks and we are left out in the wind...

marcsyp avatar May 08 '23 15:05 marcsyp

For those looking for a temp fix who already have torch 2.0+cu118 (you can see the version at the bottom of the UI):

  • Rename the venv folder inside the stable-diffusion-webui folder to venvTorch2 or something similar.
  • Modify launch.py by replacing the following lines with what comes after the colon (see the warning below if you can't find them, and the sketch after this list for the end result):
    225: torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")
    228: xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.16rc425')

⚠️ In recent commits those lines moved to 240 and 243; this can vary from version to version, so search for them if they aren't at the line numbers given here.

  • Then add --reinstall-torch and --reinstall-xformers (if you use the latter) to set COMMANDLINE_ARGS= in the webui-user.bat file in the stable-diffusion-webui folder (or add them to the additional arguments if you use my easy launcher) and save.
  • Relaunch the UI via webui-user.bat or my launcher.
  • This will create a new venv folder with the old torch version, which still works perfectly well.
  • If you ever want to go back to torch 2.0 once it's fixed, just rename the new venv folder to venvTorch1 and rename venvTorch2 back to venv.
  • You can switch back to torch 1 by doing it the other way around, of course.
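
For reference, here is a minimal sketch of what the relevant part of launch.py looks like after the edit. The "before" lines are only an approximation of what recent webui versions ship by default (the exact package versions may differ in your copy), and the line numbers vary between versions, so search for torch_command rather than relying on them:

# launch.py, sketch of pinning torch back to 1.13.1+cu117
# (os is already imported at the top of launch.py)

# Before (approximate defaults in recent versions, which pull torch 2.0 / cu118):
# torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==2.0.0 torchvision --extra-index-url https://download.pytorch.org/whl/cu118")
# xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.17')

# After (the last-known-good torch 1 stack from the workaround above):
torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")
xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.16rc425')

Since both values are read with os.environ.get, setting TORCH_COMMAND and XFORMERS_PACKAGE as environment variables (for example in webui-user.bat) should also work without editing launch.py, if you prefer not to touch the file.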

Mozoloa avatar May 08 '23 15:05 Mozoloa

Same issue after updating to torch 2. It seems to hang on simple prompts for me; more complex ones run fine in generate-forever mode, but if I use very few words it hangs after a few image generations and I have to close the cmd window and restart with webui-user.bat, since just reloading the web UI doesn't work. I also upgraded pip, but it still happens on occasion. The problem never occurred before the torch 2 upgrade.

nickr61 avatar May 09 '23 17:05 nickr61

Also getting this issue.

poisenbery avatar May 10 '23 09:05 poisenbery

Mozoloa, thanks for the workaround, but a patch from the devs seems like a must. Why the wait?

oliverban avatar May 11 '23 12:05 oliverban

Mozoloa, thanks for the workaround, but a patch from the devs seems like a must. Why the wait?

I'm not sure I understand what you're saying

Mozoloa avatar May 11 '23 12:05 Mozoloa

Same problem. I solved it based on advice from reddit: in Settings, on the 'Live preview' tab, I increased 'every N sampling steps' to 5 (it was 1 before), and for 'Image creation progress preview mode' I chose 'Approx cheap'. After these changes the problem did not reappear. Previously, every 10-20 generations ended with a hang and I had to restart the WebUI completely.
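
For anyone who prefers changing this outside the UI, here is a minimal sketch of editing the webui's config.json to apply the same two settings. The key names show_progress_every_n_steps and show_progress_type are assumptions based on common A1111 builds and may differ in your version, so check your own config.json first and only edit it while the UI is not running:

# Sketch: apply the live preview settings directly in config.json
# (assumed location: the stable-diffusion-webui root folder; key names are assumptions).
import json

cfg_path = "config.json"

with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

# Update the live preview only every 5 sampling steps instead of every step,
# and use the cheaper approximate preview decode instead of a full decode.
cfg["show_progress_every_n_steps"] = 5      # assumed key name
cfg["show_progress_type"] = "Approx cheap"  # assumed key name

with open(cfg_path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=4)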

ostap667inbox avatar May 11 '23 18:05 ostap667inbox

Same problem. I solved it based on advice from reddit: in Settings, on the 'Live preview' tab, I increased 'every N sampling steps' to 5 (it was 1 before), and for 'Image creation progress preview mode' I chose 'Approx cheap'. After these changes the problem did not reappear. Previously, every 10-20 generations ended with a hang and I had to restart the WebUI completely.

That's a good find, although I like to see the preview as soon as possible and in full, so I'll stay on torch 1 for now.

Mozoloa avatar May 11 '23 19:05 Mozoloa

I'm also sticking to torch 1; I even get slightly better performance on it.

NathanBonnet30 avatar May 12 '23 11:05 NathanBonnet30

I just deleted the entire venv.

I also included --skip-version-check because it shows a message saying "This was tested to work with Torch 2.0", which is obviously a lie.

poisenbery avatar May 12 '23 11:05 poisenbery

Has anyone tried with 1.2.0 yet? I'm wondering if it still happens there, but I'm on torch 1.

Mozoloa avatar May 14 '23 00:05 Mozoloa

Still happens on 1.2.0 for me; I had to revert to the old torch as described above.

ecker00 avatar May 14 '23 06:05 ecker00

I've just discovered something very weird. I often get this hanging bug when I use Hires. fix. I just checked my task manager and Discord was taking something like 50% of my GPU while I was in SD; quitting Discord fixed the hang. Why the fuck is Discord taking that much, is it only me?

NathanBonnet30 avatar May 14 '23 11:05 NathanBonnet30

Whether xformers is enabled or not has no effect on this.

This hang has been rather random. Most of the time it will happen within 3-5 gens from launch, but sometimes it goes for many dozens, while other times it happens on the first. Prompt and seed don't matter; running the same settings each time still has a chance of triggering it.

An X/Y/Z plot of more than a few images is highly risky.

Kadah avatar May 15 '23 07:05 Kadah

I just can't get this fixed somehow. Time to check Vlad's fork again, smh.

elchupacabrinski avatar May 15 '23 17:05 elchupacabrinski

Running into the same problem.

Zuckonit avatar May 16 '23 06:05 Zuckonit

pip install image-reward :)

UtopiaEditorial avatar May 17 '23 19:05 UtopiaEditorial

Same problem after a full reset of the UI.

Mozoloa avatar May 17 '23 21:05 Mozoloa