stable-diffusion-webui
[Bug]: Generation just hangs forever before the last step
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What happened?
Since update 1.1, very often when I do batches of images, one of them will hang at one of the last steps and never complete.
Clicking Interrupt does nothing, neither does Skip, and reloading the UI doesn't help; the whole UI is stuck and no other functionality seems to work. The console shows the total progress this way (I'm generating 100 batches of one 512x512 image):
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 6.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00, 6.44it/s]
Total progress: 3%|█▉ | 60/2000 [00:11<04:26, 7.27it/s]
I can't do anything but restart the whole thing.
Steps to reproduce the problem
- Go to TXT2IMG or IMG2IMG
- Do a large batch of images
- At some point the generation will hang and nothing will work anymore
What should have happened?
The generation should have continued like it did before
Commit where the problem happens
c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd
What platforms do you use to access the UI ?
Windows 11, RTX3090
What browsers do you use to access the UI ?
Brave
Command Line Arguments
--ckpt-dir 'G:\AI\Models\Stable-diffusion\Checkpoints' --xformers --embeddings-dir 'G:\AI\Models\Stable-diffusion\Embeddings' --lora-dir 'G:\AI\Models\Stable-diffusion\Lora'
OR
--ckpt-dir 'G:\AI\Models\Stable-diffusion\Checkpoints' --opt-sdp-attention --embeddings-dir 'G:\AI\Models\Stable-diffusion\Embeddings' --lora-dir 'G:\AI\Models\Stable-diffusion\Lora'
List of extensions
- ControlNet v1.1.134
- Image browser
Console logs
venv "G:\AI\Image Gen\A1111\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: c3eced22fc7b9da4fbb2f55f2d53a7e5e511cfbd
Installing xformers
Collecting xformers==0.0.17
Using cached xformers-0.0.17-cp310-cp310-win_amd64.whl (112.6 MB)
Installing collected packages: xformers
Successfully installed xformers-0.0.16
Installing requirements
Installing ImageReward requirement for image browser
Launching Web UI with arguments: --autolaunch --ckpt-dir G:\AI\Models\Stable-diffusion\Checkpoints --xformers --embeddings-dir G:\AI\Models\Stable-diffusion\Embeddings --lora-dir G:\AI\Models\Stable-diffusion\Lora --reinstall-xformers
ControlNet v1.1.134
ControlNet v1.1.134
Loading weights [3dcc66eccf] from G:\AI\Models\Stable-diffusion\Checkpoints\Men\Saruman.ckpt
Creating model from config: G:\AI\Image Gen\A1111\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: G:\AI\Image Gen\A1111\stable-diffusion-webui\models\VAE\NewVAE.vae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(15): bad-artist, bad-artist-anime, bad-hands-5, bad-image-v2-39000, bad-picture-chill-75v, bad_prompt, bad_prompt_version2, badhandv4, charturnerv2, easynegative, HyperStylizeV6, ng_deepnegative_v1_75t, pureerosface_v1, ulzzang-6500, ulzzang-6500-v1.1
Textual inversion embeddings skipped(4): 21charturnerv2, nartfixer, nfixer, nrealfixer
Model loaded in 7.2s (load weights from disk: 2.5s, create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.5s, move model to device: 0.6s, load textual inversion embeddings: 2.5s).
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 19.8s (import torch: 2.7s, import gradio: 2.2s, import ldm: 1.0s, other imports: 2.4s, list SD models: 0.4s, setup codeformer: 0.1s, load scripts: 1.8s, load SD checkpoint: 7.2s, create ui: 1.2s, gradio launch: 0.7s).
Loading weights [c6bbc15e32] from G:\AI\Models\Stable-diffusion\Checkpoints\0\1.5-inpainting.ckpt
Creating model from config: G:\AI\Image Gen\A1111\stable-diffusion-webui\configs\v1-inpainting-inference.yaml
LatentInpaintDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.54 M params.
Loading VAE weights specified in settings: G:\AI\Image Gen\A1111\stable-diffusion-webui\models\VAE\NewVAE.vae.pt
Applying xformers cross attention optimization.
Model loaded in 2.0s (create model: 0.4s, apply weights to model: 0.4s, apply half(): 0.3s, load VAE: 0.2s, move model to device: 0.6s).
Running DDIM Sampling with 19 timesteps
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:02<00:00, 9.21it/s]
Running DDIM Sampling with 19 timesteps | 18/2000 [00:01<03:04, 10.77it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.87it/s]
Running DDIM Sampling with 19 timesteps | 38/2000 [00:04<02:31, 12.94it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 12.92it/s]
Running DDIM Sampling with 19 timesteps | 56/2000 [00:07<02:37, 12.31it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.33it/s]
Running DDIM Sampling with 19 timesteps | 76/2000 [00:10<02:29, 12.88it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 12.03it/s]
Running DDIM Sampling with 19 timesteps | 94/2000 [00:13<03:02, 10.43it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.91it/s]
Running DDIM Sampling with 19 timesteps | 113/2000 [00:15<02:33, 12.31it/s]
Decoding image: 100%|██████████████████████████████████████████████████████████████████| 19/19 [00:01<00:00, 13.84it/s]
Running DDIM Sampling with 19 timesteps | 133/2000 [00:18<02:23, 13.03it/s]
Decoding image: 21%|██████████████ | 4/19 [00:00<00:01, 11.32it/s]
Total progress: 7%|████▎ | 137/2000 [00:21<04:56, 6.28it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 6.90it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 6.94it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 7.14it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00, 6.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 6.81it/s]
0%| | 0/20 [00:00<?, ?it/s]
Total progress: 5%|███▏ | 101/2000 [00:23<07:14, 4.37it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 7.10it/s]
75%|█████████████████████████████████████████████████████████████▌ | 15/20 [00:02<00:00, 6.22it/s]
Total progress: 2%|█▏ | 36/2000 [00:07<06:58, 4.69it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00, 6.17it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 6.89it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 7.07it/s]
10%|████████▎ | 2/20 [00:00<00:03, 4.87it/s]
Total progress: 3%|██ | 63/2000 [00:14<07:18, 4.42it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 7.57it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 6.99it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00, 6.44it/s]
Total progress: 3%|█▉ | 60/2000 [00:11<04:26, 7.27it/s]
Additional information
I remember that at some point it hung but somehow got unstuck, and I got an error (which I don't remember) that did say to use --no-half-vae. I haven't tested that, and I never needed it before on torch 1.13.1 for tens of thousands of gens. I'm exclusively using the new 840000 MSE VAE.
New information: I've tried --no-half-vae and it doesn't change anything. The hanging also seems to happen when I try to interrupt some gens; still no information in the console.
It has started happening more and more often; restarting Stable Diffusion and Chrome no longer helps. It does 1-2 generations and then errors again.
Please help!
P.S. Windows 11, RTX 3060 (latest drivers)
Sorry, I'm on vladmandic/automatic, so this bug report isn't for this repo, but the error is exactly the same and I haven't found a similar one anywhere.
I've also been having this issue since one of the recent updates.
I'm also having the same issue.
This has to do with cu118 or Torch 2.0; I reverted to 1.13.1+cu117 and I never get it.
> This has to do with cu118 or Torch 2.0; I reverted to 1.13.1+cu117 and I never get it.

So did you go back to the 1.0 version of A1111? Or are you just using 1.13.1+cu117 in A1111 v1.1?
> So did you go back to the 1.0 version of A1111? Or are you just using 1.13.1+cu117 in A1111 v1.1?

The UI still works with 1.13.1. I changed the line in launch.py that sets the torch version back to 1.13.1+cu117 (I don't remember exactly how, since I'm on my phone) and added --reinstall-torch to the command line arguments. But tbh now I just renamed venv to venv2, so I have both versions of torch at the ready just by renaming venv.
If I understand correctly, I have to change this line here:
Found the commit that changed it: https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/d5063e07e8b4737621978feffd37b18077b9ea64. Just revert the change in launch.py.
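In practice, that revert just pins the default TORCH_COMMAND in launch.py back to the cu117 build. A minimal sketch of the relevant line (the same one quoted in the workaround further down; the exact line number varies by commit):

```python
# launch.py: default torch install command, pinned back to torch 1.13.1+cu117
torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")
```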
Having the same issue. I can't reliably reproduce it; it just happens when it happens, and there are no hints in the console for troubleshooting.
I get this too. I checked out a commit from around this date (I can't remember which one, because today I noticed I had for some reason switched to the latest commit, so I had to do a checkout again). This one doesn't seem to stop when batch-generating images, at least it hasn't so far.
I also have this problem, and reverting to the master branch deployment at 22bcc7be428c94e9408f589966c2040187245d81 does indeed solve it. But of course this is far less than ideal as a solution, as there has been a lot of development in the last 5 weeks and we are out in the wind...
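For anyone who wants to try the same pin, a minimal sketch from inside the webui folder (the hash is the one mentioned above):

```bash
# pin the working tree to the known-good commit
git checkout 22bcc7be428c94e9408f589966c2040187245d81
# to return to the latest code later:
git checkout master
```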
For those looking for a temp fix who already have torch 2.0+cu118 (you can see the version at the bottom of the UI):
- Rename the venv folder inside the stable-diffusion-webui folder to venvTorch2 or something similar
- Modify launch.py by replacing the following lines with what comes after the ':' (check the warning below if you can't find them):
Line 225:
torch_command = os.environ.get('TORCH_COMMAND', "pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117")
Line 228:
xformers_package = os.environ.get('XFORMERS_PACKAGE', 'xformers==0.0.16rc425')
⚠️ In recent commits those lines moved to 240 and 243; this can vary from version to version, so search for them if you don't see them at those numbers.
- Then add --reinstall-torch and --reinstall-xformers (if you use the latter) next to set COMMANDLINE_ARGS= in the webui-user.bat file in the stable-diffusion-webui folder, or add them to the additional arguments if you use my easy launcher, and save (see the sketch after this list)
- Relaunch the UI via webui-user.bat or my launcher
- This will create a new venv folder with the old torch versions that still work perfectly well
- Now if you ever want to go back to torch 2.0 when it's fixed, just rename the new venv folder to venvTorch1 and rename venvTorch2 back to venv
- You can switch back to torch 1 by doing it the other way around, of course
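Putting those steps together, a minimal sketch of the venv swap in cmd (assuming the default install layout; COMMANDLINE_ARGS should also keep whatever flags you already use):

```bat
rem park the torch 2.0 environment so a fresh venv gets built on next launch
cd stable-diffusion-webui
ren venv venvTorch2

rem in webui-user.bat, add the reinstall flags next to your existing arguments:
rem set COMMANDLINE_ARGS=--reinstall-torch --reinstall-xformers --xformers

rem to return to torch 2.0 later, swap the folders back:
rem ren venv venvTorch1
rem ren venvTorch2 venv
```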
Same issue after updating to torch 2. For me it seems to hang on simple prompts; more complex ones keep generating fine, but if I use very few words it hangs after a few image generations and I have to close the cmd window and restart with webui-user.bat; just reloading the web UI doesn't work. I also upgraded pip, but it still happens on occasion. The problem never occurred before the torch 2 upgrade.
Also getting this issue.
Mozoloa, thanks for the workaround, but a patch from the devs seems like a must. Why the wait?
> Mozoloa, thanks for the workaround, but a patch from the devs seems like a must. Why the wait?

I'm not sure I understand what you're saying.
Same problem. Solved it on advice from reddit: in Settings, on the 'Live previews' tab, I increased 'every N sampling steps' to 5 (it was 1 before), and for 'Image creation progress preview mode' I chose the 'Approx cheap' option. After these changes the problem stopped appearing. Previously every 10th-20th generation ended with a hang and I had to restart the WebUI completely.
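For reference, a rough sketch of what those two settings look like in the webui's config.json (the key names here are assumptions based on the settings UI; edit only while the webui is stopped):

```json
{
  "show_progress_every_n_steps": 5,
  "show_progress_type": "Approx cheap"
}
```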
> Same problem. Solved it on advice from reddit. [...]

That's a good find, although I like to see the preview as soon as possible and in full, so I'll stay on torch 1 for now.
I'm also sticking to torch 1; I get even slightly better performance on it.
I just deleted the entire venv.
I also included --skip-version-check because it shows a message saying "This was tested to work with Torch 2.0" which is obviously a lie.
Has anyone tried with 1.2.0 yet? Wondering if it still does this, but I'm on torch 1.
Still happens on 1.2.0 for me; I had to revert to the old torch as described above.
I've just discovered something very weird. I often get this hanging bug when I use Hires. fix; I just checked my task manager and Discord was taking about 50% of my GPU while I was in SD, and quitting Discord fixed the bug. Why the fuck does Discord take that much? Is it only me?
xformers being enabled or not has no effect on this.
This hang has been rather random. Most of the time it happens within 3-5 gens from launch, but sometimes it goes for many dozens, while other times it happens on the first. Prompt + seed doesn't matter; running the same settings each time still has a chance of triggering it.
An XYZ plot of more than a few cells is highly risky.
I just can't get this fixed somehow. Time to check out Vlad again, smh.
Running into the same problem.
pip install image-reward
Same problem after a full reset of the UI.