stable-diffusion-webui
[Bug]: VRAM usage is way higher
Is there an existing issue for this?
- [x] I have searched the existing issues and checked the recent builds/commits
What happened?
I updated the WebUI a few minutes ago and now the VRAM usage when generating an image is way higher. I have 3 monitors (2x 1920x1080 & 1x 2560x1440) and use Wallpaper Engine on all of them, but I have Discord open on one of them nearly 24/7, so Wallpaper Engine is only active on two monitors. 1.5 GB of VRAM is used when I am on the desktop without the WebUI running.
Web browser: Microsoft Edge (Chromium)
OS: Windows 11 (Build number: 22621.963)
GPU: NVIDIA GeForce RTX 3070 Ti (KFA2)
CPU: Intel Core i7-11700K
RAM: Corsair VENGEANCE LPX 32 GB (2 x 16 GB) DDR4 DRAM 3200 MHz C16
Steps to reproduce the problem
- Start the WebUI
- Use the following settings to generate an image
Positive prompt: masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
Steps: 50, Sampler: Euler a, CFG scale: 12, Seed: 3607441108, Size: 512x768, Model hash: 8d9aaa54, Model: Anything V3 (non pruned with vae), Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN AnimeVideo
What should have happened?
The generation should complete without any errors
Commit where the problem happens
1cfd8aec4ae5a6ca1afd67b44cb4ef6dd14d8c34
What platforms do you use to access the UI?
Windows
What browsers do you use to access the UI?
Microsoft Edge
Command Line Arguments
--xformers
Additional information, context and logs
I have the config for animefull from the Novel AI leak in the configs folder under the name Anything V3.0.yaml, but I still get this error when I remove it from the configs folder and completely restart the WebUI. This is the error I get:
RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 4.70 GiB already allocated; 0 bytes free; 5.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
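The message suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of passing that allocator hint through webui-user.bat before launch; the 512 MB split size is an assumption, not a value recommended anywhere in this thread:

```bat
rem webui-user.bat (sketch) -- sets the allocator hint from the OOM message,
rem then launches the stock Windows launcher
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
set COMMANDLINE_ARGS=--xformers
call webui.bat
```

This only changes how PyTorch carves up its CUDA memory pool to avoid fragmentation; it does not reduce the total VRAM the new hires fix asks for.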
When did you last update the WebUI? This may be from a Windows update. You may want to disable browser hardware acceleration; I've found the openoutpaint extension automatically uses some VRAM with browser hardware acceleration.
Same issue here; for a simple 5.x5 I can't even generate with the normal SD 2.1 model or any upscale. That happened with the new update today. :/
> When did you last update the WebUI? This may be from a Windows update. You may want to disable browser hardware acceleration; I've found the openoutpaint extension automatically uses some VRAM with browser hardware acceleration.
I updated the WebUI around 2 PM UTC+1. The last major Windows update was a few weeks ago. When I used the WebUI a few days ago, everything still worked without any errors, and I don't have the openoutpaint extension.
I made a fresh install right now with a RTX4090. Running out of VRAM constantly, never happened before.
RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.99 GiB total capacity; 12.81 GiB already allocated; 0 bytes free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
> Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN AnimeVideo
I might be mistaken, but I think the culprit is the new Highres fix. It upscales the images before processing them a second time, and they may be too big to fit into your VRAM. I see a lot of people complaining about how confusing it is to use and how it gives inferior results. In my experience as well, its usability is questionable right now.
If you really need to use the Highres fix now, try setting the upscaling factor to 1. It somehow makes it behave, even though it's counter-intuitive, given that the default setting is 2.
Here are some examples I got:
Default settings (upscale by 2):
Upscale by 1:

On the other hand, I just noticed that you have a lot of RAM, which makes me think I'm completely wrong about my assumption and there is something else entirely going on. I'm going to try your settings with the same model and see what I get on 8 GB.
Here's the result I got: `masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, Steps: 50, Sampler: Euler a, CFG scale: 7, Seed: 3607441108, Size: 512x768, Model: Anything-V3.0-pruned-fp32, Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN 4x+ Anime6B
Time taken: 4m 49.25s. Torch active/reserved: 4777/6598 MiB, Sys VRAM: 8192/8192 MiB (100.0%)`
It used all the available memory but didn't run out. It also made the image twice the size I ordered, and it took almost 5 minutes on a 1070 Ti.
Commit hash: 24d4a0841d3cc0e5908b098f65a9caa3fa889af8
@Alphyn-gunner It's twice the size because of the hires upscale value.
I also noticed that I now get completely different results with the exact same settings.

> I made a fresh install right now with a RTX4090. Running out of VRAM constantly, never happened before.
> RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.99 GiB total capacity; 12.81 GiB already allocated; 0 bytes free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
Could you post the before and after image size limit?
> I also noticed that I now get completely different results with the exact same settings.
Were you using xformers?
I have the same problem, and I don't even use the hi-res fix! I just do normal gens, but the VRAM usage is WAYYYYY higher now! I can't do the same batch size that I used to be able to do previously! Everything else is the same, I changed nothing; I only git pulled.
> Same issue here; for a simple 5.x5 I can't even generate with the normal SD 2.1 model or any upscale. That happened with the new update today.
I honestly thought I was the only one. Generating images is SOO much slower now (and I have a 4090). I really wish there was a way to revert to the previous update.
> I also noticed that I now get completely different results with the exact same settings.
Also getting the same problem. I was wondering why hires was taking so long now, so I decided to recreate one of my previous images; I got nothing like it with all the same settings, and it took forever.
In the latest versions, hires fix has been modified. Does the 5f4fa942b8ec3ed3b15a352903489d6f9e6eb46e version also have this bug?
For what it's worth, I've also noticed this when training an embedding after updating today via a fresh install. I have an old install, from how the repository was as of 11/5, which doesn't have any issues. I have a lower-end card (RTX 2060 6 GB), so embeddings are all I can do for the moment.
Previously I could train a 512x512 embedding and use the "Read parameters" option on the SD 1.4 checkpoint. The message I get now states that 512 MB of additional VRAM is needed. For experimentation, I lowered the 512 values and the embedding began to train. However, when it tried to generate an image mid-training, the CUDA memory issue occurred again.
It is worth noting that I'm able to use regular prompts as well as the embedding that was terminated early after running out of memory. So this might be helpful in determining what the cause is.
Same here; as suggested, using a less extreme upscale option worked. However, it is still considerably slower. Having different hires fix back ends is nice and might yield better results, but why is this the only option? Why not add both?
What is the last known commit that doesn't have this change? I think I'll switch back to that for the time being.
The current Hires. Fix seems to be tuned much more for higher-end cards. It would be very helpful if there were a way to tune the Hires. Fix to the previous behavior, either through a direct option or an update to the wiki, for 8 GB and lower cards.
For now you could always check out a previous version:
`git checkout fd4461d44c7256d56889f5b5ed9fb660a859172f`
This is the one I'm using for the time being as I find the system pretty much unusable as it is now.
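If you do roll back with that command, getting back to the current code later is just standard git (assuming the default branch is still master):

```bat
git checkout master
git pull
```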
Yes, I use xformers. What do you mean by image size limit?
I have the same issue. Found it while using the hi-res fix. I completely understand how to use it; that's not the issue. Now I run out of VRAM for the same batch sizes/dimensions as before. @lolxdmainkaisemaanlu also pointed out the same, except they are not even using hi-res; I just happened to notice it on hi-res. It's an independent issue from the hi-res fix, it seems. Reverting to fd4461d as well, courtesy of @DrGunnarMallon.
> For now you could always check out a previous version:
> `git checkout fd4461d`
> This is the one I'm using for the time being as I find the system pretty much unusable as it is now.
I'm running A1111 on a 2060 Super, so 8GB of VRAM.
I had a bit of a workflow to do a couple of 512x512 low-level passes, and then bumped it up to 768 to start getting in detail, finally finishing off and upscaling to 1024. I've been doing passes of this process for almost a week (I've been making daily "Twelve Days of Christmas" images).
Even on my older card, it worked. Now, even going from 512 to 768 with just 50 steps, it just breaks. I currently cannot render anything at 768x768.
I tried resetting to the hash recommended above, but I'm still going OOM. Is there another hash, prior to that one, you'd recommend reverting to?
Error completing request
Arguments: (0, 'a photograph of a single red apple, on a yellow plate, on a blue checkered tablecloth.', '', 'None', 'None', <PIL.Image.Image image mode=RGBA size=512x512 at 0x1EFB7F20DF0>, None, None, None, None, 0, 50, 0, 4, 0, 1, False, False, 1, 4, 7, 0.2, 1254105237.0, -1.0, 0, 0, 0, False, 768, 768, 0, False, 32, 0, '', '', 0, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, None, None, '', '', '', '', 'Auto rename', {'label': 'Upload avatars config'}, 'Open outputs directory', 'Export to WebUI style', True, {'label': 'Presets'}, {'label': 'QC preview'}, '', [], 'Select', 'QC scan', 'Show pics', None, False, False, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 'Positive', 0, ', ', True, 32, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
File "G:\GitHub\SDWebUI\modules\call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "G:\GitHub\SDWebUI\modules\call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "G:\GitHub\SDWebUI\modules\img2img.py", line 152, in img2img
processed = process_images(p)
File "G:\GitHub\SDWebUI\modules\processing.py", line 471, in process_images
res = process_images_inner(p)
File "G:\GitHub\SDWebUI\modules\processing.py", line 541, in process_images_inner
p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
File "G:\GitHub\SDWebUI\modules\processing.py", line 887, in init
self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
return self.first_stage_model.encode(x)
File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
h = self.encoder(x)
File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 526, in forward
h = self.down[i_level].block[i_block](hs[-1], temb)
File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 138, in forward
h = self.norm2(h)
File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\normalization.py", line 272, in forward
return F.group_norm(
File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\functional.py", line 2516, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 8.00 GiB total capacity; 5.29 GiB already allocated; 0 bytes free; 6.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Try 4af3ca5393151d61363c30eef4965e694eeac15e. The other commit was throwing errors for me as well. Currently back up and running like I was before trying to get the latest build.
> Try 4af3ca5. The other commit was throwing errors for me as well. Currently back up and running like I was before trying to get the latest build.
That one isn't working for me either. Still going OOM.
After running git checkout xxxxxx, is there anything else I need to do other than closing the console and restarting?
When you open your auto1111 cmd, it tells you the commit version as soon as you run webui.bat. Does it say `Commit hash: 4af3ca5393151d61363c30eef4965e694eeac15e` followed by `Installing requirements for Web UI...`?
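You can also confirm the checked-out commit without launching anything by running this from the webui folder:

```bat
git rev-parse --short HEAD
```

If the output starts with 4af3ca5, you are on that commit.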
I restored back to the master branch, and NVIDIA just put out a driver update.
One of those two things affected it, so at least I'm getting things to work better. Memory usage SEEMS better. Still watching it for a bit, though.
Did you add git pull to your webui script? I've seen a few people do that. For me at least, reverting back to an old version fixed it. Funny, because this change made me think xformers was the issue; I guess I'll have to give it another chance, I was harsh on it.
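For reference, that pattern is usually just a git pull line added near the top of webui-user.bat, before the call to webui.bat; a sketch (not the stock file) so it's easy to spot and remove if you want to stay pinned to an older commit:

```bat
rem self-update line some people add to webui-user.bat; delete it to stop
rem the UI from pulling the latest commit on every launch
git pull
call webui.bat
```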
I'm not sure how related this is, but I haven't seen anybody else mention it. Loading a model in the webui, including at launch, has a coin flip's chance of maxing out my 8 GB of VRAM instantly and freezing my PC entirely. Has anybody else experienced this issue? This has been a thing for a few pulls now, even before the suspension. I have been running the webui inside a Docker image on Ubuntu 20.04 with ROCm and an RX 5700 XT AMD card.
Having the same issue: just loading the WebUI immediately uses, and keeps using, 5 out of 8 GB of VRAM, all since the new hires fix was implemented. The most common error it OOMs on has to do with resolution scaling (even with hires fix disabled). I am not using SD 2.x models at all, so those should not be the issue.
With each generation the amount of VRAM in use seems to increase by a few MB (which stacks up fast over time). img2img is a no-go at all, as it immediately OOMs.
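One way to watch that creep between generations on an NVIDIA card (assuming nvidia-smi is on PATH; rocm-smi is the AMD counterpart) is to poll the driver every few seconds:

```bat
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5
```

If the used figure keeps climbing across generations while the UI sits idle, that points at something being cached and never freed rather than a single oversized allocation.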
Same issue here.
RuntimeError: CUDA out of memory. Tried to allocate 76.38 GiB (GPU 0; 12.00 GiB total capacity; 2.57 GiB already allocated; 7.19 GiB free; 2.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Time taken: 16.44sTorch active/reserved: 2757/2774 MiB, Sys VRAM: 5051/12288 MiB (41.11%)
See possible source in "new hires": https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/6725
I do not use Hires Fix, but I can no longer change models on Colab because it causes memory overflow:

The --lowram, --lowvram and --medvram options did not help. This is the default RAM reservation at startup:

Update: I found a solution:
- set VAE to None
- under Settings -> Stable Diffusion, set Checkpoints and VAE cache to zero
- save the settings and shut down SD (GUI restart is not enough!)
- start again.

Regardless, I saw that every time I change the model, it occupies 1 GB more memory, so after a while it causes a memory overflow again.
I have this problem as well. It consists of:
- when I open the webui, my VRAM is at 5000ish instead of the normal 500ish; this is idle usage
- when I switch models, or generate multiple pictures where the model switches via X/Y/Z, my memory usage grows steadily until it maxes out.