stable-diffusion-webui
[Bug]: VRAM usage is over 9000
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What happened?
Not sure why this happens. I previously had a 1080 Ti with 11 GB, and webui could generate at resolutions over 1000 px. Now I have an RTX 3090 and webui eats 15 GB! This wouldn't happen on the 1080 Ti, and it cripples the 3090's performance. I don't know whether the reserved amount scales with how much VRAM the GPU has, but please make it use as little as it did on the 1080 Ti, or at least make that behaviour optional. What is torch reserving 15 GB for?
Time taken: 24.27s
Torch active/reserved: 15082/21360 MiB, Sys VRAM: 23654/24576 MiB (96.25%)
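For reference, those numbers map onto PyTorch's own memory-stat APIs; here is a minimal sketch of reading the same values yourself (a CUDA build of PyTorch is assumed):

```python
import torch

# Bytes currently held by live tensors (roughly "active")
active = torch.cuda.memory_allocated() / 2**20

# Bytes held by PyTorch's caching allocator, including freed-but-cached
# blocks it has not yet returned to the driver (roughly "reserved")
reserved = torch.cuda.memory_reserved() / 2**20

# Free/total memory as the driver sees it (roughly "Sys VRAM")
free, total = torch.cuda.mem_get_info()
used = (total - free) / 2**20

print(f"Torch active/reserved: {active:.0f}/{reserved:.0f} MiB, "
      f"Sys VRAM: {used:.0f}/{total / 2**20:.0f} MiB")
```

A large gap between "active" and "reserved" is normal for the caching allocator: it keeps freed blocks around for reuse instead of handing them back to the driver.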
Steps to reproduce the problem
Every time you generate an image, VRAM usage grows by about 100 MB, and it keeps stacking up: webui starts at about 3 GB of VRAM but reaches 15 GB pretty quickly. It's definitely a major bug.
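A minimal way to tell whether this is a genuine leak (live tensors accumulating) or just the caching allocator hoarding freed blocks; `generate_image` here is a hypothetical stand-in for one generation:

```python
import gc
import torch

def generate_image():
    # Hypothetical stand-in for one webui generation; swap in a real
    # txt2img/img2img call when profiling the actual app.
    return torch.randn(1, 4, 64, 64, device="cuda")

def report(tag):
    alloc = torch.cuda.memory_allocated() / 2**20
    resv = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={alloc:.0f} MiB, reserved={resv:.0f} MiB")

for i in range(10):
    generate_image()
    gc.collect()              # drop unreachable Python-side tensors
    torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
    report(f"after run {i}")
```

If `allocated` keeps climbing run over run even after `gc.collect()`, something is holding references to old tensors (a real leak); if only `reserved` grows, it is just the allocator cache rather than lost memory.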
What should have happened?
VRAM usage should stay roughly constant from one generation to the next, as it did on the 1080 Ti, instead of growing with every image.
Commit where the problem happens
https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/7945
What platforms do you use to access the UI?
Windows
What browsers do you use to access the UI?
Google Chrome
Command Line Arguments
no
List of extensions
batch-face-swap | https://github.com/kex0/batch-face-swap.git | unknown
custom-diffusion-webui | https://github.com/guaneec/custom-diffusion-webui.git | unknown
embedding-inspector | https://github.com/tkalayci71/embedding-inspector.git | unknown
sd-webui-additional-networks | https://github.com/kohya-ss/sd-webui-additional-networks.git | unknown
sd-webui-controlnet | https://github.com/Mikubill/sd-webui-controlnet | unknown
sd-webui-model-converter | https://github.com/Akegarasu/sd-webui-model-converter.git | unknown
sd-webui-riffusion | https://github.com/enlyth/sd-webui-riffusion.git | unknown
sd-webui-supermerger | https://github.com/hako-mikan/sd-webui-supermerger.git | unknown
stable-diffusion-webui-fix-image-paste | https://github.com/klimaleksus/stable-diffusion-webui-fix-image-paste | unknown
stable-diffusion-webui-fix-image-paste-master | |
stable-diffusion-webui-instruct-pix2pix | https://github.com/Klace/stable-diffusion-webui-instruct-pix2pix | unknown
stable-diffusion-webui-pixelization | https://github.com/AUTOMATIC1111/stable-diffusion-webui-pixelization.git | unknown
ultimate-upscale-for-automatic1111 | https://github.com/Coyote-A/ultimate-upscale-for-automatic1111.git | unknown
LDSR | built-in |
Lora | built-in |
ScuNET | built-in |
SwinIR | built-in |
prompt-bracket-checker | built-in |
roll-artist | built-in |
Console logs
.
Additional information
No response
--opt-split-attention-v1
The newer split-attention optimization is documented as claiming all the memory it can find, so you can play around with the different settings:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings
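For context, the v1 option is the older, simpler optimization: it computes attention over the query rows in fixed-size chunks instead of materializing the whole attention matrix at once, which caps the peak allocation. A minimal sketch of the idea (not webui's actual code; the chunk size is arbitrary):

```python
import torch

def chunked_attention(q, k, v, chunk=1024):
    # q, k, v: (batch, tokens, channels). Processing `chunk` query rows at
    # a time bounds how much of the softmax(Q K^T) matrix exists at once,
    # trading a little speed for a much lower peak allocation.
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for i in range(0, q.shape[1], chunk):
        attn = torch.softmax(q[:, i:i + chunk] @ k.transpose(1, 2) * scale, dim=-1)
        out[:, i:i + chunk] = attn @ v
    return out

# Shapes taken from the error log below: 4096 tokens, 512 channels.
q = k = v = torch.randn(1, 4096, 512)
print(chunked_attention(q, k, v).shape)  # torch.Size([1, 4096, 512])
```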
This lets me get to step 20 on my AMD Windows install; without it I would run out of memory at step 2 on my 16 GB 6900 XT. However, now during the final step I get:
NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
    query     : shape=(1, 4096, 1, 512) (torch.float16)
    key       : shape=(1, 4096, 1, 512) (torch.float16)
    value     : shape=(1, 4096, 1, 512) (torch.float16)
    attn_bias : <class 'NoneType'>
    p         : 0.0
cutlassF is not supported because:
    device=privateuseone (supported: {'cuda'})
flshattF is not supported because:
    device=privateuseone (supported: {'cuda'})
    max(query.shape[-1] != value.shape[-1]) > 128
tritonflashattF is not supported because:
    device=privateuseone (supported: {'cuda'})
    max(query.shape[-1] != value.shape[-1]) > 128
    triton is not available
smallkF is not supported because:
    device=privateuseone (supported: {'cpu', 'cuda'})
    dtype=torch.float16 (supported: {torch.float32})
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 512
That last reason ("unsupported embed per head: 512") is different from the others ^
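The log shows why no xformers kernel can run here: the tensors live on privateuseone (the device torch-directml reports for AMD on Windows, while every kernel wants cuda), and the 512-wide head exceeds the 128/32 limits of the remaining kernels. A hedged sketch of the kind of guard that avoids the hard failure, using the real xformers entry point `memory_efficient_attention` but with an assumed 3-D (batch, tokens, channels) layout and the <=128 head-size cutoff taken from the log:

```python
import torch
import xformers.ops as xops

def attention(q, k, v):
    """q, k, v: (batch, tokens, channels) tensors."""
    # The xformers kernels above want CUDA tensors and (per the log)
    # reject head sizes over 128, so only dispatch there when both hold.
    if q.is_cuda and q.shape[-1] <= 128:
        return xops.memory_efficient_attention(q, k, v)
    # Naive fallback: materializes the full attention matrix, so it uses
    # more memory, but it runs on any device (CPU, CUDA, DirectML, ...).
    scale = q.shape[-1] ** -0.5
    weights = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return weights @ v
```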
I have had a lot of memory issues lately and I can't explain them either.
This is img2img on a large image (2560x1440) using 24 GB of VRAM (TITAN RTX). This used to work just fine with only --medvram, and I have plenty of images in my output directory to show for it, including ControlNet runs with a 1024x576 control image that worked three days ago.
The iterations themselves complete, but it crashes at the end of the iterations while trying to allocate a huge amount of VRAM (15 GB or so).
I tried the --opt-split-attention-v1 option, and also --lowvram, without any change/success.
Even when I drastically reduce the image size (to 1024x576) I still get out-of-memory errors.
I also tried deleting venv and fully reinstalling. No success either.
False, 'Just Resize', False, True, 64, 64, 64, 1, 1, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, '- CFG Scale should be 2 or lower. Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, '', 'Will upscale the image by the selected scale factor; use width and height sliders to set tile size', 64, 0, 2, 1, '', 0, '', 0, '', True, False, False, False, 0, None, None, 50, 0, 0, 512, 512, False, False, True, True, True, False, True, 1, False, False, 2.5, 4, 0, False, 0, 1, False, False, 'u2net', False, False, False, False, 'Will upscale the image depending on the selected target size type', 512, 0, 8, 32, 64, 0.35, 32, 0, True, 0, False, 8, 0, 0, 2048, 2048, 2) {}
Traceback (most recent call last):
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\img2img.py", line 171, in img2img
    processed = process_images(p)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\processing.py", line 632, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\processing.py", line 1048, in sample
    samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 322, in sample_img2img
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 225, in launch_sampling
    return func()
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 322, in
Pressing Generate again after an out-of-memory error eventually works, without changing the input... puzzling.
It seems to always error right after a webui restart or a change of image resolution; pressing Generate again can work.
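A spike at the very end of sampling is likely the VAE decode of the finished latents, which at 2560x1440 needs far more memory than any single sampling step. A minimal sketch for confirming where the peak happens (CUDA assumed; `sample_latents` and `vae_decode` are hypothetical stand-ins for webui's denoising loop and its VAE decode):

```python
import torch

torch.cuda.reset_peak_memory_stats()
latents = sample_latents()  # hypothetical: the denoising loop
print(f"peak during sampling: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")

torch.cuda.reset_peak_memory_stats()
image = vae_decode(latents)  # hypothetical: latents -> pixels
print(f"peak during decode:   {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
```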
--opt-split-attention-v1 --xformers
This solved it; that split attention should be used by default!
--opt-split-attention-v1 --xformers
This also fixed it for me on an Nvidia GTX 1070 8GB. Thanks!
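For anyone else hitting this on Windows: these flags normally go into webui-user.bat via the COMMANDLINE_ARGS variable (assuming a stock install):

```bat
rem webui-user.bat
set COMMANDLINE_ARGS=--opt-split-attention-v1 --xformers
```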