
[Bug]: VRAM usage is over9000

Open 2blackbar opened this issue 1 year ago • 7 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

Not sure why, but it happens: I previously had a 1080 Ti with 11GB and the webui ran fine generating at resolutions over 1000px. Now I have an RTX 3090 and the webui eats 15GB!!! This is pretty insane; this wouldn't happen on the 1080 Ti, and it cripples the performance of the 3090. Not sure if VRAM is being reserved according to how much the GPU has, but c'mon, make it take as little as on the 1080 Ti, or at least make that behavior optional. What does torch reserve 15GB for?

Time taken: 24.27s  Torch active/reserved: 15082/21360 MiB, Sys VRAM: 23654/24576 MiB (96.25%)

Steps to reproduce the problem

Every time you generate an image, VRAM usage goes up by about 100MB, and it stacks up over and over: at startup the webui uses about 3GB of VRAM, but it gets to 15GB pretty quickly. It's definitely a major bug.
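For anyone wanting to confirm the climb outside the UI readout, here is a minimal PyTorch sketch that logs the same counters the webui prints as "Torch active/reserved"; the `leaked` list and the 25 MiB dummy tensor are illustrative stand-ins for a real leak, not webui code:

    import torch

    def log_vram(tag: str) -> None:
        # Roughly the numbers the webui reports as "Torch active/reserved"
        alloc = torch.cuda.memory_allocated() / 2**20
        reserved = torch.cuda.memory_reserved() / 2**20
        print(f"{tag}: allocated {alloc:.0f} MiB, reserved {reserved:.0f} MiB")

    leaked = []  # anything kept alive between generations shows up as monotonic growth
    for i in range(5):
        # Stand-in for one generation: a real leak keeps a CUDA reference like this
        leaked.append(torch.empty(25 * 2**20, dtype=torch.uint8, device="cuda"))
        log_vram(f"after generation {i + 1}")

If allocated memory climbs by a fixed amount per generation and never drops, something is holding references between runs; if only the reserved figure grows, it is the caching allocator rather than a leak.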

What should have happened?

.

Commit where the problem happens

https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/7945

What platforms do you use to access the UI?

Windows

What browsers do you use to access the UI?

Google Chrome

Command Line Arguments

no

List of extensions

batch-face-swap | https://github.com/kex0/batch-face-swap.git | unknown
custom-diffusion-webui | https://github.com/guaneec/custom-diffusion-webui.git | unknown
embedding-inspector | https://github.com/tkalayci71/embedding-inspector.git | unknown
sd-webui-additional-networks | https://github.com/kohya-ss/sd-webui-additional-networks.git | unknown
sd-webui-controlnet | https://github.com/Mikubill/sd-webui-controlnet | unknown
sd-webui-model-converter | https://github.com/Akegarasu/sd-webui-model-converter.git | unknown
sd-webui-riffusion | https://github.com/enlyth/sd-webui-riffusion.git | unknown
sd-webui-supermerger | https://github.com/hako-mikan/sd-webui-supermerger.git | unknown
stable-diffusion-webui-fix-image-paste | https://github.com/klimaleksus/stable-diffusion-webui-fix-image-paste | unknown
stable-diffusion-webui-fix-image-paste-master |  |
stable-diffusion-webui-instruct-pix2pix | https://github.com/Klace/stable-diffusion-webui-instruct-pix2pix | unknown
stable-diffusion-webui-pixelization | https://github.com/AUTOMATIC1111/stable-diffusion-webui-pixelization.git | unknown
ultimate-upscale-for-automatic1111 | https://github.com/Coyote-A/ultimate-upscale-for-automatic1111.git | unknown
LDSR | built-in |
Lora | built-in |
ScuNET | built-in |
SwinIR | built-in |
prompt-bracket-checker | built-in |
roll-artist | built-in |

Console logs

.

Additional information

No response

2blackbar avatar Mar 02 '23 18:03 2blackbar

Try --opt-split-attention-v1; the newer split attention is stated to seek out all the memory it can find, so you can play around with different settings:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings
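For reference, on Windows these flags typically go on the COMMANDLINE_ARGS line in webui-user.bat; the wiki page above lists what each option does.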

ClashSAN avatar Mar 02 '23 18:03 ClashSAN

This lets me get to step 20 on my AMD Windows install; without it I would run out of memory at step 2 on my 16GB 6900 XT. However, now during the final step I get:

NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
    query : shape=(1, 4096, 1, 512) (torch.float16)
    key : shape=(1, 4096, 1, 512) (torch.float16)
    value : shape=(1, 4096, 1, 512) (torch.float16)
    attn_bias : <class 'NoneType'>
    p : 0.0
cutlassF is not supported because:
    device=privateuseone (supported: {'cuda'})
flshattF is not supported because:
    device=privateuseone (supported: {'cuda'})
    max(query.shape[-1] != value.shape[-1]) > 128
tritonflashattF is not supported because:
    device=privateuseone (supported: {'cuda'})
    max(query.shape[-1] != value.shape[-1]) > 128
    triton is not available
smallkF is not supported because:
    device=privateuseone (supported: {'cpu', 'cuda'})
    dtype=torch.float16 (supported: {torch.float32})
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 512

GitGudGandalf avatar Mar 03 '23 08:03 GitGudGandalf

that is different^

ClashSAN avatar Mar 03 '23 16:03 ClashSAN

I have been having a lot of memory issues as of late and I can't explain them either.

This is using img2img on a large image (2560x1440) with 24GB of VRAM (TITAN RTX). This used to work just fine with only --medvram, and I have plenty of images in my output directory to show for it, even using ControlNet with a 1024x576 control image; it still worked 3 days ago.

The iterations themselves complete, but it crashes at the end of the iterations for some reason while trying to allocate a huge amount of VRAM (15GB or so).

I tried the --opt-split-attention-v1 option and also --lowvram, without any change/success.

ntrouve-onera avatar Mar 03 '23 17:03 ntrouve-onera

Even when I drastically reduce the image size (to 1024x576) I still get out-of-memory errors.

I tried deleting the venv and reinstalling fully. No success there either.

ntrouve-onera avatar Mar 03 '23 17:03 ntrouve-onera

False, 'Just Resize', False, True, 64, 64, 64, 1, 1, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, 'CFG Scale should be 2 or lower.', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, 'None', 'Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, '', 'Will upscale the image by the selected scale factor; use width and height sliders to set tile size', 64, 0, 2, 1, '', 0, '', 0, '', True, False, False, False, 0, None, None, 50, 0, 0, 512, 512, False, False, True, True, True, False, True, 1, False, False, 2.5, 4, 0, False, 0, 1, False, False, 'u2net', False, False, False, False, 'Will upscale the image depending on the selected target size type', 512, 0, 8, 32, 64, 0.35, 32, 0, True, 0, False, 8, 0, 0, 2048, 2048, 2) {}

Traceback (most recent call last):
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\img2img.py", line 171, in img2img
    processed = process_images(p)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\processing.py", line 632, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\processing.py", line 1048, in sample
    samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 322, in sample_img2img
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 225, in launch_sampling
    return func()
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 322, in <lambda>
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 553, in sample_dpmpp_sde
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 123, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": [cond_in[a:b]], "c_concat": [image_cond_in[a:b]]})
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\hook.py", line 193, in forward2
    return forward(*args, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\hook.py", line 182, in forward
    h = module(h, emb, context)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 82, in forward
    x = layer(x, emb)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 249, in forward
    return checkpoint(
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 129, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 262, in _forward
    h = self.in_layers(x)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 204, in forward
    input = module(input)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\extensions-builtin\Lora\lora.py", line 182, in lora_Conv2d_forward
    return lora_forward(self, input, torch.nn.Conv2d_forward_before_lora(self, input))
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "E:\Creation Jeu 2D\IA\stable\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.30 GiB (GPU 0; 24.00 GiB total capacity; 1.56 GiB already allocated; 18.64 GiB free; 3.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
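The allocator's hint at the end refers to PyTorch's documented PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of how it can be applied (the 512 value is just an example, not a recommendation from this thread; the variable must be set before CUDA is initialized, so for the webui it goes in the environment before launch):

    import os

    # max_split_size_mb caps the block size the caching allocator is willing
    # to split, which can reduce the fragmentation the error message mentions.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"  # example value

    import torch

    x = torch.empty(1024, 1024, device="cuda")  # first allocation picks up the setting
    print(torch.cuda.memory_reserved() / 2**20, "MiB reserved")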

ntrouve-onera avatar Mar 03 '23 17:03 ntrouve-onera

Pressing Generate again after an out-of-memory error eventually works, without changing any input... puzzling.

It seems to always error after a webui restart or after changing the image resolution; pressing Generate again can work.

ntrouve-onera avatar Mar 03 '23 19:03 ntrouve-onera
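That retry pattern is consistent with PyTorch's caching allocator holding on to reserved-but-unused blocks. Below is a hedged sketch of the standard inspection and cleanup calls; this is a general PyTorch workaround, not something the thread confirms fixes the webui:

    import gc
    import torch

    def release_cached_vram() -> None:
        gc.collect()               # drop dangling Python references first
        torch.cuda.empty_cache()   # return cached, unused blocks to the driver
        torch.cuda.ipc_collect()   # reclaim memory held for inter-process sharing

    # A large gap between "reserved" and "allocated" suggests fragmentation
    print(torch.cuda.memory_summary(abbreviated=True))
    release_cached_vram()
    print(torch.cuda.memory_summary(abbreviated=True))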

--opt-split-attention-v1 --xformers
This solved it; that split attention should be used by default!!

2blackbar avatar Mar 07 '23 12:03 2blackbar

--opt-split-attention-v1 --xformers

This also fixed it for me on an Nvidia GTX 1070 8GB. Thanks!

coudys avatar Mar 19 '23 11:03 coudys