stable-diffusion-webui
[Bug]: img2img throws "CUDA out of memory" after the image has been generated and saved, instead of finishing, even with batch count 1; possibly a memory leak?
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What happened?
My setup: RTX 2080 on Linux. Using img2img at resolution 2048x1024, Sampling method: DPM++ 2M Karras, steps: 40, batch size: 1, batch count: 1 or more. The first image is generated and saved fine. Then, instead of just finishing, the code crashes with an exception, even when the batch count is 1. Exception:
Traceback (most recent call last):
File "/home/username/stable-diffusion-webui/modules/ui.py", line 185, in f
res = list(func(*args, **kwargs))
File "/home/username/stable-diffusion-webui/webui.py", line 54, in f
res = func(*args, **kwargs)
File "/home/username/stable-diffusion-webui/modules/img2img.py", line 139, in img2img
processed = process_images(p)
File "/home/username/stable-diffusion-webui/modules/processing.py", line 423, in process_images
res = process_images_inner(p)
File "/home/username/stable-diffusion-webui/modules/processing.py", line 572, in process_images_inner
state.nextjob()
File "/home/username/stable-diffusion-webui/modules/shared.py", line 155, in nextjob
self.do_set_current_image()
File "/home/username/stable-diffusion-webui/modules/shared.py", line 209, in do_set_current_image
self.current_image = sd_samplers.sample_to_image(self.current_latent)
File "/home/username/stable-diffusion-webui/modules/sd_samplers.py", line 101, in sample_to_image
return single_sample_to_image(samples[index])
File "/home/username/stable-diffusion-webui/modules/sd_samplers.py", line 93, in single_sample_to_image
x_sample = processing.decode_first_stage(shared.sd_model, sample.unsqueeze(0))[0]
File "/home/username/stable-diffusion-webui/modules/processing.py", line 363, in decode_first_stage
x = model.decode_first_stage(x)
File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/username/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 763, in decode_first_stage
return self.first_stage_model.decode(z)
File "/home/username/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/autoencoder.py", line 332, in decode
dec = self.decoder(z)
File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/username/stable-diffusion-webui/repositories/stable-diffusion/ldm/modules/diffusionmodules/model.py", line 553, in forward
h = self.up[i_level].block[i_block](h, temb)
File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/username/stable-diffusion-webui/repositories/stable-diffusion/ldm/modules/diffusionmodules/model.py", line 123, in forward
h = self.norm1(h)
File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 272, in forward
return F.group_norm(
File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2516, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 7.79 GiB total capacity; 5.04 GiB already allocated; 1.63 GiB free; 5.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It is very strange that it crashes after the image has already been generated and saved. My guess is that there is a memory leak and memory is not properly released. Obviously this prevents running batches with a count of 2 or more.
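The error message itself suggests trying `max_split_size_mb`. For reference, a rough sketch of how that allocator option can be set; the value 128 is only an example, and I have not verified that it fixes this:

```python
# A sketch, not a verified fix: PYTORCH_CUDA_ALLOC_CONF has to be set before
# CUDA is initialised, so it is normally exported in the shell (or in
# webui-user.sh) rather than inside Python. The value 128 is only an example.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch

# After a run, these show how much of the reserved pool is actually in use;
# a large gap between the two is the fragmentation the error message hints at.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
```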
Steps to reproduce the problem
Setup: RTX 2080 on Linux. Run img2img at resolution 2048x1024, Sampling method: DPM++ 2M Karras, steps: 40, batch size: 1, batch count: 1 or more.
What should have happened?
After the first image is generated, the memory used by it should be released so the job can finish (and further batches can run).
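To illustrate what I mean by "released", here is a minimal sketch, not the actual webui code; `decode_latent_to_image` and `latent` are placeholder names for whatever produces the preview from the latent:

```python
# Minimal sketch of the expected behaviour, not the actual webui code.
import gc
import torch

def decode_preview(decode_latent_to_image, latent):
    try:
        with torch.no_grad():        # a preview decode should not keep an autograd graph
            image = decode_latent_to_image(latent)
    finally:
        gc.collect()                 # drop Python references to intermediate tensors
        torch.cuda.empty_cache()     # hand unused cached blocks back to the driver
    return image
```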
Commit where the problem happens
ac085628540d0ec6a988fad93f5b8f2154209571
What platforms do you use to access the UI?
Linux
What browsers do you use to access the UI?
Google Chrome
Command Line Arguments
--listen
Additional information, context and logs
Same traceback as posted above.
I noticed a similar problem yesterday too: all steps complete, but the preview errors out after the last step. Can't reproduce it at the moment, sorry.
A possibly important detail: it happened when the computer wasn't connected to a monitor and I was connected remotely, so the graphics card had more resources to work with. When the graphics card is actively driving a monitor, it usually starts throwing exceptions at lower img2img resolutions, particularly 896x1792 px, and it happens at the first step.
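For anyone who wants to check how much VRAM the desktop is already holding before the webui starts, something like this quick check (run in the same venv as the webui; it is a diagnostic, not a fix) prints free vs. total memory on the card:

```python
# Quick check of free vs. total VRAM on the current CUDA device.
import torch

free, total = torch.cuda.mem_get_info()   # bytes free / total on the current device
print(f"free:  {free / 2**30:.2f} GiB")
print(f"total: {total / 2**30:.2f} GiB")
```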
My GPU went from 8 GB down to 2489 MB, lol. I can't even generate any art.
@Thakshara9728 Please post the settings that you are using. Most of the time, making the image resolution smaller demands fewer resources for image generation, so it works. But in this particular case, the exception happens after the image has already been generated and saved to disk.
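For a rough sense of scale, a single full-resolution activation tensor in the VAE decoder at 2048x1024 is already on the order of the 2 GiB the allocator failed to find. This is a back-of-the-envelope estimate, not measured from the webui code, and the channel count is an assumption:

```python
# Back-of-the-envelope estimate, not measured from the webui code:
# one full-resolution activation tensor in the VAE decoder at 2048x1024.
height, width = 1024, 2048   # output size from the report
channels      = 256          # assumed channel count for a decoder up-block
bytes_per_el  = 4            # fp32

print(f"{height * width * channels * bytes_per_el / 2**30:.2f} GiB")  # 2.00 GiB
```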
same problem