
[Bug]: img2img after image generated and saved, instead of finishing, throwing exception: "CUDA out of memory" even with batch count 1, possibly a memory leak?

Open a-l-e-x-d-s-9 opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

My setup: RTX 2080 on Linux. When using img2img at a resolution of 2048x1024, sampling method DPM++ 2M Karras, 40 steps, batch size 1, and batch count 1 or more, the first image is generated and saved fine. Then, instead of simply finishing, the code crashes with an exception, even when the batch count is 1. The exception:

Traceback (most recent call last):
  File "/home/username/stable-diffusion-webui/modules/ui.py", line 185, in f
    res = list(func(*args, **kwargs))
  File "/home/username/stable-diffusion-webui/webui.py", line 54, in f
    res = func(*args, **kwargs)
  File "/home/username/stable-diffusion-webui/modules/img2img.py", line 139, in img2img
    processed = process_images(p)
  File "/home/username/stable-diffusion-webui/modules/processing.py", line 423, in process_images
    res = process_images_inner(p)
  File "/home/username/stable-diffusion-webui/modules/processing.py", line 572, in process_images_inner
    state.nextjob()
  File "/home/username/stable-diffusion-webui/modules/shared.py", line 155, in nextjob
    self.do_set_current_image()
  File "/home/username/stable-diffusion-webui/modules/shared.py", line 209, in do_set_current_image
    self.current_image = sd_samplers.sample_to_image(self.current_latent)
  File "/home/username/stable-diffusion-webui/modules/sd_samplers.py", line 101, in sample_to_image
    return single_sample_to_image(samples[index])
  File "/home/username/stable-diffusion-webui/modules/sd_samplers.py", line 93, in single_sample_to_image
    x_sample = processing.decode_first_stage(shared.sd_model, sample.unsqueeze(0))[0]
  File "/home/username/stable-diffusion-webui/modules/processing.py", line 363, in decode_first_stage
    x = model.decode_first_stage(x)
  File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/username/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddpm.py", line 763, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "/home/username/stable-diffusion-webui/repositories/stable-diffusion/ldm/models/autoencoder.py", line 332, in decode
    dec = self.decoder(z)
  File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/username/stable-diffusion-webui/repositories/stable-diffusion/ldm/modules/diffusionmodules/model.py", line 553, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/username/stable-diffusion-webui/repositories/stable-diffusion/ldm/modules/diffusionmodules/model.py", line 123, in forward
    h = self.norm1(h)
  File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 272, in forward
    return F.group_norm(
  File "/home/username/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2516, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 7.79 GiB total capacity; 5.04 GiB already allocated; 1.63 GiB free; 5.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Very strange that it crashes after the image has already been generated and saved. I would guess there is a memory leak and the memory is not properly released. Obviously this prevents running batches with a count of 2 or more.
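The traceback shows the OOM happening in `decode_first_stage`, called from `do_set_current_image`, i.e. while building the live preview of the finished latent: the VAE decodes the image again at full resolution after it was already saved. A rough back-of-the-envelope sketch shows why a single decoder activation at this resolution is huge (the channel count of 128 and float32 dtype are assumptions based on the stock SD autoencoder, not pulled from the code):

```python
# Rough size estimate for ONE dense activation tensor in the SD VAE
# decoder at the reported 2048x1024 output resolution (batch size 1).
# 128 channels and 4-byte float32 elements are assumed values.

def tensor_bytes(height, width, channels, bytes_per_element=4):
    """Memory for one dense activation tensor, batch size 1."""
    return height * width * channels * bytes_per_element

# The final upsampling stages of the decoder run at full output resolution:
one_activation = tensor_bytes(2048, 1024, 128)
print(one_activation / 2**30, "GiB")  # prints: 1.0 GiB

# group_norm materializes at least input + output at this size, which
# lines up with the "Tried to allocate 2.00 GiB" in the traceback:
print(2 * one_activation / 2**30, "GiB")  # prints: 2.0 GiB
```

So even a small allocation spike during the preview decode can tip an 8 GB card over the edge after sampling has otherwise finished.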

Steps to reproduce the problem

Setup: RTX 2080 on Linux. Run img2img at a resolution of 2048x1024, sampling method DPM++ 2M Karras, 40 steps, batch size 1, batch count 1 or more.

What should have happened?

After the first image is generated, the memory used for it should be released before the next job (or the final cleanup) runs.
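One common mitigation pattern (not necessarily the fix for this specific report) is to explicitly return PyTorch's cached allocator blocks between jobs. A minimal sketch, assuming PyTorch is installed and degrading to a no-op otherwise; the function name `free_cuda_cache` is hypothetical, not part of the webui:

```python
import importlib.util

def free_cuda_cache():
    """Best-effort release of PyTorch's cached CUDA blocks between jobs.

    Returns True if the cache was actually flushed, False if PyTorch or
    a CUDA device is unavailable. Note this only returns *cached, unused*
    blocks to the driver; it cannot free tensors still referenced, so it
    does not cure a genuine leak.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch not installed; nothing to do
    import torch
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    return True
```

Calling something like this between batch items would reduce fragmentation pressure, but if a reference to the previous latent is still held (as the traceback's `self.current_latent` suggests), that memory stays allocated regardless.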

Commit where the problem happens

ac085628540d0ec6a988fad93f5b8f2154209571

What platforms do you use to access UI ?

Linux

What browsers do you use to access the UI ?

Google Chrome

Command Line Arguments

--listen
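As the error message itself suggests, fragmentation can sometimes be worked around by capping the allocator's split size via `PYTORCH_CUDA_ALLOC_CONF`. A hedged sketch of how that could be set before launching; the value 512 is an arbitrary example, and `launch.py` is assumed as the entry point:

```shell
# Workaround hinted at by the error message: limit allocator block
# splitting to reduce fragmentation. 512 MB is an example value to tune.
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:512"
python launch.py --listen
```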

Additional information, context and logs

(Same traceback as above.)

a-l-e-x-d-s-9 avatar Nov 08 '22 22:11 a-l-e-x-d-s-9

I noticed a similar problem yesterday too: all steps complete, but the preview errors after the last step. Can't reproduce it at the moment, sorry.

Ehplodor avatar Nov 08 '22 22:11 Ehplodor

A possibly important detail: it happened when the computer wasn't connected to a monitor and I was connected remotely, so the graphics card had more free memory to work with. When the graphics card is actively driving a monitor, it usually starts throwing exceptions at lower img2img resolutions, particularly 896x1792 px, and it happens at the first step.

a-l-e-x-d-s-9 avatar Nov 09 '22 07:11 a-l-e-x-d-s-9

My available GPU memory was reduced from 8 GB to 2489 MB lol, I can't even generate any art

Thakshara9728 avatar Nov 11 '22 07:11 Thakshara9728

@Thakshara9728 Please post the settings you are using. Most of the time, lowering the image resolution demands fewer resources, so generation works. But in this particular case, the exception happens after the image has already been generated and saved to disk.

a-l-e-x-d-s-9 avatar Nov 11 '22 08:11 a-l-e-x-d-s-9

same problem

Kaiwooo avatar Jul 14 '23 02:07 Kaiwooo