
txt2img GPU memory leak for batches -> crash after 5 hours with 22 GB consumed by PyTorch

Open cmp-nct opened this issue 3 years ago • 1 comments

Describe the bug: I started a 2000-image batch run and it crashed about a quarter of the way through; there appears to be a GPU memory leak. Settings used:

  • highres fix
  • euler a
  • 1024x768

After 5 hours I got a CUDA out-of-memory crash, with 22.31 GiB reserved in total by PyTorch. Error completing request [5:56:52, 3.65it/s]
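For reference, one way to tell a genuine per-image leak from allocator fragmentation would be to log the allocator state between images. A minimal sketch (the helper name and where it is called are assumptions, not code from the webui):

import torch

# Hypothetical helper (not part of the webui) to track allocator state between images.
def log_cuda_memory(tag: str) -> None:
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")

# Assumed placement: call after each image in the batch loop, e.g.
#   for i in range(n_iter):
#       ... generate image ...
#       log_cuda_memory(f"image {i}")
#       torch.cuda.empty_cache()  # optional: release cached blocks back to the driver

If "allocated" keeps climbing across images, something is holding references; if only "reserved" climbs, it is more likely fragmentation.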

Traceback (most recent call last):
  File "D:\Stable-Dif-Local\stable-diffusion-webui\modules\ui.py", line 191, in f
    res = list(func(*args, **kwargs))
  File "D:\Stable-Dif-Local\stable-diffusion-webui\webui.py", line 64, in f
    res = func(*args, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\modules\txt2img.py", line 34, in txt2img
    processed = process_images(p)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\modules\processing.py", line 375, in process_images
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\modules\processing.py", line 546, in sample
    samples = self.sampler.sample_img2img(self, samples, noise, conditioning, unconditional_conditioning, steps=self.steps)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\modules\sd_samplers.py", line 323, in sample_img2img
    return self.func(self.model_wrap_cfg, xi, sigma_sched, extra_args={'cond': conditioning, 'uncond': unconditional_conditioning, 'cond_scale': p.cfg_scale}, disable=False, callback=self.callback_state, **extra_params_kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 78, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\modules\sd_samplers.py", line 195, in forward
    uncond, cond = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(2)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 100, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 126, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 987, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 1410, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\openaimodel.py", line 732, in forward
    h = module(h, emb, context)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\openaimodel.py", line 85, in forward
    x = layer(x, context)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 258, in forward
    x = block(x, context=context)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 209, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 127, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 212, in _forward
    x = self.attn1(self.norm1(x)) + x
  File "D:\Stable-Dif-Local\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Stable-Dif-Local\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 82, in split_cross_attention_forward
    s2 = s1.softmax(dim=-1, dtype=q.dtype)
RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 24.00 GiB total capacity; 20.72 GiB already allocated; 0 bytes free; 22.31 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
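As the allocator hint in the error suggests, one mitigation to try is capping the maximum split size via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch (the 512 MiB value is only an assumption to experiment with; the variable must be set before CUDA is initialized, e.g. via "set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512" in webui-user.bat before launching):

import os

# Limit the allocator's maximum split size to reduce fragmentation.
# 512 MiB is an assumed starting value, not a recommendation from this report.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")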

Desktop: Windows, commit 61ba3af01ce4ea3f4b3cbeb4181070b8fa3b7044

P.S. Line numbers in processing.py are slightly off because I added a few lines for verbose output (nothing related to GPU/Torch).

cmp-nct avatar Oct 05 '22 12:10 cmp-nct

On macOS, I get this when quitting the web UI (Ctrl+C in the console), which could be related:

~/opt/miniconda3/envs/web-ui/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

pravindahal avatar Oct 29 '22 03:10 pravindahal

Closing as stale.

catboxanon avatar Aug 03 '23 18:08 catboxanon