
[Bug]: Can't generate at usual resolutions since yesterday. Same memory issues on both medvram and lowvram.

Open vidiotgameboss opened this issue 1 year ago • 7 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

The title is self-explanatory; see the console log I've provided. I don't know why, but high res no longer works under the exact same conditions as before: it always OOM errors at 99%, on the last step. I never got OOMs before, but they started after the recent run of commits beginning with commit ac38ad7e60bb0ff3194536a72dd1259edad0b30a

Steps to reproduce the problem

  1. Go to txt2img
  2. Enable high res
  3. Set to 1080x1920
  4. Generation fails with a CUDA OOM error on the last step

What should have happened?

It should have generated just as it had numerous times before; instead, it now fails.

Commit where the problem happens

dfeee786f903e392dbef1519c7c246b9856ebab3

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Mozilla Firefox

Command Line Arguments

I've tried all combinations of command line args (--lowvram included) and the issue persists. It sometimes works once or twice, seemingly at random depending on the args, but after working that one time or two it fails on the same settings and keeps failing until the webui is restarted.

The console log I've provided uses:

set COMMANDLINE_ARGS=--theme dark --medvram --xformers --force-enable-xformers --always-batch-cond-uncond --opt-channelslast --no-hashing --disable-nan-check --api
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:384
set ATTN_PRECISION=fp16
set SAFETENSORS_FAST_GPU=1

List of extensions

1 2 3

Console logs

Traceback (most recent call last):
  File "C:\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "C:\stable-diffusion-webui\modules\processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "C:\stable-diffusion-webui\modules\processing.py", line 637, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "C:\stable-diffusion-webui\modules\processing.py", line 637, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "C:\stable-diffusion-webui\modules\processing.py", line 423, in decode_first_stage
    x = model.decode_first_stage(x)
  File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 26, in __call__
    return self.__sub_func(self.__orig_func, *args, **kwargs)
  File "C:\stable-diffusion-webui\modules\sd_hijack_unet.py", line 76, in <lambda>
    first_stage_sub = lambda orig_func, self, x, **kwargs: orig_func(self, x.to(devices.dtype_vae), **kwargs)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "C:\stable-diffusion-webui\modules\lowvram.py", line 52, in first_stage_model_decode_wrap
    return first_stage_model_decode(z)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 133, in forward
    h = self.conv1(h)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\extensions\stable-diffusion-webui-composable-lora\composable_lora.py", line 154, in lora_Conv2d_forward
    return lora_forward(self, input, torch.nn.Conv2d_forward_before_lora(self, input))
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 6.00 GiB total capacity; 4.79 GiB already allocated; 0 bytes free; 5.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
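As a side note for anyone triaging: the allocator figures in a message like the one above can be pulled out with a small script to see how much of PyTorch's reserved pool is sitting unused. A large gap between reserved and allocated suggests fragmentation, which is what the `max_split_size_mb` knob in `PYTORCH_CUDA_ALLOC_CONF` is meant to limit. This is a standalone diagnostic sketch, not part of webui:

```python
import re

def parse_oom(msg: str) -> dict:
    """Extract the memory figures (normalized to MiB) from a torch.cuda OOM message."""
    units = {"MiB": 1.0, "GiB": 1024.0}
    fields = {
        "tried": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "capacity": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    out = {}
    for name, pattern in fields.items():
        m = re.search(pattern, msg)
        if m:
            out[name] = float(m.group(1)) * units[m.group(2)]
    # Memory held by the caching allocator but not handed out to tensors;
    # a large value here points at fragmentation rather than true exhaustion.
    out["reserved_unused"] = out["reserved"] - out["allocated"]
    return out

stats = parse_oom(
    "CUDA out of memory. Tried to allocate 508.00 MiB "
    "(GPU 0; 6.00 GiB total capacity; 4.79 GiB already allocated; "
    "0 bytes free; 5.09 GiB reserved in total by PyTorch)"
)
print(round(stats["reserved_unused"]))  # roughly 307 MiB idle inside the reserved pool
```

Here the unused slack (~0.3 GiB) is smaller than the 508 MiB request, so this particular failure looks like genuine exhaustion rather than pure fragmentation.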

Additional information

I tried every command line combination before filing this bug report; it is very likely not the parameters I've used, since the behavior is the same with every combination.

6 GB used to be enough to do 1080x1920 via hires fix with a very small amount of headroom left; now it OOMs, saying it needs anywhere from a couple hundred MB to 2 GB more.
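For context on why this resolution sits so close to the limit on a 6 GB card: a back-of-envelope estimate of a single fp16 activation tensor in the VAE decoder's full-resolution blocks lands near the 508 MiB allocation the traceback reports. This assumes the standard SD autoencoder's 128 channels at the final decoder stage; treat it as a rough illustration, not an exact accounting of webui's allocations:

```python
# Rough size of one fp16 activation tensor in the VAE decoder's final
# (full-resolution) blocks. 128 is the channel count of the last decoder
# stage in the standard Stable Diffusion autoencoder (an assumption here,
# not something webui reports).
width, height = 1080, 1920
channels = 128
bytes_per_element = 2  # fp16

tensor_bytes = width * height * channels * bytes_per_element
print(f"{tensor_bytes / 2**20:.2f} MiB")  # 506.25 MiB, close to the 508 MiB the OOM asks for
```

Since each conv in the decoder's top stage needs an allocation of roughly this size, losing even a few hundred MB of headroom anywhere else is enough to tip a previously-working 1080x1920 decode into OOM.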

Additionally, I have not changed anything outside of auto1111; it is not a Windows or driver issue.

vidiotgameboss avatar Mar 13 '23 01:03 vidiotgameboss

I tried again today to confirm whether I am right or not. Yep, it does it again: --medvram and --lowvram behave the same in terms of memory usage, and both ask for 1-2 GB of additional VRAM. --lowvram still makes it slow as hell, though.

Before, I could do 1080p at about 85% VRAM usage; now I can use --lowvram and still get OOM'd, with it asking for an additional few GB.

The only workaround while this gets fixed has been to drop down to 720p, and even that only works at about 90%+ VRAM usage, which is an insane increase compared to before.

vidiotgameboss avatar Mar 13 '23 19:03 vidiotgameboss

I've given up on trying to fix it, but now that I am doing 712x512 it's giving this. Auto1111 is unusable for me, so I'll wait until this is fixed. The generations also seem somewhat broken for some reason: the couple of times I managed to get it to generate, the results were pretty weird and different from normal.

  File "C:\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "C:\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "C:\stable-diffusion-webui\modules\processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "C:\stable-diffusion-webui\modules\processing.py", line 635, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "C:\stable-diffusion-webui\modules\processing.py", line 835, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 351, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 227, in launch_sampling
    return func()
  File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 351, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 138, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": c_crossattn, "c_concat": [image_cond_in[a:b]]})
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\hook.py", line 233, in forward2
    return forward(*args, **kwargs)
  File "C:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\hook.py", line 203, in forward
    h = module(h, emb, context)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 324, in forward
    x = block(x, context=context[i])
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\extensions\Hypernetwork-MonkeyPatch-Extension\patches\external_pr\sd_hijack_checkpoint.py", line 5, in BasicTransformerBlock_forward
    return checkpoint(self._forward, x, context)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 264, in _forward
    x = self.ff(self.norm3(x)) + x
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 73, in forward
    return self.net(x)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 204, in forward
    input = module(input)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 52, in forward
    x, gate = self.proj(x).chunk(2, dim=-1)
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\stable-diffusion-webui\extensions\stable-diffusion-webui-composable-lora\composable_lora.py", line 150, in lora_Linear_forward
    return lora_forward(self, input, torch.nn.Linear_forward_before_lora(self, input))
  File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 6.00 GiB total capacity; 5.22 GiB already allocated; 0 bytes free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

vidiotgameboss avatar Mar 13 '23 22:03 vidiotgameboss

I had strange VRAM issues recently (#8431), which were mitigated by using v1 optimisation --opt-split-attention-v1 (#8409)

Do you still have issues when you use this?

Xyem avatar Mar 16 '23 14:03 Xyem

> I had strange VRAM issues recently (#8431), which were mitigated by using v1 optimisation --opt-split-attention-v1 (#8409)
>
> Do you still have issues when you use this?

Yep, already tried that; nothing seems to help. The only thing that changed anything was --skip-torch-cuda-test, which did not fix it at all, only lowered the OOM request from about 1014 MB to roughly half, 514 MB. It still OOMs the same.

Currently coping with tiled VAE (included with the multidiffusion extension), which offsets these memory issues and lets me generate at 1080p. Hopefully this is fixed soon.
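For anyone reaching for the same workaround: the idea behind tiled VAE decoding is to split the latent into overlapping tiles, decode each tile separately, and blend the seams, trading time for peak VRAM. A minimal, extension-agnostic sketch of the tiling arithmetic (the tile and overlap sizes below are illustrative values, not the multidiffusion extension's defaults):

```python
def tile_spans(length: int, tile: int, overlap: int):
    """Return (start, end) spans covering `length`, each `tile` long,
    with adjacent spans sharing `overlap` pixels so seams can be blended.
    The last span is right-aligned so coverage is exact."""
    if tile >= length:
        return [(0, length)]
    stride = tile - overlap
    spans = []
    start = 0
    while True:
        end = start + tile
        if end >= length:
            spans.append((length - tile, length))
            break
        spans.append((start, end))
        start += stride
    return spans

# Cover a 135-wide latent (1080 px / 8 latent downscale) with 64-wide
# tiles overlapping by 16.
print(tile_spans(135, 64, 16))  # [(0, 64), (48, 112), (71, 135)]
```

Decoding each span through the VAE and cross-fading the overlaps keeps each individual decode call small: the peak activation now scales with the tile size instead of the full 1080x1920 output, which is why this sidesteps the OOM above.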

vidiotgameboss avatar Mar 16 '23 21:03 vidiotgameboss

I have the same error when I try to train DreamBooth (I have 24 GB of VRAM)

arshtepe avatar Mar 18 '23 13:03 arshtepe

Same on 6gb gtx 1660 ti

AmusedDiffuser avatar Mar 19 '23 16:03 AmusedDiffuser

Same on a 2070 Super 8 GB. Cannot run hires fix without xformers on. I'm on python: 3.10.6  •  torch: 2.1.0.dev20230317+cu118  •  xformers: N/A  •  gradio: 3.16.2  •  commit: [a9fed7c3]. Tried running with --medvram --no-half-vae --opt-sdp-no-mem-attention, with --opt-split-attention-v1, and with set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512

sashasubbbb avatar Mar 20 '23 10:03 sashasubbbb

Closing as stale.

catboxanon avatar Aug 03 '23 16:08 catboxanon