stable-diffusion-webui
[Bug]: Can't generate at usual resolutions since yesterday. Same memory issues on both medvram and lowvram.
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What happened?
The title is self-explanatory; see also the console log I've provided. I do not know why, but hires fix no longer works under the exact same conditions it worked under before. It always OOM errors at 99%, on the last step. I had not gotten OOMs before this, but they started happening after the recent run of commits, beginning with commit ac38ad7e60bb0ff3194536a72dd1259edad0b30a.
Steps to reproduce the problem
- Go to txt2img
- Enable hires fix
- Set the target resolution to 1080x1920
- Generation fails with a CUDA out-of-memory error
What should have happened?
The image should have generated just like it had numerous times before; instead, generation now fails.
Commit where the problem happens
dfeee786f903e392dbef1519c7c246b9856ebab3
What platforms do you use to access the UI ?
Windows
What browsers do you use to access the UI ?
Mozilla Firefox
Command Line Arguments
I've tried every combination of command line args (--lowvram included) and the issue persists. It sometimes works once or twice, seemingly at random depending on the args, but after that it fails again on the same settings and keeps failing until the webui is restarted.
The console log I've provided uses:
set COMMANDLINE_ARGS=--theme dark --medvram --xformers --force-enable-xformers --always-batch-cond-uncond --opt-channelslast --no-hashing --disable-nan-check --api
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:384
set ATTN_PRECISION=fp16
set SAFETENSORS_FAST_GPU=1
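As a sanity check (a rough sketch, not the webui's own tooling): PYTORCH_CUDA_ALLOC_CONF only takes effect if it is in the environment of the Python process before torch starts allocating on the GPU, so something like this, run from the same venv the webui uses, confirms the setting is actually being picked up:

```python
import os
import torch

# The caching allocator reads this variable when it initializes, so it has to be
# set (e.g. in webui-user.bat) before the process starts allocating VRAM.
print("PYTORCH_CUDA_ALLOC_CONF =", os.environ.get("PYTORCH_CUDA_ALLOC_CONF"))
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```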
List of extensions
Console logs
Traceback (most recent call last):
File "C:\stable-diffusion-webui\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "C:\stable-diffusion-webui\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "C:\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
processed = process_images(p)
File "C:\stable-diffusion-webui\modules\processing.py", line 486, in process_images
res = process_images_inner(p)
File "C:\stable-diffusion-webui\modules\processing.py", line 637, in process_images_inner
x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
File "C:\stable-diffusion-webui\modules\processing.py", line 637, in <listcomp>
x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
File "C:\stable-diffusion-webui\modules\processing.py", line 423, in decode_first_stage
x = model.decode_first_stage(x)
File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 26, in __call__
return self.__sub_func(self.__orig_func, *args, **kwargs)
File "C:\stable-diffusion-webui\modules\sd_hijack_unet.py", line 76, in <lambda>
first_stage_sub = lambda orig_func, self, x, **kwargs: orig_func(self, x.to(devices.dtype_vae), **kwargs)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
return self.first_stage_model.decode(z)
File "C:\stable-diffusion-webui\modules\lowvram.py", line 52, in first_stage_model_decode_wrap
return first_stage_model_decode(z)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
dec = self.decoder(z)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
h = self.up[i_level].block[i_block](h, temb)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 133, in forward
h = self.conv1(h)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\extensions\stable-diffusion-webui-composable-lora\composable_lora.py", line 154, in lora_Conv2d_forward
return lora_forward(self, input, torch.nn.Conv2d_forward_before_lora(self, input))
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 6.00 GiB total capacity; 4.79 GiB already allocated; 0 bytes free; 5.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Additional information
I tried every command line combination before making this bug report; it is very unlikely to be the parameters I've used, since it behaves the same with every combination.
6 GB used to be enough to do 1080x1920 via hires fix with a small amount of headroom left; now it OOMs and says it needs anywhere from a few hundred MB to 2 GB more.
Additionally, I have not changed anything outside of auto1111; it is not Windows or driver related.
I tried again today to confirm whether I am right or not, and yes, it keeps doing it: --medvram and --lowvram function the same in terms of memory usage, and both ask for 1-2 GB of additional VRAM. --lowvram still makes it slow as hell, though.
Before, I could do 1080p at about 85% or so VRAM usage; now I can use --lowvram, still get OOM'd, and have it ask for an additional few GB.
The only workaround while this gets fixed has so far been to drop down to 720p, and even that I can only do at about 90%+ VRAM usage, which is an insane increase compared to before.
I've given up on trying to fix it, but now even at 712x512 it's giving the error below. Auto1111 is unusable for me, so I'll wait until this is fixed. The generations also seem somewhat broken for some reason: the couple of times I've managed to get it to generate, it gave pretty weird results, noticeably different from normal.
File "C:\stable-diffusion-webui\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "C:\stable-diffusion-webui\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "C:\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
processed = process_images(p)
File "C:\stable-diffusion-webui\modules\processing.py", line 486, in process_images
res = process_images_inner(p)
File "C:\stable-diffusion-webui\modules\processing.py", line 635, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "C:\stable-diffusion-webui\modules\processing.py", line 835, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 351, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 227, in launch_sampling
return func()
File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 351, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 138, in forward
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": c_crossattn, "c_concat": [image_cond_in[a:b]]})
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
return self.__orig_func(*args, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
result = forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\hook.py", line 233, in forward2
return forward(*args, **kwargs)
File "C:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\hook.py", line 203, in forward
h = module(h, emb, context)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
x = layer(x, context)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 324, in forward
x = block(x, context=context[i])
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\extensions\Hypernetwork-MonkeyPatch-Extension\patches\external_pr\sd_hijack_checkpoint.py", line 5, in BasicTransformerBlock_forward
return checkpoint(self._forward, x, context)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 264, in _forward
x = self.ff(self.norm3(x)) + x
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 73, in forward
return self.net(x)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 204, in forward
input = module(input)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 52, in forward
x, gate = self.proj(x).chunk(2, dim=-1)
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\stable-diffusion-webui\extensions\stable-diffusion-webui-composable-lora\composable_lora.py", line 150, in lora_Linear_forward
return lora_forward(self, input, torch.nn.Linear_forward_before_lora(self, input))
File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 6.00 GiB total capacity; 5.22 GiB already allocated; 0 bytes free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
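For what it's worth, the fragmentation hint in that message (reserved memory much larger than allocated memory) can be checked directly. A rough sketch, assuming it is run in the same process right after a failed generation:

```python
import torch

# Compare what PyTorch's caching allocator has reserved from the driver with
# what live tensors actually occupy; a large gap points to fragmentation,
# which is what max_split_size_mb is meant to mitigate.
allocated = torch.cuda.memory_allocated(0)
reserved = torch.cuda.memory_reserved(0)
print(f"allocated: {allocated / 2**30:.2f} GiB")
print(f"reserved : {reserved / 2**30:.2f} GiB")
print(f"gap      : {(reserved - allocated) / 2**30:.2f} GiB")

# Detailed breakdown from the allocator itself:
print(torch.cuda.memory_summary(0, abbreviated=True))
```

In the two tracebacks above the gap between reserved and allocated is only a few hundred MB or less, so this looks less like fragmentation and more like the sampling/decode passes genuinely needing more VRAM than before.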
I had strange VRAM issues recently (#8431), which were mitigated by using the v1 optimisation --opt-split-attention-v1 (#8409). Do you still have issues when you use this?
Yep, already tried that; nothing seems to do anything. The only thing that did was --skip-torch-cuda-test, but that did not fix it at all: it only lowered the OOM from requiring 1014 MB to about half, at 514 MB, and it still OOMs the same.
Currently coping with Tiled VAE (included with the multidiffusion extension), which offsets these memory issues and allows me to generate at 1080p; hopefully this is fixed soon.
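For anyone curious, the idea behind tiled VAE decoding is roughly this: decode the latent in small tiles instead of all at once, so only one tile's decoder activations have to fit in VRAM at a time. A naive sketch of the concept (not the extension's actual code; the real Tiled VAE overlaps and blends tiles to hide seams, and `decode_fn` here is a stand-in for whatever first-stage decoder is loaded):

```python
import torch

def tiled_decode(decode_fn, latent, tile_size=64):
    """Decode a latent tensor in non-overlapping tiles to cap peak VRAM usage.

    decode_fn maps a (1, C, h, w) latent tile to an image tile whose spatial
    size is a fixed multiple of the latent tile (8x for Stable Diffusion VAEs).
    """
    _, _, height, width = latent.shape
    rows = []
    for y in range(0, height, tile_size):
        cols = []
        for x in range(0, width, tile_size):
            tile = latent[:, :, y:y + tile_size, x:x + tile_size]
            with torch.no_grad():
                cols.append(decode_fn(tile).cpu())  # move each finished tile off the GPU
            torch.cuda.empty_cache()                # release the tile's activations
        rows.append(torch.cat(cols, dim=-1))        # stitch tiles along width
    return torch.cat(rows, dim=-2)                  # then stitch rows along height
```

Usage would look something like `image = tiled_decode(lambda z: vae.decode(z), latent)`, where `vae` is a placeholder for the loaded first-stage model; the actual extension hooks this into the webui's decode path for you. Peak VRAM then scales with the tile size rather than the full 1080x1920 output.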
I have the same error when I try to train Dreambooth (I have 24 GB of VRAM).
Same on a 6 GB GTX 1660 Ti.
Same on a 2070 Super 8 GB. Cannot run hires fix without xformers on. I'm on:
python: 3.10.6 • torch: 2.1.0.dev20230317+cu118 • xformers: N/A • gradio: 3.16.2 • commit: a9fed7c3
Tried running with
--medvram --no-half-vae --opt-sdp-no-mem-attention
and with
--opt-split-attention-v1
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
Closing as stale.