[Bug]: torch.OutOfMemoryError
Checklist
- [x] The issue exists after disabling all extensions
- [x] The issue exists on a clean installation of webui
- [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
- [x] The issue exists in the current version of the webui
- [x] The issue has not been reported before recently
- [ ] The issue has been reported before but has not been fixed yet
What happened?
I tried generating an image with the same settings as another image I had already generated, and it ran out of memory. Even if I remove the Hires. fix step, it still runs out of memory when generating an 832x1216 image. It doesn't seem to happen at 512x512.
Steps to reproduce the problem
- Generate an image larger than 512x512 (I tried 832x1216, the same resolution as a previously generated image, and 1024x1024).
- Generation stalls at the last step, stays there for a while, and then runs out of memory.
What should have happened?
It should generate images larger than 512x512 without running out of memory.
What browsers do you use to access the UI?
Mozilla Firefox
Sysinfo
Console logs
From https://github.com/lshqqytiger/stable-diffusion-webui-directml
* branch HEAD -> FETCH_HEAD
Already up to date.
venv "E:\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.10.1-amd-43-g1ad6edf1
Commit hash: 1ad6edf170c2c4307e0d2400f760a149e621dc38
ROCm: agents=['gfx1100', 'gfx1036']
ROCm: version=6.2, using agent gfx1100
ZLUDA support: experimental
ZLUDA load: path='E:\stable-diffusion-webui-amdgpu\.zluda' nightly=False
Skipping onnxruntime installation.
W0903 13:31:26.367725 17452 venv\Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --upcast-sampling --skip-ort
Loading weights [bdb59bac77] from E:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\waiNSFWIllustrious_v140.safetensors
Creating model from config: E:\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 22.2s (prepare environment: 33.3s, initialize shared: 0.9s, other imports: 0.4s, load scripts: 0.7s, create ui: 0.6s, gradio launch: 0.4s).
Applying attention optimization: Doggettx... done.
Model loaded in 20.6s (load weights from disk: 0.6s, create model: 0.8s, apply weights to model: 17.8s, load textual inversion embeddings: 0.2s, calculate empty prompt: 1.0s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:09<00:00, 2.16it/s]
Compilation is in progress. Please wait...████████████████▋ | 20/35 [00:10<00:05, 2.57it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [00:09<00:00, 1.67it/s]
*** Error completing request███████████████████████████████████████████████████████████| 35/35 [00:21<00:00, 1.57it/s]
*** Arguments: ('task(f5g1rvnfpa4juy7)', <gradio.routes.Request object at 0x0000028636D87AF0>, '8k,best quality,masterpiece,(ultra-detailed),(high detailed skin),((artist:reoen),(artist:fuzichoco),(artist:atdan),(artist:torino_aqua),year 2024:0.8),[artist:wlop],[artist:ningen_mame],artist:ciloranko,[[artist:rhasta]],artist:tidsean,colorful,<lora:c_chasca (genshin impact)_nbe11_xl:1>, chasca \\(genshin impact\\),pointy ears,red hair,hat,jewelry,blue eyes,braid,earrings,black headwear,red scarf, open mouth,smirk,naughty_face, detailed dreamy background, outdoors', 'anilingus,rimjob,huge ass,thick thighs,2boys,mosaic censoring,monochrome,ai-generated,lowres,(worst quality, bad quality:1.2),bad anatomy,sketch,jpeg artifacts,signature,watermark,aged down,chibi,censored,simple background,nose,nostrils,colorized,text,english text,shiny shirt,mismatched pupils,skinny,futanari,ugly,distorted,blurry,deformed,twisted,watermark,logo,text,username', [], 1, 1, 7, 1216, 832, True, 0.3, 1.2, 'Latent', 15, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'Euler a', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 74, in f
res = list(func(*args, **kwargs))
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 53, in f
res = func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
processed = processing.process_images(p)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 849, in process_images
res = process_images_inner(p)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 1083, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 1457, in sample
return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 1556, in sample_hr_pass
decoded_samples = decode_latent_batch(self.sd_model, samples, target_device=devices.cpu, check_for_nans=True)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 633, in decode_latent_batch
sample = decode_first_stage(model, batch[i:i + 1])[0]
File "E:\stable-diffusion-webui-amdgpu\modules\sd_samplers_common.py", line 76, in decode_first_stage
return samples_to_images_tensor(x, approx_index, model)
File "E:\stable-diffusion-webui-amdgpu\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\diffusion.py", line 121, in decode_first_stage
out = self.first_stage_model.decode(z)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\autoencoder.py", line 315, in decode
dec = self.decoder(z, **decoder_kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 728, in forward
h = self.up[i_level].block[i_block](h, temb, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 132, in forward
h = self.conv1(h)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\extensions-builtin\Lora\networks.py", line 599, in network_Conv2d_forward
return originals.Conv2d_forward(self, input)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\conv.py", line 549, in _conv_forward
return F.conv2d(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.10 GiB. GPU 0 has a total capacity of 19.98 GiB of which 8.50 GiB is free. Of the allocated memory 7.45 GiB is allocated by PyTorch, and 3.63 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
---
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00, 2.69it/s]
*** Error completing request 2.06it/s]
*** Arguments: ('task(zl9veevrl90oy5p)', <gradio.routes.Request object at 0x000002866C3C2A10>, '8k,best quality,masterpiece,(ultra-detailed),(high detailed skin),((artist:reoen),(artist:fuzichoco),(artist:atdan),(artist:torino_aqua),year 2024:0.8),[artist:wlop],[artist:ningen_mame],artist:ciloranko,[[artist:rhasta]],artist:tidsean,colorful,<lora:c_chasca (genshin impact)_nbe11_xl:1>, chasca \\(genshin impact\\),pointy ears,red hair,hat,jewelry,blue eyes,braid,earrings,black headwear,red scarf, open mouth,smirk,naughty_face, detailed dreamy background, outdoors', 'anilingus,rimjob,huge ass,thick thighs,2boys,mosaic censoring,monochrome,ai-generated,lowres,(worst quality, bad quality:1.2),bad anatomy,sketch,jpeg artifacts,signature,watermark,aged down,chibi,censored,simple background,nose,nostrils,colorized,text,english text,shiny shirt,mismatched pupils,skinny,futanari,ugly,distorted,blurry,deformed,twisted,watermark,logo,text,username', [], 1, 1, 7, 1216, 832, False, 0.3, 1.2, 'Latent', 15, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'Euler a', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 74, in f
res = list(func(*args, **kwargs))
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 53, in f
res = func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
processed = processing.process_images(p)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 849, in process_images
res = process_images_inner(p)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 1097, in process_images_inner
x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 633, in decode_latent_batch
sample = decode_first_stage(model, batch[i:i + 1])[0]
File "E:\stable-diffusion-webui-amdgpu\modules\sd_samplers_common.py", line 76, in decode_first_stage
return samples_to_images_tensor(x, approx_index, model)
File "E:\stable-diffusion-webui-amdgpu\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\diffusion.py", line 121, in decode_first_stage
out = self.first_stage_model.decode(z)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\autoencoder.py", line 315, in decode
dec = self.decoder(z, **decoder_kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 732, in forward
h = self.up[i_level].upsample(h)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 67, in forward
x = self.conv(x)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\extensions-builtin\Lora\networks.py", line 599, in network_Conv2d_forward
return originals.Conv2d_forward(self, input)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\conv.py", line 549, in _conv_forward
return F.conv2d(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.34 GiB. GPU 0 has a total capacity of 19.98 GiB of which 10.99 GiB is free. Of the allocated memory 7.68 GiB is allocated by PyTorch, and 927.77 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
---
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 10.59it/s]
Total progress: 75it [06:37, 5.30s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00, 2.58it/s]
*** Error completing request███████████████████████████████████████████████████████████| 20/20 [00:07<00:00, 2.48it/s]
*** Arguments: ('task(2y7ef7e0o709u5z)', <gradio.routes.Request object at 0x000002866C3C65C0>, '8k,best quality,masterpiece,(ultra-detailed),(high detailed skin),((artist:reoen),(artist:fuzichoco),(artist:atdan),(artist:torino_aqua),year 2024:0.8),[artist:wlop],[artist:ningen_mame],artist:ciloranko,[[artist:rhasta]],artist:tidsean,colorful,<lora:c_chasca (genshin impact)_nbe11_xl:1>, chasca \\(genshin impact\\),pointy ears,red hair,hat,jewelry,blue eyes,braid,earrings,black headwear,red scarf, open mouth,smirk,naughty_face, detailed dreamy background, outdoors', 'anilingus,rimjob,huge ass,thick thighs,2boys,mosaic censoring,monochrome,ai-generated,lowres,(worst quality, bad quality:1.2),bad anatomy,sketch,jpeg artifacts,signature,watermark,aged down,chibi,censored,simple background,nose,nostrils,colorized,text,english text,shiny shirt,mismatched pupils,skinny,futanari,ugly,distorted,blurry,deformed,twisted,watermark,logo,text,username', [], 1, 1, 7, 1024, 1024, False, 0.3, 1.2, 'Latent', 15, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'Euler a', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 74, in f
res = list(func(*args, **kwargs))
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 53, in f
res = func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
processed = processing.process_images(p)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 849, in process_images
res = process_images_inner(p)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 1097, in process_images_inner
x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
File "E:\stable-diffusion-webui-amdgpu\modules\processing.py", line 633, in decode_latent_batch
sample = decode_first_stage(model, batch[i:i + 1])[0]
File "E:\stable-diffusion-webui-amdgpu\modules\sd_samplers_common.py", line 76, in decode_first_stage
return samples_to_images_tensor(x, approx_index, model)
File "E:\stable-diffusion-webui-amdgpu\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\diffusion.py", line 121, in decode_first_stage
out = self.first_stage_model.decode(z)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\autoencoder.py", line 315, in decode
dec = self.decoder(z, **decoder_kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 732, in forward
h = self.up[i_level].upsample(h)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 67, in forward
x = self.conv(x)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "E:\stable-diffusion-webui-amdgpu\extensions-builtin\Lora\networks.py", line 599, in network_Conv2d_forward
return originals.Conv2d_forward(self, input)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\conv.py", line 549, in _conv_forward
return F.conv2d(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.50 GiB. GPU 0 has a total capacity of 19.98 GiB of which 10.87 GiB is free. Of the allocated memory 7.72 GiB is allocated by PyTorch, and 1013.90 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
---
Refresh index: 100% (371/371), done.
Additional information
I'm running a 7900XT with 20 GB of VRAM, which I think should be plenty. This seems to have started after the last webui update.
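The error text itself suggests trying PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True against fragmentation. As far as I understand, the variable only needs to be in the process environment before the first CUDA allocation (in webui-user.bat that would be a line like `set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`). A minimal Python sketch of the ordering constraint, not webui code:

```python
# Sketch: PyTorch's caching allocator reads PYTORCH_CUDA_ALLOC_CONF when it
# initializes, so the variable must be set before the first CUDA allocation.
import os

os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # imported only after the variable is in place

if torch.cuda.is_available():
    _ = torch.zeros(1, device="cuda")  # first allocation initializes the allocator
    print("allocator config applied:", os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```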
I think there might be a memory leak... A cache is stored at C:\Users\User\AppData\Local\ZLUDA\ComputeCache. Mine is 113 MB (119,242,752 bytes) in size, and if it starts to grow beyond that value, I can get an error like the one above.
I've also noticed that the generation speed drops with each attempt to generate a new image as the zluda.db file increases in size.
ROCm 6.2, RX 6700 XT, 32 GB RAM.
sysinfo-2025-09-08-18-07.json
Well, mine is 628 MB (659,488,768 bytes), which seems like a lot. So if I delete the file every time it grows in size, I may be able to generate images again?
I am having the same problem with my 6800. It gets to 98% complete, then Task Manager shows a massive spike in GPU memory (using all of my dedicated memory and then trying to use shared memory).
Edit: Just tried 832x1216 with hires upscale (goes to 1248x1824). It shoots up from 10 GB to 29 GB at 98% completion (it stayed below 15 GB throughout the rest).
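That matches where the decode cost lives: sampling runs in latent space, but the final VAE decode works on full-resolution activations, so memory scales with the output pixel count. A rough back-of-the-envelope sketch (illustrative scaling only, not measured numbers):

```python
# Rough scaling estimate: VAE decoder activations grow roughly linearly
# with output pixels, so hires-fix resolutions multiply the decode cost.
def rel_cost(w: int, h: int, base=(512, 512)) -> float:
    """Pixel count relative to a 512x512 baseline."""
    return (w * h) / (base[0] * base[1])

for w, h in [(512, 512), (832, 1216), (1024, 1024), (1248, 1824)]:
    print(f"{w}x{h}: {rel_cost(w, h):.1f}x the pixels of 512x512")

# 832x1216  -> ~3.9x
# 1248x1824 -> ~8.7x, which is why the decode step, not sampling, is what OOMs
```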
Every video card has its own ZLUDA cache size. If you constantly delete it, you will have to wait for it to rebuild itself. It's better to do this: delete zluda.db, then perform your usual actions in the WebUI. Here is what I do (a sketch of the copy/restore routine follows this list):
- Start with any Stable Diffusion checkpoint. It doesn't matter whether you write a prompt or not; the goal is to generate a first image.
- Once you have the image, use Hires. fix, then go to img2img and use the SD upscale script.
- Check the it/s (iterations per second) rate and make sure the WebUI has not crashed, then copy the zluda.db file.

In other words, you need to perform your typical actions in the WebUI to get what you could call a 'clean' zluda.db file. If problems begin, delete the 'working' zluda.db and replace it with the copy you made earlier. I also don't understand at what point the zluda.db file gets corrupted; I didn't notice this problem on ROCm 5.7, but it is no longer supported.
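Not a real tool, just a minimal Python sketch of that routine, assuming the cache lives at %LOCALAPPDATA%\ZLUDA\ComputeCache\zluda.db as reported in this thread (the backup filename zluda.db.good is made up for the example):

```python
# Sketch of the "keep a known-good zluda.db" workflow described above.
# Assumes the cache path reported in this thread; adjust if yours differs.
import os
import shutil
from pathlib import Path

cache = Path(os.environ["LOCALAPPDATA"]) / "ZLUDA" / "ComputeCache" / "zluda.db"
backup = cache.parent / "zluda.db.good"  # hypothetical backup name

def save_clean_copy() -> None:
    """Run once generations are known to work after a fresh cache rebuild."""
    shutil.copy2(cache, backup)
    print(f"snapshotted {cache.stat().st_size / 1e6:.1f} MB cache")

def restore_clean_copy() -> None:
    """Run when speed drops or OOM errors start appearing."""
    shutil.copy2(backup, cache)
    print("restored known-good cache")

if __name__ == "__main__":
    print(f"current cache size: {cache.stat().st_size / 1e6:.1f} MB")
```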
@EygenCat Where is this zluda.db file? I can't find it when I search for it with Explorer. Do you mean deleting the .zluda folder?
@maric193 The file zluda.db is located at C:\Users\User\AppData\Local\ZLUDA\ComputeCache, where User is your Windows username. Is it possible that you are running SD under a different user account?
I found it. I was looking in the wrong place.
After trying this out, it still gives an "out of memory" error (with a massive GPU spike). It even gave that error when resizing with the SD upscaler.
Are you sure you're using ZLUDA, not DirectML? And if ZLUDA really is in use, could one of the modules have been installed incorrectly? I don't know; you didn't provide any information.
Unless this program overrides the command below, or there is some setting that was not mentioned in the installation, I'm pretty sure I am using ZLUDA.
set COMMANDLINE_ARGS= --use-zluda --update-check --skip-ort --skip-python-version-check
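A quick way to check from inside the venv whether torch is really going through ZLUDA rather than DirectML (my assumption, from how these builds usually behave: with ZLUDA, torch.cuda is available and the device name is the AMD card; with torch-directml, torch.cuda.is_available() returns False):

```python
# Sanity check: run with the webui's own interpreter, e.g.
#   venv\Scripts\python.exe check_zluda.py   (hypothetical filename)
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())  # False suggests DirectML/CPU
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))  # should name the AMD GPU under ZLUDA
```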
The high VRAM usage comes from the VAE step after the hires fix. To prevent this you can do a few things (a rough sketch of what tiled decoding does is below):
- Get the FP16 SDXL VAE file if you use SDXL/Pony/Illustrious-based models.
- Set the hires steps to 10-15, not higher.
- Download and install the Tiled Diffusion & Tiled VAE extension from https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111. After installing, relaunch Auto1111, and when you use Hires. fix, always and only enable the Tiled VAE option at the bottom. Ignore the other settings from Tiled Diffusion etc.; only Tiled VAE will help.
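For what it's worth, the reason Tiled VAE helps: instead of decoding the whole latent in one pass, it decodes small tiles one at a time and stitches the results, so peak activation memory tracks the tile size instead of the final image size. A toy sketch of the idea, not the extension's actual code (real implementations overlap and blend tiles to hide seams; `decode` here is a stand-in for the VAE decoder):

```python
# Toy illustration of tiled VAE decoding. `decode` stands in for the real
# VAE decoder; SDXL's VAE upsamples latents 8x, hence scale = 8.
import torch

def decode(z: torch.Tensor) -> torch.Tensor:
    # placeholder for vae.decode(z): upsample 8x, latent channels -> 3 RGB
    return torch.nn.functional.interpolate(z[:, :3], scale_factor=8)

def tiled_decode(z: torch.Tensor, tile: int = 32, scale: int = 8) -> torch.Tensor:
    _, _, h, w = z.shape
    out = torch.zeros(z.shape[0], 3, h * scale, w * scale)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            piece = decode(z[:, :, y:y + tile, x:x + tile])  # peak memory ~ tile size
            out[:, :, y * scale:(y + tile) * scale, x * scale:(x + tile) * scale] = piece
    return out

latent = torch.randn(1, 4, 152, 104)   # a 1216x832 image in SDXL latent space
image = tiled_decode(latent)
print(image.shape)  # torch.Size([1, 3, 1216, 832])
```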
I've tried again with a clean install, and the error happened again just by selecting the model. All I've done is install the UI and copy the "waiIllustriousSDXL_v150.safetensors" checkpoint; when I selected it, the error happened, freezing the computer for a few seconds. Logs:
venv "H:\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.10.1-amd-44-g49557ff6
Commit hash: 49557ff60fac408dce8e34a3be8ce9870e5747f0
ROCm: agents=['gfx1100', 'gfx1036']
ROCm: version=6.2, using agent gfx1100
ZLUDA support: experimental
ZLUDA load: path='H:\stable-diffusion-webui-amdgpu\.zluda' nightly=False
W1018 14:20:59.763420 24652 venv\Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
H:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
H:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
Launching Web UI with arguments:
ONNX: version=1.23.1 provider=CPUExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
Loading weights [6ce0161689] from H:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Creating model from config: H:\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 8.7s (prepare environment: 10.8s, initialize shared: 0.9s, load scripts: 0.3s, create ui: 0.3s, gradio launch: 0.3s).
Applying attention optimization: Doggettx... done.
Model loaded in 7.2s (load weights from disk: 0.2s, create model: 0.7s, apply weights to model: 5.5s, load textual inversion embeddings: 0.2s, calculate empty prompt: 0.6s).
Reusing loaded model v1-5-pruned-emaonly.safetensors [6ce0161689] to load waiIllustriousSDXL_v150.safetensors [befc694a29]
Loading weights [befc694a29] from H:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\waiIllustriousSDXL_v150.safetensors
Creating model from config: H:\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
Compilation is in progress. Please wait...
changing setting sd_model_checkpoint to waiIllustriousSDXL_v150.safetensors [befc694a29]: OutOfMemoryError
Traceback (most recent call last):
File "H:\stable-diffusion-webui-amdgpu\modules\options.py", line 165, in set
option.onchange()
File "H:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 14, in f
res = func(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\modules\initialize_util.py", line 181, in
Make sure you're on the latest AMD Adrenalin driver (25.9.1). Then, I see you have HIP SDK 6.2 installed, but the latest version is 6.4, so uninstall everything from HIP SDK, install HIP SDK 6.4, and reboot the PC. Also uninstall Python 3.10.6 and install Python 3.11.9 64-bit for better support of modern webuis and tools. Then delete the venv and .zluda folders from the stable-diffusion-webui-amdgpu folder, and make sure you use these launch args in webui-user.bat: --use-zluda --update-check --skip-ort --skip-python-version-check. Finally, relaunch webui-user.bat.
Also make sure you're not running Wallpaper Engine in the background; it can cause massive issues while generating images. Also, how much system RAM do you have? If 16 GB, you need to increase the Windows pagefile.
I uninstalled HIP SDK 6.2 and installed HIP SDK 6.4, and uninstalled Python 3.10.6 and installed Python 3.11.9. I also did a clean install and used --use-zluda --update-check --skip-ort --skip-python-version-check. And I uninstalled my GPU drivers using DDU and installed the newer version.
After all that I started the webui and tried waiIllustriousSDXL_v150.safetensors again. This time I could set it as the checkpoint and generate some images, but as soon as I used Hires. fix it ran out of memory again. I have 32 GB of DDR5 RAM and a 7900XT with 20 GB of VRAM. These are the generation parameters I used:
And these are the logs:
venv "H:\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
Version: v1.10.1-amd-44-g49557ff6
Commit hash: 49557ff60fac408dce8e34a3be8ce9870e5747f0
ROCm: agents=['gfx1100', 'gfx1036']
ROCm: version=6.4, using agent gfx1100
ZLUDA support: experimental
ZLUDA load: path='H:\stable-diffusion-webui-amdgpu\.zluda' nightly=False
Skipping onnxruntime installation.
You are up to date with the most recent release.
W1020 22:31:21.691000 8232 venv\Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --update-check --skip-ort --skip-python-version-check
Loading weights [befc694a29] from H:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\waiIllustriousSDXL_v150.safetensors
Creating model from config: H:\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 27.4s (prepare environment: 44.4s, initialize shared: 0.7s, other imports: 0.6s, load scripts: 0.4s, create ui: 0.6s, gradio launch: 0.3s).
Loading VAE weights specified in settings: H:\stable-diffusion-webui-amdgpu\models\VAE\fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors
Applying attention optimization: Doggettx... done.
Model loaded in 12.0s (load weights from disk: 0.6s, create model: 1.1s, apply weights to model: 8.8s, load VAE: 0.4s, move model to device: 0.1s, load textual inversion embeddings: 0.2s, calculate empty prompt: 0.8s).
Reusing loaded model waiIllustriousSDXL_v150.safetensors [befc694a29] to load v1-5-pruned-emaonly.safetensors [6ce0161689]
Loading weights [6ce0161689] from H:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Creating model from config: H:\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
Loading VAE weights specified in settings: H:\stable-diffusion-webui-amdgpu\models\VAE\fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors
Applying attention optimization: Doggettx... done.
Model loaded in 6.1s (create model: 0.5s, apply weights to model: 5.4s).
Reusing loaded model v1-5-pruned-emaonly.safetensors [6ce0161689] to load waiIllustriousSDXL_v150.safetensors [befc694a29]
Loading weights [befc694a29] from H:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\waiIllustriousSDXL_v150.safetensors
Creating model from config: H:\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
Loading VAE weights specified in settings: H:\stable-diffusion-webui-amdgpu\models\VAE\fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors
Applying attention optimization: Doggettx... done.
Model loaded in 4.0s (create model: 0.4s, apply weights to model: 3.2s).
Reusing loaded model waiIllustriousSDXL_v150.safetensors [befc694a29] to load v1-5-pruned-emaonly.safetensors [6ce0161689]
Loading weights [6ce0161689] from H:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Creating model from config: H:\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
Loading VAE weights specified in settings: H:\stable-diffusion-webui-amdgpu\models\VAE\fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors
Applying attention optimization: Doggettx... done.
Model loaded in 1.9s (create model: 0.3s, apply weights to model: 1.4s).
Reusing loaded model v1-5-pruned-emaonly.safetensors [6ce0161689] to load waiIllustriousSDXL_v150.safetensors [befc694a29]
Loading weights [befc694a29] from H:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\waiIllustriousSDXL_v150.safetensors
Creating model from config: H:\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
Loading VAE weights specified in settings: H:\stable-diffusion-webui-amdgpu\models\VAE\fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors
Applying attention optimization: Doggettx... done.
Model loaded in 4.1s (create model: 0.4s, apply weights to model: 3.4s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 8.56it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 8.93it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:01<00:00, 10.24it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 9.66it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00, 5.71it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00, 5.28it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00, 2.60it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:08<00:00, 2.28it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00, 2.60it/s]
Downloading: "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth" to H:\stable-diffusion-webui-amdgpu\models\RealESRGAN\RealESRGAN_x4plus_anime_6B.pth
100%|█████████████████████████████████████████████████████████████████████████████| 17.1M/17.1M [00:00<00:00, 60.6MB/s]
tiled upscale: 100%|███████████████████████████████████████████████████████████████████| 36/36 [00:02<00:00, 16.55it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:12<00:00, 1.54it/s]
*** Error completing request███████████████████████████████████████████████████████████| 40/40 [00:25<00:00, 1.48it/s]
*** Arguments: ('task(lene1f7v294mdy8)', <gradio.routes.Request object at 0x000001BD0D08C690>, '', '', [], 1, 1, 7, 1024, 1024, True, 0.3, 1.2, 'R-ESRGAN 4x+ Anime6B', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'Euler a', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "H:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 74, in f
res = list(func(*args, **kwargs))
File "H:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 53, in f
res = func(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
processed = processing.process_images(p)
File "H:\stable-diffusion-webui-amdgpu\modules\processing.py", line 849, in process_images
res = process_images_inner(p)
File "H:\stable-diffusion-webui-amdgpu\modules\processing.py", line 1083, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "H:\stable-diffusion-webui-amdgpu\modules\processing.py", line 1457, in sample
return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
File "H:\stable-diffusion-webui-amdgpu\modules\processing.py", line 1556, in sample_hr_pass
decoded_samples = decode_latent_batch(self.sd_model, samples, target_device=devices.cpu, check_for_nans=True)
File "H:\stable-diffusion-webui-amdgpu\modules\processing.py", line 633, in decode_latent_batch
sample = decode_first_stage(model, batch[i:i + 1])[0]
File "H:\stable-diffusion-webui-amdgpu\modules\sd_samplers_common.py", line 76, in decode_first_stage
return samples_to_images_tensor(x, approx_index, model)
File "H:\stable-diffusion-webui-amdgpu\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\diffusion.py", line 121, in decode_first_stage
out = self.first_stage_model.decode(z)
File "H:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\autoencoder.py", line 315, in decode
dec = self.decoder(z, **decoder_kwargs)
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 728, in forward
h = self.up[i_level].block[i_block](h, temb, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 132, in forward
h = self.conv1(h)
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "H:\stable-diffusion-webui-amdgpu\extensions-builtin\Lora\networks.py", line 599, in network_Conv2d_forward
return originals.Conv2d_forward(self, input)
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\nn\modules\conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "H:\stable-diffusion-webui-amdgpu\venv\Lib\site-packages\torch\nn\modules\conv.py", line 549, in _conv_forward
return F.conv2d(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.43 GiB. GPU 0 has a total capacity of 19.98 GiB of which 4.00 GiB is free. Of the allocated memory 8.56 GiB is allocated by PyTorch, and 7.04 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)