stable-diffusion-webui
[Bug]: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED and CUDA error: the launch timed out
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What happened?
All monitors freeze momentarily, mouse as well, sometimes dip black and come back, and after the system resumes, the program reports a crash and stops generating new images.
This happens every time and stops me using it completely. I'm lucky if I can generate 1 512x512 image on a 3080 Ti.
I touched on this in #6790, but it has since drowned in the sea of issues. In the meantime I learned how to enable better logging for this; part of it was literally handed to me as the last line of the error output.
Steps to reproduce the problem
- Start SDUI. The same issue happens with --xformers as without it. Nothing helps.
- Enter anything and click Generate. SDUI breaks and does not even try generating further images.
What should have happened?
Images are generated
Commit where the problem happens
ff6a5bcec1ce25aa8f08b157ea957d764be23d8d
What platforms do you use to access the UI?
Windows
What browsers do you use to access the UI?
Mozilla Firefox, Google Chrome, Brave, Microsoft Edge
Command Line Arguments
No response
Additional information, context and logs
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\txt2img.py", line 52, in txt2img
processed = process_images(p)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 479, in process_images
res = process_images_inner(p)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 610, in process_images_inner
x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 610, in <listcomp>
x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\processing.py", line 408, in decode_first_stage
x = model.decode_first_stage(x)
File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
return self.first_stage_model.decode(z)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
dec = self.decoder(z)
File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
h = self.up[i_level].block[i_block](h, temb)
File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\Desktop\AI\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 141, in forward
h = self.conv2(h)
File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 512, 192, 192], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
ConvolutionParams
memory_format = Contiguous
data_type = CUDNN_DATA_HALF
padding = [1, 1, 0]
stride = [1, 1, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0000029A821CD810
type = CUDNN_DATA_HALF
nbDims = 4
dimA = 1, 512, 192, 192,
strideA = 18874368, 36864, 192, 1,
output: TensorDescriptor 0000029A821CDE30
type = CUDNN_DATA_HALF
nbDims = 4
dimA = 1, 512, 192, 192,
strideA = 18874368, 36864, 192, 1,
weight: FilterDescriptor 0000029A637F5930
type = CUDNN_DATA_HALF
tensor_format = CUDNN_TENSOR_NCHW
nbDims = 4
dimA = 512, 512, 3, 3,
Pointer addresses:
input: 0000001ED8800000
output: 0000001EDD000000
weight: 0000001E7EE00000
Forward algorithm: 1
Any further generation attempts result in the following until restart:
Traceback (most recent call last):
File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\call_queue.py", line 33, in f
shared.state.begin()
File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\shared.py", line 219, in begin
devices.torch_gc()
File "c:\Users\TCNO\Desktop\AI\stable-diffusion-webui\modules\devices.py", line 59, in torch_gc
torch.cuda.empty_cache()
File "C:\Users\TCNO\anaconda3\envs\SD\lib\site-packages\torch\cuda\memory.py", line 125, in empty_cache
torch._C._cuda_emptyCache()
RuntimeError: CUDA error: the launch timed out and was terminated
Heck, I have even tried:
- Uninstalling GPU drivers and installing Studio drivers. This helped for a day, and then it was back to this.
- Uninstalling ALL Python and anything NVIDIA/CUDA, doing a complete DDU in safe mode, then reinstalling the latest GPU drivers, CUDA 11.7 and A1's SDUI. A clean install of literally everything helped nothing.
- Completely nuking all Python and using Conda instead, following the Conda install guide. Same issue occurs.
- Editing the launch.py file to install a torch build compatible with CUDA 11.7 (roughly as sketched below). The same issue occurs on stock CUDA 11.3.
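For reference, that edit boils down to changing the pip command launch.py uses to install torch. A hedged sketch only; to my understanding launch.py reads a TORCH_COMMAND environment variable with a hard-coded fallback, and I have left version pins out rather than guess them:

import os

# Rough sketch of the relevant line in launch.py (variable name and behaviour from
# memory): it reads TORCH_COMMAND from the environment and falls back to a default
# pip command, so either edit the fallback or set the environment variable to point
# at PyTorch's cu117 wheel index. Version pins are deliberately omitted.
torch_command = os.environ.get(
    "TORCH_COMMAND",
    "pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu117",
)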
I am able to generate images, albeit twice as slowly, using --no-half --precision full.
Using --xformers does not help either.
Currently I'm messing around happily in SDUI using the option set --opt-split-attention --no-half-vae --medvram --always-batch-cond-uncond, and it seems to work just fine. The --medvram option seems to help a lot with this issue, even though I don't run out of VRAM (why would that happen on one 512x512 image anyway?).
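For comparison, this is roughly the regime that --no-half / --precision full put things in: the same convolution as the repro snippet above, just kept in float32 instead of half precision (a minimal sketch, not webui's actual code path):

import torch

# Same shapes as the cuDNN repro above, but everything in float32.
# This is (roughly) what --no-half / --precision full trade speed and VRAM for.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

data = torch.randn([1, 512, 192, 192], dtype=torch.float32, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=3, padding=1).cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()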
So, did you try the suggested code snippet? Did it cause the error? If it causes the same error, it means something is wrong with your PyTorch/CUDA installation, and there's nothing we can do except install it properly (but how?).
To run it, create a snippet.txt file in your sdwebui folder, paste the code in there, and change the file extension to .py.
Then open cmd and navigate to your sdwebui folder using the cd command.
Run venv\Scripts\activate.bat
Run python snippet.py
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 512, 192, 192], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(512, 512, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
No issue running the script. I thought nothing had happened, so I added a little print at the beginning and end, yet everything worked fine with the snippet.

Welp, now I have no idea what to do. Looks like some internal CUDA error with half-precision operations.
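If you want to keep digging, one more thing worth trying (just an idea, not a known repro) is looping the same half-precision convolution for a while; a single forward/backward pass may simply be too short to hit whatever times out during a real decode:

import torch

# Same conv as the repro snippet, repeated many times to approximate the sustained
# load of an actual VAE decode. Purely a guess at a longer-running reproduction.
data = torch.randn([1, 512, 192, 192], dtype=torch.half, device='cuda')
net = torch.nn.Conv2d(512, 512, kernel_size=3, padding=1).cuda().half()

with torch.no_grad():
    for _ in range(200):
        out = net(data)

torch.cuda.synchronize()
print("finished, mean =", out.float().mean().item())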
Sadly, same boat... I can thankfully generate images with the arguments that 'nerf' it, and they look just as good to me... It's just sad that I'm giving up performance for no reason, or else facing endless crashes for no reason.
Do you have access to a different GPU? Like a friend's or something? It may be caused by a faulty card or overclocking.
Unfortunately not, though I would assume it works fine... But as in #6790, I am far from alone. I was running it undervolted with a stock BIOS and everything; I turned that off so there were NO modifications, not even a higher voltage limit, and it was the same issue. The errors above were after a clean reboot following a complete CUDA and NVIDIA driver uninstallation and a DDU session, right after installing the Studio drivers and CUDA 11.7. Same exact issue.
I will clean my PC today and hopefully that will change something, but I highly doubt it. I really doubt it has to do with heat.
What I might suggest if you can't swap GPUs: try another repo, InvokeAI for example. If it breaks in the same way, there's nothing we can do here.
I do this: torch.backends.cudnn.enabled = False, and it stops showing the error. I assume this comes at some cost to the computation, but can someone clarify why this works and how bad it is to use?
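For context, here is where I set it; my (possibly wrong) understanding is that this just makes PyTorch fall back to its own non-cuDNN convolution kernels, so it should cost speed rather than output quality:

import torch

# Disabling cuDNN makes PyTorch dispatch convolutions to its own fallback kernels.
# The math is the same, so image quality should not change; speed usually does.
torch.backends.cudnn.enabled = False

# Hypothetical placement: run this before the model is loaded (e.g. at the top of a
# small test script, or early in webui's startup), and the decode convolution from
# the traceback no longer goes through cuDNN at all.
data = torch.randn([1, 512, 192, 192], dtype=torch.half, device='cuda')
net = torch.nn.Conv2d(512, 512, kernel_size=3, padding=1).cuda().half()
out = net(data)  # now uses the non-cuDNN implementation
torch.cuda.synchronize()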