[Bug]: NaNs were produced in Unet and CUDA out of memory
Checklist
- [ ] The issue exists after disabling all extensions
- [ ] The issue exists on a clean installation of webui
- [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
- [X] The issue exists in the current version of the webui
- [ ] The issue has not been reported before recently
- [ ] The issue has been reported before but has not been fixed yet
What happened?
I hadn't opened this app for the last 10 days, and I understand that it has been updated in the meantime. I decided to generate a 768×1280 photo, as I did 10 days ago. Back then the only option I had set was "set COMMANDLINE_ARGS=--medvram", because I have an Nvidia GeForce GTX 1650 with 4GB of VRAM. Everything worked fine for a long time, no matter what I generated.

Now, when generating, it first failed with: "A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card doesn't support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use the --disable-nan-check commandline argument to disable this check."

I first tried "Upcast cross attention layer to float32" and then added "--no-half --disable-nan-check". In both cases generation failed with an error like: "CUDA out of memory. Tried to allocate 960.00 MiB (GPU 0; 4.00 GiB total capacity; 1.50 GiB already allocated; 630.64 MiB free; 1.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF".
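A rough back-of-the-envelope check of why switching to "--no-half" might trade the NaN error for an out-of-memory error: the sketch below estimates the size of a single activation tensor at the full 768×1280 output resolution in half and full precision. The channel count of 256 is an assumption about one of the decoder layers, not something stated in the logs, and `activation_mib` is a hypothetical helper written only for this illustration.

```python
# Bytes per element for the two precisions discussed above.
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4}


def activation_mib(channels, height, width, dtype, batch=1):
    """Size in MiB of one dense tensor of shape (batch, channels, height, width)."""
    n_elements = batch * channels * height * width
    return n_elements * BYTES_PER_ELEMENT[dtype] / 2**20


# ASSUMPTION: 256 channels at full output resolution (hypothetical layer width).
half = activation_mib(256, 1280, 768, "float16")
full = activation_mib(256, 1280, 768, "float32")
print(f"float16: {half:.2f} MiB, float32: {full:.2f} MiB")
```

Under that assumption the float32 figure comes out at 960.00 MiB, the same size as the allocation in the error message, and twice the float16 figure. This is only an estimate of one tensor, not of total VRAM use.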
Steps to reproduce the problem
1. Launch webui.bat.
2. In txt2img, write "girl" in the positive prompt and generate.
3. Error: "A tensor with all NaNs was produced in Unet".
4. Close the webui and edit webui.bat to use --lowvram --no-half --disable-nan-check.
5. Launch again and, in txt2img, write "girl" in the positive prompt and generate.
6. Error: "CUDA out of memory".
What should have happened?
For example, it should generate a photo like this without any problems, as it did before.
What browsers do you use to access the UI?
Google Chrome
Sysinfo
Console logs
From https://github.com/AUTOMATIC1111/stable-diffusion-webui
* branch master -> FETCH_HEAD
Already up to date.
venv "venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Version: 1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --lowvram --no-half --disable-nan-check
No module 'xformers'. Proceeding without it.
Style database not found: E:\StableDiffusion\styles.csv
Loading weights [9be2111a39] from E:\StableDiffusion\models\Stable-diffusion\futanariFactor_alphaV10.safetensors
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 83.2s (initial startup: 0.2s, prepare environment: 22.7s, import torch: 21.1s, import gradio: 8.6s, setup paths: 8.7s, initialize shared: 1.2s, other imports: 7.9s, setup codeformer: 1.1s, setup gfpgan: 0.3s, list SD models: 0.8s, load scripts: 3.6s, load upscalers: 0.2s, initialize extra networks: 0.5s, scripts before_ui_callback: 0.7s, create ui: 3.1s, gradio launch: 3.0s).
Creating model from config: E:\StableDiffusion\configs\v1-inference.yaml
Loading VAE weights specified in settings: E:\StableDiffusion\models\VAE\color101VAE_v1.pt
Applying attention optimization: Doggettx... done.
Model loaded in 156.3s (load weights from disk: 7.5s, create model: 0.8s, apply weights to model: 117.2s, apply float(): 0.3s, load VAE: 4.6s, hijack: 0.2s, load textual inversion embeddings: 0.8s, calculate empty prompt: 24.7s).
10%|████████▎ | 2/20 [01:00<09:05, 30.32s/it]
*** Error completing request | 2/20 [00:12<01:48, 6.04s/it]
*** Arguments: ('task(naisbmk5tyw9wby)', 'girl ', '(deformed, distorted, disfigured:1.3), poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, (mutated hands and fingers:1.4), disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation', [], 20, 'Euler a', 1, 1, 8, 1280, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x0000023D96DD1F30>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False) {}
Traceback (most recent call last):
File "E:\StableDiffusion\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "E:\StableDiffusion\modules\call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "E:\StableDiffusion\modules\txt2img.py", line 55, in txt2img
processed = processing.process_images(p)
File "E:\StableDiffusion\modules\processing.py", line 734, in process_images
res = process_images_inner(p)
File "E:\StableDiffusion\modules\processing.py", line 875, in process_images_inner
x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
File "E:\StableDiffusion\modules\processing.py", line 596, in decode_latent_batch
sample = decode_first_stage(model, batch[i:i + 1])[0]
File "E:\StableDiffusion\modules\sd_samplers_common.py", line 76, in decode_first_stage
return samples_to_images_tensor(x, approx_index, model)
File "E:\StableDiffusion\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
File "E:\StableDiffusion\modules\sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "E:\StableDiffusion\modules\sd_hijack_utils.py", line 28, in __call__
return self.__orig_func(*args, **kwargs)
File "E:\StableDiffusion\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\StableDiffusion\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
return self.first_stage_model.decode(z)
File "E:\StableDiffusion\modules\lowvram.py", line 71, in first_stage_model_decode_wrap
return first_stage_model_decode(z)
File "E:\StableDiffusion\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
dec = self.decoder(z)
File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\StableDiffusion\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 641, in forward
h = self.up[i_level].upsample(h)
File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\StableDiffusion\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 64, in forward
x = self.conv(x)
File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\StableDiffusion\extensions\a1111-sd-webui-lycoris\lycoris.py", line 753, in lyco_Conv2d_forward
return torch.nn.Conv2d_forward_before_lyco(self, input)
File "E:\StableDiffusion\extensions-builtin\Lora\networks.py", line 501, in network_Conv2d_forward
return originals.Conv2d_forward(self, input)
File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "E:\StableDiffusion\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 960.00 MiB (GPU 0; 4.00 GiB total capacity; 1.50 GiB already allocated; 630.64 MiB free; 1.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
---
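To make the numbers in the out-of-memory message at the end of the traceback easier to compare, here is a minimal sketch that pulls them out with regular expressions and converts them to MiB. `parse_oom` is a hypothetical helper, and the patterns assume the exact message wording shown in this log; other PyTorch versions may phrase it differently.

```python
import re

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Extract the memory figures (in MiB) from an OOM message worded like the one above."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    figures = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            figures[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return figures


msg = (
    "CUDA out of memory. Tried to allocate 960.00 MiB (GPU 0; 4.00 GiB total capacity; "
    "1.50 GiB already allocated; 630.64 MiB free; 1.78 GiB reserved in total by PyTorch)"
)
figures = parse_oom(msg)
shortfall = figures["requested"] - figures["free"]   # how far the request exceeds free memory
slack = figures["reserved"] - figures["allocated"]   # reserved by PyTorch but not in use
print(f"shortfall: {shortfall:.2f} MiB, reserved-but-unallocated: {slack:.2f} MiB")
```

For this log the request exceeds the reported free memory by roughly 329 MiB, while the memory PyTorch has reserved but not allocated is roughly 287 MiB. Reserved memory is therefore not much larger than allocated memory here, so the max_split_size_mb hint in the message may not be enough on its own; that is a rough reading of the numbers, not a diagnosis.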
Additional information
The last thing I can remember is that I used the "StableDiffusion InvokeAI Base Cloud version" in Google Colab.
@Yevrey921 Did you ever fix this problem? Does the GTX 1650 work without --no-half, and how many it/s do you get?