stable-diffusion-webui [Bug]: NansException: A tensor with all NaNs was produced in VAE.

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What happened?

The error "NansException: A tensor with all NaNs was produced in VAE." appears every time I use AUTOMATIC1111. The issue is consistent (rebooting and reinstalling don't change anything).

Steps to reproduce the problem

Steps to reproduce:

1. Clone the repository with git clone (commit is "f6898c9")
2. Copy two models into the "Stable-diffusion" folder under models (realisticVisionV20_v20.safetensors and v1-5-pruned-emaonly.ckpt)
3. Run webui-user.bat
4. Write "crowd of people" in the prompt
5. Click "Generate"
6. Wait until generation is complete

The output is always the following:

No image is generated
Same behavior with both models

Console log appearing in the browser:

NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
Time taken: 58.44s
Torch active/reserved: 3169/4214 MiB, Sys VRAM: 6128/6144 MiB (99.74%)

What should have happened?

An image should have been generated

Commit where the problem happens

5ab7f21

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Brave

Command Line Arguments

No

List of extensions

No

Console logs

venv "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 5ab7f213bec2f816f9c5644becb32eb72c8ffb89
Installing requirements
Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
Loading weights [c0d1994c73] from E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\models\Stable-diffusion\realisticVisionV20_v20.safetensors
Creating model from config: E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 2.8s (load weights from disk: 0.1s, create model: 0.3s, apply weights to model: 0.4s, apply half(): 0.6s, move model to device: 0.6s, load textual inversion embeddings: 0.8s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 7.7s (import torch: 1.4s, import gradio: 0.8s, import ldm: 0.5s, other imports: 0.7s, load scripts: 0.8s, load SD checkpoint: 2.9s, create ui: 0.5s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:14<00:00,  3.72s/it]
Error completing request███████████████████████████████████████████████████████████████| 20/20 [00:56<00:00,  2.74s/it]
Arguments: ('task(mb9yha2wmki4j3h)', 'crowd of people', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 673, in process_images_inner
    devices.test_for_nans(x, "vae")
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\devices.py", line 156, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Calculating sha256 for E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned-emaonly.ckpt: cc6cb27103417325ff94f52b7a5d2dde45a7515b25c255d8e396c90014281516
Loading weights [cc6cb27103] from E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned-emaonly.ckpt
Applying cross attention optimization (Doggettx).
Weights loaded in 5.2s (calculate hash: 3.1s, load weights from disk: 1.3s, apply weights to model: 0.4s, move model to device: 0.5s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:55<00:00,  2.77s/it]
Error completing request:41,  2.71s/it]
Arguments: ('task(j2zr1ab3yif42ie)', 'crowd of people', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 673, in process_images_inner
    devices.test_for_nans(x, "vae")
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\devices.py", line 156, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Additional information

No response

May 05 '23 23:05 arthemis235

Adding --no-half-vae to the arguments should fix that
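
For context on why this flag is the usual first suggestion: the error message blames insufficient precision, and --no-half-vae keeps the VAE out of 16-bit floats. Below is a minimal sketch, using only Python's standard library rather than torch, of how half-precision (fp16) values can overflow to infinity and then turn into NaN. The helper `to_half` is hypothetical and only emulates fp16 rounding; it is not part of the webui, and it does not claim to reproduce what actually goes wrong inside the VAE here.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite value in IEEE 754 half precision


def to_half(x: float) -> float:
    """Round x to the nearest fp16 value, overflowing to +/-inf (hypothetical helper)."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # struct's "e" format is IEEE 754 binary16 (half precision).
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; fp16 arithmetic would yield inf.
        return math.copysign(math.inf, x)


big = to_half(70000.0)     # exceeds FP16_MAX, so it becomes inf
diff = to_half(big - big)  # inf - inf is NaN
print(big, diff)           # prints "inf nan"
```

The same value fits comfortably in 32-bit floats, which is why running the VAE in full precision can help when overflow is the cause.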

May 05 '23 23:05 CooperElektrik

Observed the same problem on 5ab7f21; adding --no-half-vae does not resolve it. It does change the message to "modules.devices.NansException: A tensor with all NaNs was produced in VAE. Use --disable-nan-check commandline argument to disable this check." Apparently the error message is smart enough not to tell you to try --no-half-vae if you are already using it.

Taking the next step and adding --disable-nan-check along with --no-half-vae to the command-line arguments avoids the error, but results in a black image.

This definitely seems like a new issue introduced by a recent commit. (I was several commits behind before pulling 5ab7f21, so I can't say for sure where it started.)
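
The way the message changes with the flags is consistent with a check along the following lines. This is a simplified sketch inferred from the messages quoted in this thread, not the actual code in modules/devices.py: the name `check_for_nans`, its parameters, and the use of plain lists in place of tensors are all assumptions made for illustration.

```python
import math


class NansException(Exception):
    """Stand-in for the exception named in the traceback."""


def check_for_nans(values, where, no_half_vae=False, disable_nan_check=False):
    """Raise if every value is NaN (hypothetical, list-based sketch)."""
    if disable_nan_check:
        return  # the check is skipped; any NaNs are passed along untouched
    if not all(math.isnan(v) for v in values):
        return
    message = f"A tensor with all NaNs was produced in {where}."
    if where == "VAE" and not no_half_vae:
        # The hint is only useful if the flag is not already set.
        message += (
            " This could be because there's not enough precision to represent"
            " the picture. Try adding --no-half-vae commandline argument to fix this."
        )
    message += " Use --disable-nan-check commandline argument to disable this check."
    raise NansException(message)


decoded = [float("nan")] * 4
try:
    check_for_nans(decoded, "VAE", no_half_vae=True)
except NansException as err:
    print(err)

# With the check disabled nothing is raised, but the values are still NaN.
check_for_nans(decoded, "VAE", disable_nan_check=True)
```

Under these assumptions, --disable-nan-check only silences the exception and does nothing about the NaNs themselves, which would fit with the black image.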

May 07 '23 03:05 cmdicely

Rolling back to torch 1.13.1, torchvision 0.14.1, and xformers 0.0.16 seems to resolve the problem. [EDIT: An earlier version of this comment noted a different problem occurred after this fix, but I have determined that was unrelated]

So, it seems to be a torch 2.0.0 issue.
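
To confirm which versions are actually installed before and after a rollback, something like the following can be run with the Python interpreter from the webui's venv. It is a small sketch using the standard library's `importlib.metadata`; the function name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "xformers")):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running it outside the venv reports on the system Python instead, so the result may not match what the webui loads.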

May 07 '23 07:05 cmdicely

Same problem since version 1.1.0 of stable diffusion web-ui. I have really tried everything: I switched CUDA to 11.8, downgraded torch to an earlier version, tried recent Python versions such as 3.10.10 and 3.10.9, and reinstalled stable diffusion web-ui several times at different versions. I tried the "--xformers --lowvram" arguments, and I even added "--no-half --no-half-vae --disable-nan-check" as the error message suggested, and it didn't work. Before, with only "--xformers --lowvram", I could generate very large images or use 4 or 5 ControlNet 1.1 tabs at the same time (canny, depth, ...). Now I have trouble with even one: I either get this error or an out-of-memory message. Please help.

May 07 '23 18:05 Pluventi

So, going back to the default torch 2.0.0 setup with 5ab7f21 and trying the above suggestion of using just --disable-nan-check, I still get a black image. It does resolve a different issue that sometimes occurs earlier in generation, with an all-NaNs tensor in the Unet, and I do get a preview image. But the final image is all black (apparently because of whatever produces the all-NaNs tensor in the VAE).

May 07 '23 21:05 cmdicely

Same problem

May 09 '23 14:05 alexbespik

Run the app with webui.bat --disable-nan-check in cmd. It works for me.

May 11 '23 13:05 RhymeXY
