stable-diffusion-webui

[Bug]: NansException: A tensor with all NaNs was produced in VAE.

Open arthemis235 opened this issue 1 year ago • 6 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

The error "NansException: A tensor with all NaNs was produced in VAE." appears every time I generate an image with AUTOMATIC1111. The issue is consistent: rebooting and reinstalling change nothing.

Steps to reproduce the problem

Steps to reproduce:

  1. Clone the repository with git clone (commit is "f6898c9")
  2. Copy two models into the "Stable-diffusion" folder inside "models" (realisticVisionV20_v20.safetensors and v1-5-pruned-emaonly.ckpt)
  3. Run webui-user.bat
  4. Write "crowd of people" in the prompt
  5. Click "Generate"
  6. Wait until generation is complete

The output is always the following:

  1. No image is generated
  2. Same behavior with both models

Console log appearing in the browser:

NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
Time taken: 58.44s
Torch active/reserved: 3169/4214 MiB, Sys VRAM: 6128/6144 MiB (99.74%)
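
The message's remark about precision can be illustrated without a GPU. What follows is a minimal sketch using only the Python standard library, not the webui's code: `to_half` is a hypothetical helper that rounds a value to IEEE 754 half precision (the 16-bit format that --no-half-vae avoids for the VAE), and the input numbers are made up for illustration. It shows how a value beyond the half-precision range overflows to infinity, and how combining opposite infinities then yields NaN.

```python
import math
import struct

HALF_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_half(x: float) -> float:
    """Round x to half precision, modelling overflow as +/-inf (hypothetical helper)."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # The 'e' format code packs an IEEE 754 binary16 value.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses out-of-range values; model the overflow as infinity.
        return math.copysign(math.inf, x)


a = to_half(70000.0)   # exceeds HALF_MAX, so it overflows to inf
b = to_half(-70000.0)  # overflows to -inf
total = a + b          # inf + (-inf) has no defined value, so the result is NaN

print(a, b, total)
print(math.isnan(total))
```

Once a single NaN appears, it propagates through every later arithmetic step, which is one plausible way an entire output tensor can end up as NaN.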

What should have happened?

An image should have been generated

Commit where the problem happens

5ab7f21

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Brave

Command Line Arguments

No

List of extensions

No

Console logs

venv "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 5ab7f213bec2f816f9c5644becb32eb72c8ffb89
Installing requirements
Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
Loading weights [c0d1994c73] from E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\models\Stable-diffusion\realisticVisionV20_v20.safetensors
Creating model from config: E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 2.8s (load weights from disk: 0.1s, create model: 0.3s, apply weights to model: 0.4s, apply half(): 0.6s, move model to device: 0.6s, load textual inversion embeddings: 0.8s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 7.7s (import torch: 1.4s, import gradio: 0.8s, import ldm: 0.5s, other imports: 0.7s, load scripts: 0.8s, load SD checkpoint: 2.9s, create ui: 0.5s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:14<00:00,  3.72s/it]
Error completing request
Arguments: ('task(mb9yha2wmki4j3h)', 'crowd of people', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 673, in process_images_inner
    devices.test_for_nans(x, "vae")
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\devices.py", line 156, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Calculating sha256 for E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned-emaonly.ckpt: cc6cb27103417325ff94f52b7a5d2dde45a7515b25c255d8e396c90014281516
Loading weights [cc6cb27103] from E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned-emaonly.ckpt
Applying cross attention optimization (Doggettx).
Weights loaded in 5.2s (calculate hash: 3.1s, load weights from disk: 1.3s, apply weights to model: 0.4s, move model to device: 0.5s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:55<00:00,  2.77s/it]
Error completing request
Arguments: ('task(j2zr1ab3yif42ie)', 'crowd of people', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\processing.py", line 673, in process_images_inner
    devices.test_for_nans(x, "vae")
  File "E:\Utilitaires\Stable Diffusion\stable-diffusion-webui\modules\devices.py", line 156, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in VAE. This could be because there's not enough precision to represent the picture. Try adding --no-half-vae commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

Additional information

No response

arthemis235 avatar May 05 '23 23:05 arthemis235

Adding --no-half-vae to the arguments should fix that

CooperElektrik avatar May 05 '23 23:05 CooperElektrik

I observed the same problem on 5ab7f21; adding --no-half-vae does not resolve it. It does change the message to "modules.devices.NansException: A tensor with all NaNs was produced in VAE. Use --disable-nan-check commandline argument to disable this check." Apparently the error message is smart enough not to suggest --no-half-vae if you are already using it.

Taking the next step and adding --disable-nan-check along with --no-half-vae to the command-line arguments avoids the error, but results in a black image.
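
The behaviour described here (the --no-half-vae hint disappearing once the flag is set, and --disable-nan-check skipping the check entirely) can be sketched in plain Python. This is an illustration under assumptions, not the webui's actual `test_for_nans`: the function name `check_for_nans`, its parameters, and the use of a flat list in place of a tensor are all hypothetical.

```python
import math


class NansException(Exception):
    """Stand-in for the exception named in the traceback."""


def check_for_nans(values, where, no_half_vae=False, disable_nan_check=False):
    """Raise if every value is NaN; the keyword flags mimic the CLI options (hypothetical API)."""
    if disable_nan_check:
        return  # check skipped: any NaNs flow on to the output unchanged
    if not values or not all(math.isnan(v) for v in values):
        return  # at least one usable value, nothing to report
    message = f"A tensor with all NaNs was produced in {where}."
    if not no_half_vae:
        # Only suggest the flag when it is not already in use.
        message += " Try adding --no-half-vae commandline argument to fix this."
    message += " Use --disable-nan-check commandline argument to disable this check."
    raise NansException(message)


nan = float("nan")
try:
    check_for_nans([nan, nan], "VAE", no_half_vae=True)
except NansException as exc:
    print(exc)  # the message no longer mentions --no-half-vae
```

Skipping the check does not repair the values themselves, which is consistent with the black image reported above.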

This definitely seems like a new issue introduced by a recent commit. (I was several commits behind before pulling 5ab7f21, so I can’t say for sure where it started.)

cmdicely avatar May 07 '23 03:05 cmdicely

Rolling back to torch 1.13.1, torchvision 0.14.1, and xformers 0.0.16 seems to resolve the problem. [EDIT: An earlier version of this comment noted that a different problem occurred after this fix, but I have determined that was unrelated.]

So, it seems to be a torch 2.0.0 issue.
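
To confirm which versions an environment actually has before and after a rollback, a small check like the following may help. It is a sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and it would need to be run with the Python interpreter from the webui's venv to reflect that environment.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "xformers")):
    """Return {package: version string, or None if not installed} (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```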

cmdicely avatar May 07 '23 07:05 cmdicely

Same problem since version 1.1.0 of stable-diffusion-webui. I have tried everything: I changed the CUDA version to 11.8, rolled torch back to an earlier version, switched Python to recent versions like 3.10.10 or 3.10.9, and reinstalled stable-diffusion-webui several times in different versions. I tried the --xformers --lowvram arguments, and also added "--no-half --no-half-vae --disable-nan-check" as the error message suggested, and it didn't work. Previously, with only the "--xformers --lowvram" arguments, I could generate very large images or use up to 4 or 5 ControlNet 1.1 tabs at the same time (canny, depth, ...). Now I have trouble with only 1 before I get this message or an out-of-memory error. Please help.

Pluventi avatar May 07 '23 18:05 Pluventi

So, going back to the default torch-2.0.0 setup with 5ab7f21 and trying the above suggestion of just using --disable-nan-check, I still get a black image. It does resolve a different issue that sometimes occurs earlier in generation with an all-NaNs tensor in the Unet, though, and I do get a preview image. But the final image is all black (apparently because of whatever produces the all-NaNs tensor in the VAE).

cmdicely avatar May 07 '23 21:05 cmdicely

Same problem

alexbespik avatar May 09 '23 14:05 alexbespik

Run the app with webui.bat --disable-nan-check in cmd. It works for me.

RhymeXY avatar May 11 '23 13:05 RhymeXY