[Issue]: UniPC generates blank/black images, error message
Issue Description
The preview shows an image forming, but when generation finishes the image turns black and an error message is printed to the console:
D:\automatic\venv\Lib\site-packages\torchvision\transforms\functional.py:281: RuntimeWarning: invalid value encountered in cast
npimg = (npimg * 255).astype(np.uint8)
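For context, this warning fires when the array being converted contains NaN/Inf before the uint8 cast. A minimal numpy sketch of the mechanism (not SD.Next code; the array shape is illustrative):

```python
import numpy as np

# Stand-in for a VAE decode that produced all-NaN values (e.g. an fp16 overflow).
npimg = np.full((4, 4, 3), np.nan, dtype=np.float32)

# Same cast as in torchvision's functional.py: NaN -> uint8 is undefined,
# numpy emits "RuntimeWarning: invalid value encountered in cast",
# and the resulting pixels typically come out as 0, i.e. a black image.
out = (npimg * 255).astype(np.uint8)
```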
Version Platform Description
07:38:35-287045 INFO Starting SD.Next
07:38:35-290569 INFO Logger: file="D:\automatic\sdnext.log" level=DEBUG size=65 mode=create
07:38:35-292069 INFO Python version=3.11.8 platform=Windows bin="D:\automatic\venv\Scripts\Python.exe"
venv="D:\automatic\venv"
07:38:37-894821 INFO Version: app=sd.next updated=2024-06-08 hash=0fa68f5a branch=dev
url=https://github.com/vladmandic/automatic/tree/dev ui=dev
07:38:39-772337 INFO Updating main repository
07:38:48-739528 INFO Upgraded to version: 8a541d21 Sun Jun 9 09:54:16 2024 -0400
07:38:48-809794 INFO Platform: arch=AMD64 cpu=AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD system=Windows
release=Windows-10-10.0.22631-SP0 python=3.11.8
07:38:48-817294 DEBUG Torch allocator:
"garbage_collection_threshold:0.20,max_split_size_mb:512,backend:cudaMallocAsync"
07:38:48-818299 DEBUG Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False
07:38:48-819322 DEBUG Torch allowed: cuda=True rocm=True ipex=True diml=True openvino=True
07:39:14-275876 DEBUG Extensions all: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info',
'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-images-browser',
'stable-diffusion-webui-rembg']
07:39:47-697870 INFO Extensions enabled: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info',
'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-images-browser',
'stable-diffusion-webui-rembg', 'adetailer', 'canvas-zoom', 'sd-webui-infinite-image-browsing',
'sd-webui-temporal', 'ultimate-upscale-for-automatic1111']
07:39:49-381726 INFO Command line args: ['--debug', '--upgrade', '--share'] share=True upgrade=True debug=True
07:40:04-921379 INFO Load packages: {'torch': '2.2.1+cu121', 'diffusers': '0.28.1', 'gradio': '3.43.2'}
07:40:06-856727 INFO VRAM: Detected=8.0 GB Optimization=medvram
07:40:06-863254 INFO Engine: backend=Backend.DIFFUSERS compute=cuda device=cuda attention="Dynamic Attention SDP"
mode=no_grad
07:40:07-096159 INFO Device: device=NVIDIA GeForce RTX 3070 n=1 arch=sm_90 cap=(8, 6) cuda=12.1 cudnn=8801
driver=552.22
07:40:08-768076 DEBUG ONNX: version=1.17.1 provider=CUDAExecutionProvider, available=['TensorrtExecutionProvider',
'CUDAExecutionProvider', 'CPUExecutionProvider']
Relevant log output
07:40:29-775080 DEBUG Extension list: processed=353 installed=13 enabled=11 disabled=2 visible=353 hidden=0
07:40:29-987095 DEBUG Root paths: ['D:\\automatic']
07:40:31-660079 INFO Local URL: http://127.0.0.1:7860/
07:40:31-661079 INFO Share URL: https://4f63c8c169851d651f.gradio.live
07:40:31-662079 DEBUG Gradio functions: registered=3890
07:40:31-665082 DEBUG FastAPI middleware: ['Middleware', 'Middleware']
07:40:31-668083 DEBUG Creating API
07:40:31-906580 INFO [AgentScheduler] Task queue is empty
07:40:31-907081 INFO [AgentScheduler] Registering APIs
IIB Database file has been successfully backed up to the backup folder.
07:40:32-253020 DEBUG Scripts setup: ['IP Adapters:0.027', 'AnimateDiff:0.011', 'ADetailer:0.308', 'X/Y/Z
Grid:0.016', 'Face:0.02', 'Image-to-Video:0.005', 'Stable Video Diffusion:0.008',
'Temporal:0.147', 'Ultimate SD upscale:0.01']
07:40:32-286254 DEBUG Save: file="metadata.json" json=659 bytes=1408126 time=0.030
07:40:32-287252 INFO Model metadata saved: file="metadata.json" items=1 time=0.00
07:40:32-289800 DEBUG Torch mode: deterministic=True
07:40:32-488015 DEBUG Desired Torch parameters: dtype=FP16 no-half=False no-half-vae=False upscast=False
07:40:32-490050 INFO Setting Torch parameters: device=cuda dtype=torch.float16 vae=torch.float16 unet=torch.float16
context=inference_mode fp16=True bf16=None optimization=Dynamic Attention SDP
07:40:32-494049 DEBUG Model requested: fn=<lambda>
07:40:32-495549 INFO Select: model="tzigorealmixxl_v06CVAE [5fb008fc97]"
07:40:32-498056 DEBUG Load model: existing=False target=D:\Stable Diffusion
Files\Models\Checkpoints\tzigorealmixxl_v06CVAE.safetensors info=None
07:40:32-500101 DEBUG Diffusers loading: path="D:\Stable Diffusion
Files\Models\Checkpoints\tzigorealmixxl_v06CVAE.safetensors"
07:40:32-502101 INFO Autodetect: model="Stable Diffusion XL" class=StableDiffusionXLPipeline file="D:\Stable
Diffusion Files\Models\Checkpoints\tzigorealmixxl_v06CVAE.safetensors" size=6617MB
Loading pipeline components... 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7/7 [ 0:00:02 < 0:00:00 , 2 C/s ]
07:40:36-652371 DEBUG Setting model: pipeline=StableDiffusionXLPipeline config={'low_cpu_mem_usage': True,
'torch_dtype': torch.float16, 'load_connected_pipeline': True, 'variant': 'fp16',
'extract_ema': False, 'config': 'configs/sdxl', 'use_safetensors': True, 'cache_dir':
'C:\\Users\\Joshua\\.cache\\huggingface\\hub'}
07:40:40-851631 INFO Load embeddings: loaded=14 skipped=4 time=4.19
07:40:40-853132 DEBUG Setting model: enable VAE slicing
07:40:40-854132 DEBUG Setting model: enable VAE tiling
07:40:40-888284 DEBUG Setting model: enable model CPU offload
07:40:41-273368 DEBUG GC: collected=230 device=cuda {'ram': {'used': 2.14, 'total': 31.9}, 'gpu': {'used': 1.1,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.33
07:40:41-284386 INFO Load model: time=8.44 load=4.16 embeddings=4.19 move=0.09 native=1024 {'ram': {'used': 2.14,
'total': 31.9}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:40:41-288913 DEBUG Script callback init time: image_browser.py:ui_tabs=5.11 system-info.py:app_started=0.10
task_scheduler.py:app_started=0.36
07:40:41-290410 DEBUG Save: file="config.json" json=72 bytes=4426 time=0.002
07:40:41-291529 INFO Startup time: 51.90 torch=12.42 gradio=1.86 diffusers=1.26 libraries=4.09 samplers=0.07
extensions=2.23 models=0.05 face-restore=0.64 upscalers=0.07 networks=0.08 ui-en=0.36
ui-txt2img=0.37 ui-img2img=0.24 ui-control=0.35 ui-extras=0.06 ui-models=0.05 ui-settings=0.34
ui-extensions=15.77 ui-defaults=0.12 launch=1.75 api=0.09 app-started=0.50 checkpoint=9.03
07:40:41-294410 DEBUG Unused settings: ['cross_attention_options', 'civitai_link_key', 'multiple_tqdm',
'mudd_states', 'civitai_folder_lyco', 'diffusers_aesthetics_score']
07:42:00-323828 DEBUG Server: alive=True jobs=1 requests=19 uptime=115 memory=2.14/31.9 backend=Backend.DIFFUSERS
state=idle
07:44:00-364321 DEBUG Server: alive=True jobs=1 requests=42 uptime=235 memory=2.14/31.9 backend=Backend.DIFFUSERS
state=idle
07:49:46-710532 INFO Applying hypertile: unet=384
07:49:46-736108 INFO High memory utilization: GPU=20% RAM=24% {'ram': {'used': 7.54, 'total': 31.9}, 'gpu': {'used':
1.58, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:49:47-099997 DEBUG GC: collected=128 device=cuda {'ram': {'used': 7.54, 'total': 31.9}, 'gpu': {'used': 1.58,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.36
07:49:47-104998 INFO Base: class=StableDiffusionXLPipeline
07:49:47-108000 DEBUG Sampler: sampler="UniPC" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
'solver_order': 2, 'thresholding': False, 'sample_max_value': 1.0, 'predict_x0': 'bh1',
'lower_order_final': False, 'timestep_spacing': 'linspace'}
07:49:48-335309 DEBUG Torch generator: device=cuda seeds=[1836090999]
07:49:48-340338 DEBUG Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1
set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]),
'negative_prompt_embeds': torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds':
torch.Size([1, 1280]), 'guidance_scale': 7, 'num_inference_steps': 20, 'eta': 1.0,
'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 768,
'height': 1152, 'parser': 'Full parser'}
Progress 1.79it/s ███████████████████▊ 60% 12/20 00:11 00:04 Base
07:49:59-740732 DEBUG Server: alive=True jobs=1 requests=171 uptime=594 memory=9.1/31.9 backend=Backend.DIFFUSERS
state=idle
Progress 1.31it/s █████████████████████████████████ 100% 20/20 00:15 00:00 Base
D:\automatic\venv\Lib\site-packages\torchvision\transforms\functional.py:281: RuntimeWarning: invalid value encountered in cast
npimg = (npimg * 255).astype(np.uint8)
07:50:07-695665 INFO High memory utilization: GPU=22% RAM=28% {'ram': {'used': 9.05, 'total': 31.9}, 'gpu': {'used':
1.72, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:50:08-120473 DEBUG GC: collected=348 device=cuda {'ram': {'used': 9.05, 'total': 31.9}, 'gpu': {'used': 1.72,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.42
07:50:21-920823 ERROR Failed to validate samples: sample=(1152, 768, 3) invalid=2654208
07:50:21-937341 WARNING Attempted to correct samples: min=0.00 max=0.00 mean=0.00
07:50:21-979481 INFO Save: image="D:\Stable Diffusion Files\Outputs\text\09684-tzigorealmixxl_v06CVAE-young woman
standing in street.png" type=PNG resolution=768x1152 size=3063
07:50:21-982987 INFO High memory utilization: GPU=24% RAM=29% {'ram': {'used': 9.13, 'total': 31.9}, 'gpu': {'used':
1.9, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:50:22-382367 DEBUG GC: collected=0 device=cuda {'ram': {'used': 9.13, 'total': 31.9}, 'gpu': {'used': 1.72,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.4
07:50:22-385366 INFO Processed: images=1 time=35.67 its=0.56 memory={'ram': {'used': 9.13, 'total': 31.9}, 'gpu':
{'used': 1.72, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:50:22-477388 INFO High memory utilization: GPU=22% RAM=27% {'ram': {'used': 8.73, 'total': 31.9}, 'gpu': {'used':
1.72, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:50:22-887920 DEBUG GC: collected=128 device=cuda {'ram': {'used': 8.73, 'total': 31.9}, 'gpu': {'used': 1.72,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.41
Backend
Diffusers
Branch
Dev
Model
SD-XL
Acknowledgements
- [X] I have read the above and searched for existing issues
- [X] I confirm that this is classified correctly and it's not an extension issue
that's not a sampler thing, that's an overflow in the vae - looks like the vae baked into that model is not the fp16-fixed vae. set settings -> diffusers -> vae upcasting -> true, or load an fp16 vae explicitly.
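A minimal sketch of both suggested workarounds in plain diffusers, assuming the SDXL checkpoint from the log; the fp16-fix VAE repo is the commonly used community fix, not something named in this thread:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "tzigorealmixxl_v06CVAE.safetensors", torch_dtype=torch.float16
)

# Option 1: swap in a VAE finetuned to be numerically safe in fp16
pipe.vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

# Option 2: keep the baked-in VAE but decode in fp32 (slower, more VRAM)
pipe.vae.config.force_upcast = True
```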
after a long conversation in discord, i still cannot reproduce this. i do believe there is something strange there, but i cannot do much without reproducing it locally first.
@vladmandic same error with 'Euler a' sampler and it seems to happen randomly.
- Open SD.Next, set VAE Model = None, load the model "PonyDiffusionV6XL", configure generation settings ("Smiling girl" positive prompt, "Euler a" sampler, 20 steps, CFG scale 5), generate - ok.
- Load a testing model "Stable Diffusion", click Generate - ok.
- Change steps from 20 to 40, click Generate - "functional.py:282: RuntimeWarning: invalid value encountered in cast npimg = (npimg * 255).astype(np.uint8)" at ~4/40 steps.
- Change steps from 40 back to 20, click Generate - same error at ~4/20 steps.
- Load the Pony model again, change steps to 75, generate - ok.
- Load the test model, click Generate - ok.
- Check Full Quality, click Generate - ok.
- Check HiDiffusion, click Generate - ok.
- Add Adapter "Full Face", add a PNG as Input Image - ok.
- Check Face Restore, click Generate - same RuntimeWarning at ~3/20 steps (why 20???).
- Uncheck Face Restore, click Generate - same error at ~4/75 steps.
@AznamirWoW that's a different issue, as those warnings come from the live preview, which never runs at the same precision due to the performance impact. they can be ignored, but if you want to pursue it further, create a new issue for that.
Well, it is not a live preview issue. The result of the error is a blank image generated at the end, or, if face restore fails, a black square over the face.
that is not a direct result of the error above at all. if you have a blank image at the end, fine, then leave it here. i'm saying the specific error you've quoted comes from the live preview.
Hello, I'm using SD.Next with DirectML and I got a blank image - it's all white. After the progress finished, I saw something that looks like an error: E:\automatic\venv\lib\site-packages\torchvision\transforms\functional.py:282: RuntimeWarning: invalid value encountered in cast npimg = (npimg * 255).astype(np.uint8)
Using the anything-v4.5 model, with Euler a or DPM++ 2M.
After enabling "skip generation if NaN found in latents", whatever model or sampler I use, the console outputs "a NaNs is detected at step 0". I wonder why it produces NaNs even though I have used full precision.
Today I used the original backend; with a slower load speed, it produced images normally.
Maybe it's a bug in diffusers.
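For reference, a check in the spirit of the "skip generation if NaN found in latents" setting can be approximated outside SD.Next with the diffusers per-step callback; a hedged sketch (the function name is mine):

```python
import torch

def abort_on_nan(pipe, step, timestep, callback_kwargs):
    # "latents" is in the default callback_on_step_end_tensor_inputs
    if torch.isnan(callback_kwargs["latents"]).any():
        raise RuntimeError(f"NaN detected in latents at step {step}")
    return callback_kwargs

# usage: pipe(prompt, callback_on_step_end=abort_on_nan)
```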
Just wanted to update that UniPC on my installation is still bugged; I am still getting the same error in the console:
D:\automatic\venv\Lib\site-packages\torchvision\transforms\functional.py:282: RuntimeWarning: invalid value encountered in cast
npimg = (npimg * 255).astype(np.uint8)
Autocast is enabled, set to FP16.
my comment from earlier still stands - i cannot reproduce. which means i need exact steps to reproduce and as many details as possible. here i don't even know which model or platform or gpu we're talking about.
This issue is older now, and the bug still exists (I've even seen other folks experiencing it on Discord).
I wish I knew what I could do to help corner this bug. There are so many variables with all the different settings that I don't know how to be "precise", except to give you my config file or something. It doesn't matter what the prompt is, and it doesn't seem to matter what the model is (afaik; tested with SDXL and SD3.5 Medium at least): with lora, without lora, with extensions, without extensions...
BTW, and I don't mean to be snarky or anything, but my hardware and model and all that are in the logs above. But to make it simple: it's an RTX 3070 8GB, Windows 11 Professional, 32GB system RAM, Ryzen 7 5700X @ 4.8GHz, dev version 59cd08f5 now.
Here's a snippet of my latest log with the "relevant" bit. No noticeable errors that I can see. Full log also attached.
17:47:22-967784 INFO Applying hypertile: unet=448
17:47:22-993790 INFO XYZ grid start: images=135 grid=1 shape=27x5 cells=1 steps=1080
17:47:22-995792 DEBUG XYZ grid process: x=1/27 y=1/5 z=1/1 total=0.01
17:47:22-998792 DEBUG XYZ grid apply sampler: "UniPC"
17:47:22-999792 DEBUG XYZ grid apply field: steps=4
17:47:23-000792 INFO Applying hypertile: unet=448
Load network: D:\Stable Diffusion Files\Models\Loras\Microwaist_XL_v01.safetensors ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/103.7 MB -:--:--
17:47:23-852588 DEBUG LoRA name="Microwaist_XL_v01" type={'ModuleTypeLora'} keys=788
17:47:24-201199 DEBUG GC: utilization={'gpu': 55, 'ram': 10, 'threshold': 25} gc={'collected': 27009, 'saved': 0.03} before={'gpu': 4.4, 'ram': 3.21} after={'gpu': 4.37, 'ram': 3.21, 'retries': 0, 'oom': 0}
device=cuda fn=activate:load_networks time=0.34
17:47:24-203201 INFO Load network: type=LoRA apply=['Microwaist_XL_v01'] te=[1.5] unet=[[1.5, 1.5, 1.5]] dims=[None] load=1.17
17:47:24-209202 INFO Base: class=StableDiffusionXLPipeline
17:47:24-210202 DEBUG Sampler: sampler="UniPC" class="UniPCMultistepScheduler config={'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear',
'prediction_type': 'epsilon', 'predict_x0': True, 'sample_max_value': 1.0, 'solver_order': 2, 'solver_type': 'bh2', 'thresholding': False, 'use_beta_sigmas': False,
'use_exponential_sigmas': False, 'use_karras_sigmas': False, 'lower_order_final': False, 'timestep_spacing': 'leading', 'final_sigmas_type': 'zero', 'rescale_betas_zero_snr': True}
17:47:24-536289 DEBUG GC: utilization={'gpu': 55, 'ram': 10, 'threshold': 25} gc={'collected': 127, 'saved': 0.0} before={'gpu': 4.37, 'ram': 3.21} after={'gpu': 4.37, 'ram': 3.21, 'retries': 0, 'oom': 0}
device=cuda fn=__init__:prepare_model time=0.32
17:47:25-684725 DEBUG GC: utilization={'gpu': 63, 'ram': 11, 'threshold': 25} gc={'collected': 2737, 'saved': 0.03} before={'gpu': 5.06, 'ram': 3.35} after={'gpu': 5.03, 'ram': 3.35, 'retries': 0, 'oom': 0}
device=cuda fn=encode:prepare_model time=0.3
17:47:25-687726 DEBUG Torch generator: device=cuda seeds=[1969483135]
17:47:25-688727 DEBUG Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1 set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1,
1280]), 'negative_prompt_embeds': torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds': torch.Size([1, 1280]), 'guidance_scale': 3, 'num_inference_steps': 4, 'eta': 1.0,
'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 896, 'height': 1024, 'parser': 'native'}
Progress 2.01s/it ███████████████████████████████████ 100% 4/4 00:08 00:00 Base
17:47:34-160425 DEBUG GC: utilization={'gpu': 67, 'ram': 10, 'threshold': 25} gc={'collected': 248, 'saved': 0.91} before={'gpu': 5.34, 'ram': 3.24} after={'gpu': 4.43, 'ram': 3.24, 'retries': 0, 'oom': 0}
device=cuda fn=process_base:nextjob time=0.31
17:47:34-162425 DEBUG Init hires: upscaler="ESRGAN 4x Ultrasharp" sampler="DPM++ 3M" resize=1523x1740 upscale=1523x1740
17:47:34-163425 INFO Upscale: mode=1 upscaler="ESRGAN 4x Ultrasharp" context="Add with forward" resize=1523x1740 upscale=1523x1740
17:47:35-221920 DEBUG VAE decode: vae name="default" dtype=torch.bfloat16 device=cuda:0 upcast=False slicing=True tiling=True latents shape=torch.Size([1, 4, 128, 112]) dtype=torch.bfloat16 device=cuda:0
time=1.057
17:47:35-593443 DEBUG GC: utilization={'gpu': 52, 'ram': 19, 'threshold': 25} gc={'collected': 127, 'saved': 2.12} before={'gpu': 4.12, 'ram': 5.93} after={'gpu': 2.0, 'ram': 5.93, 'retries': 0, 'oom': 0}
device=cuda fn=resize_hires:vae_decode time=0.35
17:47:35-896024 DEBUG GC: utilization={'gpu': 25, 'ram': 19, 'threshold': 25} gc={'collected': 127, 'saved': 0.0} before={'gpu': 2.0, 'ram': 5.93} after={'gpu': 2.0, 'ram': 5.93, 'retries': 0, 'oom': 0}
device=cuda fn=upscale:begin time=0.3
17:47:35-947098 INFO Upscaler loaded: type=ESRGAN model=D:\Stable Diffusion Files\Models\ESRGAN\ESRGAN-UltraSharp-4x.pth
Upscaling ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:07
17:47:43-222081 DEBUG Upscaler unloaded: type=ESRGAN model=D:\Stable Diffusion Files\Models\ESRGAN\ESRGAN-UltraSharp-4x.pth
17:47:43-566105 DEBUG GC: utilization={'gpu': 30, 'ram': 19, 'threshold': 25} gc={'collected': 454, 'saved': 0.37} before={'gpu': 2.37, 'ram': 6.0} after={'gpu': 2.0, 'ram': 6.0, 'retries': 0, 'oom': 0}
device=cuda fn=upscale:do_upscale time=0.34
17:47:44-006723 DEBUG GC: utilization={'gpu': 25, 'ram': 19, 'threshold': 25} gc={'collected': 162, 'saved': 0.0} before={'gpu': 2.0, 'ram': 5.96} after={'gpu': 2.0, 'ram': 5.96, 'retries': 0, 'oom': 0}
device=cuda fn=upscale:end time=0.32
17:47:44-008724 DEBUG Image resize: input=<PIL.Image.Image image mode=RGB size=896x1024 at 0x22693193F50> width=1523 height=1740 mode="Fixed" upscaler="ESRGAN 4x Ultrasharp" context="Add with forward"
type=image result=<PIL.Image.Image image mode=RGB size=1523x1740 at 0x226CF436BD0> time=8.41 fn=process_hires:resize_hires
17:47:44-330322 DEBUG GC: utilization={'gpu': 25, 'ram': 19, 'threshold': 25} gc={'collected': 127, 'saved': 0.0} before={'gpu': 2.0, 'ram': 5.96} after={'gpu': 2.0, 'ram': 5.96, 'retries': 0, 'oom': 0}
device=cuda fn=process_hires:resize_hires time=0.32
17:47:44-753000 DEBUG GC: utilization={'gpu': 34, 'ram': 19, 'threshold': 25} gc={'collected': 162, 'saved': 0.75} before={'gpu': 2.75, 'ram': 5.96} after={'gpu': 2.0, 'ram': 5.96, 'retries': 0, 'oom': 0}
device=cuda fn=process_hires:nextjob time=0.33
0: 640x576 (no detections), 247.1ms
Speed: 2.0ms preprocess, 247.1ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 576)
[-] ADetailer: nothing detected on image 1 with 1st settings.
0: 640x576 (no detections), 186.1ms
Speed: 2.0ms preprocess, 186.1ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 576)
[-] ADetailer: nothing detected on image 1 with 2nd settings.
W0000 00:00:1731808065.771732 29784 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1731808065.776372 23816 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
[-] ADetailer: nothing detected on image 1 with 3rd settings.
0: 640x576 (no detections), 528.4ms
Speed: 3.0ms preprocess, 528.4ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 576)
[-] ADetailer: nothing detected on image 1 with 4th settings.
17:47:46-940206 INFO Save: image="D:\Stable Diffusion Files\Outputs\text\12826-mklanRealistic_mklanRealxlV1HSD-a full body photorealistic photograph of a young.png" type=PNG width=1523 height=1740
size=10756
17:47:47-283735 DEBUG GC: utilization={'gpu': 25, 'ram': 21, 'threshold': 25} gc={'collected': 8835, 'saved': 0.0} before={'gpu': 2.0, 'ram': 6.57} after={'gpu': 2.0, 'ram': 6.57, 'retries': 0, 'oom': 0}
device=cuda fn=process_images:process_images_inner time=0.34
17:47:47-307741 INFO Processed: images=1 its=0.16 time=24.28 timers={'gc': 3.57, 'init': 1.2, 'encode': 1.47, 'args': 1.49, 'move': 0.02, 'pipeline': 8.03, 'hires': 11.0, 'post': 2.55} memory={'ram':
{'used': 6.57, 'total': 31.9}, 'gpu': {'used': 2.0, 'total': 8.0}, 'retries': 0, 'oom': 0}
EDIT: FWIW, the black appears (via live preview) immediately after inference, when VAE decoding starts. Is something causing a problem with the decoding? VAE is set to Automatic, and I have both the "fixed" VAE and the distilled VAE in the VAE folder as options for it to pick. I might try different VAEs in a future test; running a grid atm.
On SDXL, tested base, fixed, and "low memory" VAE, black image still appears at beginning of VAE decode.
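Since the black frame appears exactly when VAE decode starts, a hedged way to tell whether the NaNs are already present in the latents or are introduced by the decode is to run the decode manually; a sketch against the public diffusers API (the helper name is mine):

```python
import torch

def debug_decode(pipe, latents):
    # undo the SDXL latent scaling before decoding
    latents = latents / pipe.vae.config.scaling_factor
    with torch.no_grad():
        image = pipe.vae.decode(latents).sample
    print("latents nan:", torch.isnan(latents).any().item(),
          "decoded nan:", torch.isnan(image).any().item(),
          "decoded range:", image.min().item(), image.max().item())
    return image
```

If the latents are already NaN, the sampler (or a text-encoder/unet overflow) is the culprit; if only the decoded image is NaN, it points at the VAE.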
i believe there is an issue, but its not a general one and its not something i can reproduce. and without reproduction, i cannot fix it.
so, start with absolute minimal reproduction.
- does this happen with any sdxl model or specific ones only?
- is upscaling or hires relevant to reproduction? if not, remove it as it only adds complexities without adding value to troubleshooting.
- are you running fp16 or bf16? did you try bf16 (which is now the default for all compatible gpus)? etc... (see the dtype sketch below)
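A hedged A/B sketch for that dtype question, in plain diffusers rather than SD.Next (the checkpoint path and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionXLPipeline, UniPCMultistepScheduler

for dtype in (torch.float16, torch.bfloat16):
    pipe = StableDiffusionXLPipeline.from_single_file(
        "model.safetensors", torch_dtype=dtype
    ).to("cuda")
    # same UniPC scheduler as in the logs above, built from the pipeline config
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    image = pipe("smiling girl", num_inference_steps=20).images[0]
    image.save(f"unipc-{str(dtype).split('.')[-1]}.png")
```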
I'm not sure I know enough to even be able to do a bare minimum workflow, but here goes:
- All SDXL models tested (at least a dozen different ones, some of which I no longer have) do not work with UniPC.
- The black screen appears during initial generation, before anything else: hit Generate, inference runs, and at the beginning of VAE decode the black image appears. Upscaling, hires fix, extensions, scripts: all irrelevant, the bug occurs before those are reached in the workflow.
- Not 100% sure which precision type is being used, I just looked in my settings and it is set to "auto". So presumably BF16? Preferred model variant and preferred VAE variant are both set to "default".
- Quantization is currently enabled, but black image occurred without quantization as well. NNCF.
- Hypertile enabled or disabled (UNET only, I never use hypertile vae)
- using --medvram, haven't tried --lowvram. --medvram is required for me to run SDXL, OOM otherwise
I can't really think of what else I can do to "bare minimum" on my GPU, if I turn anything else off (or on?), I'll start having issues running SDXL at all on my GPU. Again, only 8gb VRAM. Many of the settings I'm using are explicitly because without them I will get OOM or massively degraded performance.
Not 100% sure which precision type is being used, I just looked in my settings and it is set to "auto"
from your log:
2024-11-16 11:59:02,387 | sd | INFO | devices | Torch parameters: backend=cuda device=cuda config=Auto dtype=torch.bfloat16 vae=torch.bfloat16 unet=torch.bfloat16 context=no_grad nohalf=False nohalfvae=False upscast=False deterministic=True test-fp16=True test-bf16=True optimization="Scaled-Dot-Product"
Quantization is currently enabled, but black image occurred without quantization as well. NNCF.
i believe you, but please try to understand my point of view - i need a clean log. if i see hypertile or nncf or detailer or hires in the log and they have no relevance to the issue, it just makes any kind of analysis that much harder. so once again, please reproduce without anything that is not relevant - just to have as simple a log as possible. if you need medvram, that's fine. i never said disable everything - i said disable everything that is not relevant and/or needed.
revisiting old issues - is this still happening?
Sort of. Just did a test; please note that this is an older version of the dev branch (due to ROCm for Windows, I'm trying to only update when I see an update for ROCm etc., to keep my install somewhat stable-ish), commit 5db54ffb.
Initially, the image during inference looks something like this:
After VAE decoding, it is a black image:
Here is the log:
Parameters:
- 1024x1536
- HiDiffusion (no difference with this disabled)
- Hypertile (no difference with this disabled)
- CFG 4.0, default sampler parameters
- No refine, no scripts
I am now on an RX 9070 XT using ROCm for Windows, but the behavior I am experiencing with UniPC seems identical to when I was using an RTX 3070.
ah, still happening, and not any closer to figuring out why unipc is not bf16-safe in your case.