[Issue]: UniPC generates blank/black images, error message
Issue Description
The preview shows an image forming, but when generation finishes the image turns black and an error message is printed to the console:
D:\automatic\venv\Lib\site-packages\torchvision\transforms\functional.py:281: RuntimeWarning: invalid value encountered in cast
npimg = (npimg * 255).astype(np.uint8)
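For context, this warning fires when the array being converted contains NaN/Inf before the uint8 cast. A minimal numpy sketch of the mechanism (not SD.Next code; the array shape is illustrative):

```python
import numpy as np

# Stand-in for a VAE decode that produced all-NaN values (e.g. an fp16 overflow).
npimg = np.full((4, 4, 3), np.nan, dtype=np.float32)

# Same cast as in torchvision's functional.py: NaN -> uint8 is undefined,
# numpy emits "RuntimeWarning: invalid value encountered in cast",
# and the resulting pixels typically come out as 0, i.e. a black image.
out = (npimg * 255).astype(np.uint8)
```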
Version Platform Description
07:38:35-287045 INFO Starting SD.Next
07:38:35-290569 INFO Logger: file="D:\automatic\sdnext.log" level=DEBUG size=65 mode=create
07:38:35-292069 INFO Python version=3.11.8 platform=Windows bin="D:\automatic\venv\Scripts\Python.exe"
venv="D:\automatic\venv"
07:38:37-894821 INFO Version: app=sd.next updated=2024-06-08 hash=0fa68f5a branch=dev
url=https://github.com/vladmandic/automatic/tree/dev ui=dev
07:38:39-772337 INFO Updating main repository
07:38:48-739528 INFO Upgraded to version: 8a541d21 Sun Jun 9 09:54:16 2024 -0400
07:38:48-809794 INFO Platform: arch=AMD64 cpu=AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD system=Windows
release=Windows-10-10.0.22631-SP0 python=3.11.8
07:38:48-817294 DEBUG Torch allocator:
"garbage_collection_threshold:0.20,max_split_size_mb:512,backend:cudaMallocAsync"
07:38:48-818299 DEBUG Torch overrides: cuda=False rocm=False ipex=False diml=False openvino=False
07:38:48-819322 DEBUG Torch allowed: cuda=True rocm=True ipex=True diml=True openvino=True
07:39:14-275876 DEBUG Extensions all: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info',
'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-images-browser',
'stable-diffusion-webui-rembg']
07:39:47-697870 INFO Extensions enabled: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info',
'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-images-browser',
'stable-diffusion-webui-rembg', 'adetailer', 'canvas-zoom', 'sd-webui-infinite-image-browsing',
'sd-webui-temporal', 'ultimate-upscale-for-automatic1111']
07:39:49-381726 INFO Command line args: ['--debug', '--upgrade', '--share'] share=True upgrade=True debug=True
07:40:04-921379 INFO Load packages: {'torch': '2.2.1+cu121', 'diffusers': '0.28.1', 'gradio': '3.43.2'}
07:40:06-856727 INFO VRAM: Detected=8.0 GB Optimization=medvram
07:40:06-863254 INFO Engine: backend=Backend.DIFFUSERS compute=cuda device=cuda attention="Dynamic Attention SDP"
mode=no_grad
07:40:07-096159 INFO Device: device=NVIDIA GeForce RTX 3070 n=1 arch=sm_90 cap=(8, 6) cuda=12.1 cudnn=8801
driver=552.22
07:40:08-768076 DEBUG ONNX: version=1.17.1 provider=CUDAExecutionProvider, available=['TensorrtExecutionProvider',
'CUDAExecutionProvider', 'CPUExecutionProvider']
Relevant log output
07:40:29-775080 DEBUG Extension list: processed=353 installed=13 enabled=11 disabled=2 visible=353 hidden=0
07:40:29-987095 DEBUG Root paths: ['D:\\automatic']
07:40:31-660079 INFO Local URL: http://127.0.0.1:7860/
07:40:31-661079 INFO Share URL: https://4f63c8c169851d651f.gradio.live
07:40:31-662079 DEBUG Gradio functions: registered=3890
07:40:31-665082 DEBUG FastAPI middleware: ['Middleware', 'Middleware']
07:40:31-668083 DEBUG Creating API
07:40:31-906580 INFO [AgentScheduler] Task queue is empty
07:40:31-907081 INFO [AgentScheduler] Registering APIs
IIB Database file has been successfully backed up to the backup folder.
07:40:32-253020 DEBUG Scripts setup: ['IP Adapters:0.027', 'AnimateDiff:0.011', 'ADetailer:0.308', 'X/Y/Z
Grid:0.016', 'Face:0.02', 'Image-to-Video:0.005', 'Stable Video Diffusion:0.008',
'Temporal:0.147', 'Ultimate SD upscale:0.01']
07:40:32-286254 DEBUG Save: file="metadata.json" json=659 bytes=1408126 time=0.030
07:40:32-287252 INFO Model metadata saved: file="metadata.json" items=1 time=0.00
07:40:32-289800 DEBUG Torch mode: deterministic=True
07:40:32-488015 DEBUG Desired Torch parameters: dtype=FP16 no-half=False no-half-vae=False upscast=False
07:40:32-490050 INFO Setting Torch parameters: device=cuda dtype=torch.float16 vae=torch.float16 unet=torch.float16
context=inference_mode fp16=True bf16=None optimization=Dynamic Attention SDP
07:40:32-494049 DEBUG Model requested: fn=<lambda>
07:40:32-495549 INFO Select: model="tzigorealmixxl_v06CVAE [5fb008fc97]"
07:40:32-498056 DEBUG Load model: existing=False target=D:\Stable Diffusion
Files\Models\Checkpoints\tzigorealmixxl_v06CVAE.safetensors info=None
07:40:32-500101 DEBUG Diffusers loading: path="D:\Stable Diffusion
Files\Models\Checkpoints\tzigorealmixxl_v06CVAE.safetensors"
07:40:32-502101 INFO Autodetect: model="Stable Diffusion XL" class=StableDiffusionXLPipeline file="D:\Stable
Diffusion Files\Models\Checkpoints\tzigorealmixxl_v06CVAE.safetensors" size=6617MB
Loading pipeline components... 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7/7 [ 0:00:02 < 0:00:00 , 2 C/s ]
07:40:36-652371 DEBUG Setting model: pipeline=StableDiffusionXLPipeline config={'low_cpu_mem_usage': True,
'torch_dtype': torch.float16, 'load_connected_pipeline': True, 'variant': 'fp16',
'extract_ema': False, 'config': 'configs/sdxl', 'use_safetensors': True, 'cache_dir':
'C:\\Users\\Joshua\\.cache\\huggingface\\hub'}
07:40:40-851631 INFO Load embeddings: loaded=14 skipped=4 time=4.19
07:40:40-853132 DEBUG Setting model: enable VAE slicing
07:40:40-854132 DEBUG Setting model: enable VAE tiling
07:40:40-888284 DEBUG Setting model: enable model CPU offload
07:40:41-273368 DEBUG GC: collected=230 device=cuda {'ram': {'used': 2.14, 'total': 31.9}, 'gpu': {'used': 1.1,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.33
07:40:41-284386 INFO Load model: time=8.44 load=4.16 embeddings=4.19 move=0.09 native=1024 {'ram': {'used': 2.14,
'total': 31.9}, 'gpu': {'used': 1.1, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:40:41-288913 DEBUG Script callback init time: image_browser.py:ui_tabs=5.11 system-info.py:app_started=0.10
task_scheduler.py:app_started=0.36
07:40:41-290410 DEBUG Save: file="config.json" json=72 bytes=4426 time=0.002
07:40:41-291529 INFO Startup time: 51.90 torch=12.42 gradio=1.86 diffusers=1.26 libraries=4.09 samplers=0.07
extensions=2.23 models=0.05 face-restore=0.64 upscalers=0.07 networks=0.08 ui-en=0.36
ui-txt2img=0.37 ui-img2img=0.24 ui-control=0.35 ui-extras=0.06 ui-models=0.05 ui-settings=0.34
ui-extensions=15.77 ui-defaults=0.12 launch=1.75 api=0.09 app-started=0.50 checkpoint=9.03
07:40:41-294410 DEBUG Unused settings: ['cross_attention_options', 'civitai_link_key', 'multiple_tqdm',
'mudd_states', 'civitai_folder_lyco', 'diffusers_aesthetics_score']
07:42:00-323828 DEBUG Server: alive=True jobs=1 requests=19 uptime=115 memory=2.14/31.9 backend=Backend.DIFFUSERS
state=idle
07:44:00-364321 DEBUG Server: alive=True jobs=1 requests=42 uptime=235 memory=2.14/31.9 backend=Backend.DIFFUSERS
state=idle
07:49:46-710532 INFO Applying hypertile: unet=384
07:49:46-736108 INFO High memory utilization: GPU=20% RAM=24% {'ram': {'used': 7.54, 'total': 31.9}, 'gpu': {'used':
1.58, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:49:47-099997 DEBUG GC: collected=128 device=cuda {'ram': {'used': 7.54, 'total': 31.9}, 'gpu': {'used': 1.58,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.36
07:49:47-104998 INFO Base: class=StableDiffusionXLPipeline
07:49:47-108000 DEBUG Sampler: sampler="UniPC" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
'solver_order': 2, 'thresholding': False, 'sample_max_value': 1.0, 'predict_x0': 'bh1',
'lower_order_final': False, 'timestep_spacing': 'linspace'}
07:49:48-335309 DEBUG Torch generator: device=cuda seeds=[1836090999]
07:49:48-340338 DEBUG Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1
set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]),
'negative_prompt_embeds': torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds':
torch.Size([1, 1280]), 'guidance_scale': 7, 'num_inference_steps': 20, 'eta': 1.0,
'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 768,
'height': 1152, 'parser': 'Full parser'}
Progress 1.79it/s ███████████████████▊ 60% 12/20 00:11 00:04 Base
07:49:59-740732 DEBUG Server: alive=True jobs=1 requests=171 uptime=594 memory=9.1/31.9 backend=Backend.DIFFUSERS
state=idle
Progress 1.31it/s █████████████████████████████████ 100% 20/20 00:15 00:00 Base
D:\automatic\venv\Lib\site-packages\torchvision\transforms\functional.py:281: RuntimeWarning: invalid value encountered in cast
npimg = (npimg * 255).astype(np.uint8)
07:50:07-695665 INFO High memory utilization: GPU=22% RAM=28% {'ram': {'used': 9.05, 'total': 31.9}, 'gpu': {'used':
1.72, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:50:08-120473 DEBUG GC: collected=348 device=cuda {'ram': {'used': 9.05, 'total': 31.9}, 'gpu': {'used': 1.72,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.42
07:50:21-920823 ERROR Failed to validate samples: sample=(1152, 768, 3) invalid=2654208
07:50:21-937341 WARNING Attempted to correct samples: min=0.00 max=0.00 mean=0.00
07:50:21-979481 INFO Save: image="D:\Stable Diffusion Files\Outputs\text\09684-tzigorealmixxl_v06CVAE-young woman
standing in street.png" type=PNG resolution=768x1152 size=3063
07:50:21-982987 INFO High memory utilization: GPU=24% RAM=29% {'ram': {'used': 9.13, 'total': 31.9}, 'gpu': {'used':
1.9, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:50:22-382367 DEBUG GC: collected=0 device=cuda {'ram': {'used': 9.13, 'total': 31.9}, 'gpu': {'used': 1.72,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.4
07:50:22-385366 INFO Processed: images=1 time=35.67 its=0.56 memory={'ram': {'used': 9.13, 'total': 31.9}, 'gpu':
{'used': 1.72, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:50:22-477388 INFO High memory utilization: GPU=22% RAM=27% {'ram': {'used': 8.73, 'total': 31.9}, 'gpu': {'used':
1.72, 'total': 8.0}, 'retries': 0, 'oom': 0}
07:50:22-887920 DEBUG GC: collected=128 device=cuda {'ram': {'used': 8.73, 'total': 31.9}, 'gpu': {'used': 1.72,
'total': 8.0}, 'retries': 0, 'oom': 0} time=0.41
Backend
Diffusers
Branch
Dev
Model
SD-XL
Acknowledgements
- [X] I have read the above and searched for existing issues
- [X] I confirm that this is classified correctly and it's not an extension issue
that's not a sampler thing, that's an overflow in the vae - looks like the vae baked into that model is not the fp16-fixed vae. set settings -> diffusers -> vae upcasting -> true, or load an fp16 vae explicitly.
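A minimal sketch of both suggested workarounds in plain diffusers, assuming the SDXL checkpoint from the log; the fp16-fix VAE repo is the commonly used community fix, not something named in this thread:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "tzigorealmixxl_v06CVAE.safetensors", torch_dtype=torch.float16
)

# Option 1: swap in a VAE finetuned to be numerically safe in fp16
pipe.vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

# Option 2: keep the baked-in VAE but decode in fp32 (slower, more VRAM)
pipe.vae.config.force_upcast = True
```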
after a long conversation in discord, i still cannot reproduce this. i do believe there is something strange there, but i cannot do much without reproducing it locally first.
@vladmandic same error with 'Euler a' sampler and it seems to happen randomly.
- Open SD.Next, set VAE Model = None, load the model "PonyDiffusionV6XL", configure generation settings ("Smiling girl" positive prompt, "Euler a" sampler, 20 steps, CFG scale 5), generate - ok.
- Load a testing model "Stable Diffusion", click Generate - ok.
- Change steps from 20 to 40, click Generate - "functional.py:282: RuntimeWarning: invalid value encountered in cast npimg = (npimg * 255).astype(np.uint8)" at ~4/40 steps.
- Change steps from 40 back to 20, click Generate - same error at ~4/20 steps.
- Load the Pony model again, change steps to 75, generate - ok.
- Load the test model, click Generate - ok.
- Check Full Quality, click Generate - ok.
- Check HiDiffusion, click Generate - ok.
- Add Adapter "Full Face", add a PNG as Input Image - ok.
- Check Face Restore, click Generate - same RuntimeWarning at ~3/20 steps (why 20???).
- Uncheck Face Restore, click Generate - same error at ~4/75 steps.
@AznamirWoW that's a different issue, as those warnings come from the live preview, which never runs at the same precision due to the performance impact. they can be ignored, but if you want to pursue it further, create a new issue for that.
Well, it is not a live preview issue. The result of the error is a blank image generated at the end, or, if face restore fails, a black square over the face.
that is not a direct result of the error above at all. if you have a blank image at the end, fine, then leave it here. i'm saying the specific error you've quoted comes from the live preview.
Hello, I'm using SD.Next with DirectML and I got a blank image - it's all white. After the progress finished, I saw something that looks like an error: E:\automatic\venv\lib\site-packages\torchvision\transforms\functional.py:282: RuntimeWarning: invalid value encountered in cast npimg = (npimg * 255).astype(np.uint8)
Using the anything-v4.5 model, with Euler a or DPM++ 2M.
After enabling "skip generation if NaN found in latents", whatever model or sampler I use, the console outputs "a NaNs is detected at step 0". I wonder why it produces NaNs even though I have used full precision.
Today I used the original backend; with a slower load speed, it produced images normally.
Maybe it's a bug in diffusers.
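For reference, a check in the spirit of the "skip generation if NaN found in latents" setting can be approximated outside SD.Next with the diffusers per-step callback; a hedged sketch (the function name is mine):

```python
import torch

def abort_on_nan(pipe, step, timestep, callback_kwargs):
    # "latents" is in the default callback_on_step_end_tensor_inputs
    if torch.isnan(callback_kwargs["latents"]).any():
        raise RuntimeError(f"NaN detected in latents at step {step}")
    return callback_kwargs

# usage: pipe(prompt, callback_on_step_end=abort_on_nan)
```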
Just wanted to update that UniPC on my installation is still bugged; I am still getting the same error in the console:
D:\automatic\venv\Lib\site-packages\torchvision\transforms\functional.py:282: RuntimeWarning: invalid value encountered in cast
npimg = (npimg * 255).astype(np.uint8)
Autocast is enabled, set to FP16.
my comment from earlier still stands - i cannot reproduce. which means i need exact steps to reproduce and as many details as possible. here i don't even know which model or platform or gpu we're talking about.
This issue is older now, and the bug still exists (I've even seen other folks experiencing it on Discord).
I wish I knew what I could do to help corner this bug. There are so many variables with all the different settings that I don't know how to be "precise", except to give you my config file or something. It doesn't matter what the prompt is, and it doesn't seem to matter what the model is (afaik; tested with SDXL and SD3.5 Medium at least): with lora, without lora, with extensions, without extensions...
BTW, and I don't mean to be snarky or anything, but my hardware and model and all that are in the logs above. But to make it simple: it's an RTX 3070 8GB, Windows 11 Professional, 32GB system RAM, Ryzen 7 5700X @ 4.8GHz, dev version 59cd08f5 now.
Here's a snippet of my latest log with the "relevant" bit. No noticeable errors that I can see. Full log also attached.
17:47:22-967784 INFO Applying hypertile: unet=448
17:47:22-993790 INFO XYZ grid start: images=135 grid=1 shape=27x5 cells=1 steps=1080
17:47:22-995792 DEBUG XYZ grid process: x=1/27 y=1/5 z=1/1 total=0.01
17:47:22-998792 DEBUG XYZ grid apply sampler: "UniPC"
17:47:22-999792 DEBUG XYZ grid apply field: steps=4
17:47:23-000792 INFO Applying hypertile: unet=448
Load network: D:\Stable Diffusion Files\Models\Loras\Microwaist_XL_v01.safetensors ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/103.7 MB -:--:--
17:47:23-852588 DEBUG LoRA name="Microwaist_XL_v01" type={'ModuleTypeLora'} keys=788
17:47:24-201199 DEBUG GC: utilization={'gpu': 55, 'ram': 10, 'threshold': 25} gc={'collected': 27009, 'saved': 0.03} before={'gpu': 4.4, 'ram': 3.21} after={'gpu': 4.37, 'ram': 3.21, 'retries': 0, 'oom': 0}
device=cuda fn=activate:load_networks time=0.34
17:47:24-203201 INFO Load network: type=LoRA apply=['Microwaist_XL_v01'] te=[1.5] unet=[[1.5, 1.5, 1.5]] dims=[None] load=1.17
17:47:24-209202 INFO Base: class=StableDiffusionXLPipeline
17:47:24-210202 DEBUG Sampler: sampler="UniPC" class="UniPCMultistepScheduler config={'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear',
'prediction_type': 'epsilon', 'predict_x0': True, 'sample_max_value': 1.0, 'solver_order': 2, 'solver_type': 'bh2', 'thresholding': False, 'use_beta_sigmas': False,
'use_exponential_sigmas': False, 'use_karras_sigmas': False, 'lower_order_final': False, 'timestep_spacing': 'leading', 'final_sigmas_type': 'zero', 'rescale_betas_zero_snr': True}
17:47:24-536289 DEBUG GC: utilization={'gpu': 55, 'ram': 10, 'threshold': 25} gc={'collected': 127, 'saved': 0.0} before={'gpu': 4.37, 'ram': 3.21} after={'gpu': 4.37, 'ram': 3.21, 'retries': 0, 'oom': 0}
device=cuda fn=__init__:prepare_model time=0.32
17:47:25-684725 DEBUG GC: utilization={'gpu': 63, 'ram': 11, 'threshold': 25} gc={'collected': 2737, 'saved': 0.03} before={'gpu': 5.06, 'ram': 3.35} after={'gpu': 5.03, 'ram': 3.35, 'retries': 0, 'oom': 0}
device=cuda fn=encode:prepare_model time=0.3
17:47:25-687726 DEBUG Torch generator: device=cuda seeds=[1969483135]
17:47:25-688727 DEBUG Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE batch=1/1x1 set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1,
1280]), 'negative_prompt_embeds': torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds': torch.Size([1, 1280]), 'guidance_scale': 3, 'num_inference_steps': 4, 'eta': 1.0,
'guidance_rescale': 0.7, 'denoising_end': None, 'output_type': 'latent', 'width': 896, 'height': 1024, 'parser': 'native'}
Progress 2.01s/it ███████████████████████████████████ 100% 4/4 00:08 00:00 Base
17:47:34-160425 DEBUG GC: utilization={'gpu': 67, 'ram': 10, 'threshold': 25} gc={'collected': 248, 'saved': 0.91} before={'gpu': 5.34, 'ram': 3.24} after={'gpu': 4.43, 'ram': 3.24, 'retries': 0, 'oom': 0}
device=cuda fn=process_base:nextjob time=0.31
17:47:34-162425 DEBUG Init hires: upscaler="ESRGAN 4x Ultrasharp" sampler="DPM++ 3M" resize=1523x1740 upscale=1523x1740
17:47:34-163425 INFO Upscale: mode=1 upscaler="ESRGAN 4x Ultrasharp" context="Add with forward" resize=1523x1740 upscale=1523x1740
17:47:35-221920 DEBUG VAE decode: vae name="default" dtype=torch.bfloat16 device=cuda:0 upcast=False slicing=True tiling=True latents shape=torch.Size([1, 4, 128, 112]) dtype=torch.bfloat16 device=cuda:0
time=1.057
17:47:35-593443 DEBUG GC: utilization={'gpu': 52, 'ram': 19, 'threshold': 25} gc={'collected': 127, 'saved': 2.12} before={'gpu': 4.12, 'ram': 5.93} after={'gpu': 2.0, 'ram': 5.93, 'retries': 0, 'oom': 0}
device=cuda fn=resize_hires:vae_decode time=0.35
17:47:35-896024 DEBUG GC: utilization={'gpu': 25, 'ram': 19, 'threshold': 25} gc={'collected': 127, 'saved': 0.0} before={'gpu': 2.0, 'ram': 5.93} after={'gpu': 2.0, 'ram': 5.93, 'retries': 0, 'oom': 0}
device=cuda fn=upscale:begin time=0.3
17:47:35-947098 INFO Upscaler loaded: type=ESRGAN model=D:\Stable Diffusion Files\Models\ESRGAN\ESRGAN-UltraSharp-4x.pth
Upscaling ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:07
17:47:43-222081 DEBUG Upscaler unloaded: type=ESRGAN model=D:\Stable Diffusion Files\Models\ESRGAN\ESRGAN-UltraSharp-4x.pth
17:47:43-566105 DEBUG GC: utilization={'gpu': 30, 'ram': 19, 'threshold': 25} gc={'collected': 454, 'saved': 0.37} before={'gpu': 2.37, 'ram': 6.0} after={'gpu': 2.0, 'ram': 6.0, 'retries': 0, 'oom': 0}
device=cuda fn=upscale:do_upscale time=0.34
17:47:44-006723 DEBUG GC: utilization={'gpu': 25, 'ram': 19, 'threshold': 25} gc={'collected': 162, 'saved': 0.0} before={'gpu': 2.0, 'ram': 5.96} after={'gpu': 2.0, 'ram': 5.96, 'retries': 0, 'oom': 0}
device=cuda fn=upscale:end time=0.32
17:47:44-008724 DEBUG Image resize: input=<PIL.Image.Image image mode=RGB size=896x1024 at 0x22693193F50> width=1523 height=1740 mode="Fixed" upscaler="ESRGAN 4x Ultrasharp" context="Add with forward"
type=image result=<PIL.Image.Image image mode=RGB size=1523x1740 at 0x226CF436BD0> time=8.41 fn=process_hires:resize_hires
17:47:44-330322 DEBUG GC: utilization={'gpu': 25, 'ram': 19, 'threshold': 25} gc={'collected': 127, 'saved': 0.0} before={'gpu': 2.0, 'ram': 5.96} after={'gpu': 2.0, 'ram': 5.96, 'retries': 0, 'oom': 0}
device=cuda fn=process_hires:resize_hires time=0.32
17:47:44-753000 DEBUG GC: utilization={'gpu': 34, 'ram': 19, 'threshold': 25} gc={'collected': 162, 'saved': 0.75} before={'gpu': 2.75, 'ram': 5.96} after={'gpu': 2.0, 'ram': 5.96, 'retries': 0, 'oom': 0}
device=cuda fn=process_hires:nextjob time=0.33
0: 640x576 (no detections), 247.1ms
Speed: 2.0ms preprocess, 247.1ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 576)
[-] ADetailer: nothing detected on image 1 with 1st settings.
0: 640x576 (no detections), 186.1ms
Speed: 2.0ms preprocess, 186.1ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 576)
[-] ADetailer: nothing detected on image 1 with 2nd settings.
W0000 00:00:1731808065.771732 29784 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1731808065.776372 23816 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
[-] ADetailer: nothing detected on image 1 with 3rd settings.
0: 640x576 (no detections), 528.4ms
Speed: 3.0ms preprocess, 528.4ms inference, 0.0ms postprocess per image at shape (1, 3, 640, 576)
[-] ADetailer: nothing detected on image 1 with 4th settings.
17:47:46-940206 INFO Save: image="D:\Stable Diffusion Files\Outputs\text\12826-mklanRealistic_mklanRealxlV1HSD-a full body photorealistic photograph of a young.png" type=PNG width=1523 height=1740
size=10756
17:47:47-283735 DEBUG GC: utilization={'gpu': 25, 'ram': 21, 'threshold': 25} gc={'collected': 8835, 'saved': 0.0} before={'gpu': 2.0, 'ram': 6.57} after={'gpu': 2.0, 'ram': 6.57, 'retries': 0, 'oom': 0}
device=cuda fn=process_images:process_images_inner time=0.34
17:47:47-307741 INFO Processed: images=1 its=0.16 time=24.28 timers={'gc': 3.57, 'init': 1.2, 'encode': 1.47, 'args': 1.49, 'move': 0.02, 'pipeline': 8.03, 'hires': 11.0, 'post': 2.55} memory={'ram':
{'used': 6.57, 'total': 31.9}, 'gpu': {'used': 2.0, 'total': 8.0}, 'retries': 0, 'oom': 0}
EDIT: FWIW, the black appears (via live preview) immediately after inference, when VAE decoding starts. Is something causing a problem with the decoding? VAE is set to Automatic, and I have both the "fixed" VAE and the distilled VAE in the VAE folder as options for it to pick. I might try different VAEs in a future test; running a grid atm.
On SDXL, tested base, fixed, and "low memory" VAE, black image still appears at beginning of VAE decode.
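Since the black frame appears exactly when VAE decode starts, a hedged way to tell whether the NaNs are already present in the latents or are introduced by the decode is to run the decode manually; a sketch against the public diffusers API (the helper name is mine):

```python
import torch

def debug_decode(pipe, latents):
    # undo the SDXL latent scaling before decoding
    latents = latents / pipe.vae.config.scaling_factor
    with torch.no_grad():
        image = pipe.vae.decode(latents).sample
    print("latents nan:", torch.isnan(latents).any().item(),
          "decoded nan:", torch.isnan(image).any().item(),
          "decoded range:", image.min().item(), image.max().item())
    return image
```

If the latents are already NaN, the sampler (or a text-encoder/unet overflow) is the culprit; if only the decoded image is NaN, it points at the VAE.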
i believe there is an issue, but its not a general one and its not something i can reproduce. and without reproduction, i cannot fix it.
so, start with absolute minimal reproduction.
- does this happen with any sdxl model or specific ones only?
- is upscaling or hires relevant to reproduction? if not, remove it as it only adds complexities without adding value to troubleshooting.
- are you running fp16 or bf16? did you try bf16 (which is now the default for all compatible gpus)? etc... (see the dtype sketch below)
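A hedged A/B sketch for that dtype question, in plain diffusers rather than SD.Next (the checkpoint path and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionXLPipeline, UniPCMultistepScheduler

for dtype in (torch.float16, torch.bfloat16):
    pipe = StableDiffusionXLPipeline.from_single_file(
        "model.safetensors", torch_dtype=dtype
    ).to("cuda")
    # same UniPC scheduler as in the logs above, built from the pipeline config
    pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
    image = pipe("smiling girl", num_inference_steps=20).images[0]
    image.save(f"unipc-{str(dtype).split('.')[-1]}.png")
```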
I'm not sure I know enough to even be able to do a bare minimum workflow, but here goes:
- All SDXL models tested (at least a dozen different ones, some of which I no longer have) do not work with UniPC.
- The black screen appears during initial generation, before anything else: hit Generate, inference runs, and at the beginning of VAE decode the black image appears. Upscaling, hires fix, extensions, scripts: all irrelevant, the bug occurs before those are reached in the workflow.
- Not 100% sure which precision type is being used, I just looked in my settings and it is set to "auto". So presumably BF16? Preferred model variant and preferred VAE variant are both set to "default".
- Quantization is currently enabled, but black image occurred without quantization as well. NNCF.
- Hypertile enabled or disabled (UNET only, I never use hypertile vae)
- using --medvram, haven't tried --lowvram. --medvram is required for me to run SDXL, OOM otherwise
I can't really think of what else I can do to "bare minimum" on my GPU, if I turn anything else off (or on?), I'll start having issues running SDXL at all on my GPU. Again, only 8gb VRAM. Many of the settings I'm using are explicitly because without them I will get OOM or massively degraded performance.
Not 100% sure which precision type is being used, I just looked in my settings and it is set to "auto"
from your log:
2024-11-16 11:59:02,387 | sd | INFO | devices | Torch parameters: backend=cuda device=cuda config=Auto dtype=torch.bfloat16 vae=torch.bfloat16 unet=torch.bfloat16 context=no_grad nohalf=False nohalfvae=False upscast=False deterministic=True test-fp16=True test-bf16=True optimization="Scaled-Dot-Product"
Quantization is currently enabled, but black image occurred without quantization as well. NNCF.
i believe you, but please try to understand my point of view - i need a clean log. if i see hypertile or nncf or detailer or hires in the log and they have no relevance to the issue, it just makes any kind of analysis that much harder. so once again, please reproduce without anything that is not relevant - just to have as simple a log as possible. if you need medvram, that's fine. i never said disable everything - i said disable everything that is not relevant and/or needed.
revisiting old issues - is this still happening?
Sort of. Just did a test; please note that this is an older version of the dev branch (due to ROCm for Windows, I'm trying to only update when I see an update for ROCm etc., to keep my install somewhat stable-ish), commit 5db54ffb.
Initially, the image during inference looks something like this:
After VAE decoding, it is a black image:
Here is the log:
Parameters:
- 1024x1536
- HiDiffusion (no difference with this disabled)
- Hypertile (no difference with this disabled)
- CFG 4.0, default sampler parameters
- No refine, no scripts
I am now on an RX 9070 XT using ROCm for Windows, but the behavior I am experiencing with UniPC seems identical to when I was using an RTX 3070.
ah, still happening, and not any closer to figuring out why unipc is not bf16-safe in your case.