[BUG] Z-Image + Vulkan creates a blank image
Hello everyone, I'm using the latest sd release with the Vulkan backend. I've tried it with an old command using Stable Diffusion 1.4 and it works well, but with Z-Image I get a blank image. Does anybody have an idea how to fix it? Thanks in advance! Olivier
Here is the command:
```
sd.exe --diffusion-model ..\ZImage\z_image_turbo-Q3_K.gguf --vae "..\Flux.1 Q4 F16\ae.safetensors" --llm ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 -v --offload-to-cpu -H 512 -W 512 -t 20 --steps 10 -s 123456
```
And here is the output:
```
C:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\sd VULKAN 2025-12-01>sd.exe --diffusion-model ..\ZImage\z_image_turbo-Q3_K.gguf --vae "..\Flux.1 Q4 F16\ae.safetensors" --llm ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 -v --offload-to-cpu -H 512 -W 512 -t 20 --steps 10 -s 123456 --vae-on-cpu
Option:
    n_threads: 20
    mode: img_gen
    model_path:
    wtype: unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:
    llm_path: ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf
    llm_vision_path:
    diffusion_model_path: ..\ZImage\z_image_turbo-Q3_K.gguf
    high_noise_diffusion_model_path:
    vae_path: ..\Flux.1 Q4 F16\ae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    photo_maker_path:
    pm_id_images_dir:
    pm_id_embed_path:
    pm_style_strength: 20.00
    output_path: output.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    control_video_path:
    auto_resize_ref_image: true
    increase_ref_index: false
    offload_params_to_cpu: true
    clip_on_cpu: false
    control_net_cpu: false
    vae_on_cpu: true
    diffusion flash attention: false
    diffusion Conv2d direct: false
    vae_conv_direct: false
    control_strength: 0.90
    prompt: A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic
    negative_prompt:
    clip_skip: -1
    width: 512
    height: 512
    sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 10, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary: 0.875
    prediction: default
    lora_apply_mode: auto
    flow_shift: inf
    strength(img2img): 0.75
    rng: cuda
    sampler rng: NONE
    seed: 123456
    batch_count: 1
    vae_tiling: false
    force_sdxl_vae_conv_scale: false
    upscale_repeats: 1
    chroma_use_dit_mask: true
    chroma_use_t5_mask: false
    chroma_t5_mask_pad: 1
    video_frames: 1
    easycache: disabled (threshold=0.200, start=0.15, end=0.95)
    vace_strength: 1.00
    fps: 16
    preview_mode: none (denoised)
    preview_interval: 1
System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:167 - Using Vulkan backend
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: Found 2 Vulkan devices:
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
[DEBUG] ggml_extend.hpp:66 - ggml_vulkan: 1 = NVIDIA RTX A1000 6GB Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: none
[INFO ] stable-diffusion.cpp:234 - loading diffusion model from '..\ZImage\z_image_turbo-Q3_K.gguf'
[INFO ] model.cpp:378 - load ..\ZImage\z_image_turbo-Q3_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from '..\ZImage\z_image_turbo-Q3_K.gguf'
[INFO ] stable-diffusion.cpp:281 - loading llm from '..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf'
[INFO ] model.cpp:378 - load ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf using gguf format
[DEBUG] model.cpp:420 - init from '..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf'
[INFO ] stable-diffusion.cpp:295 - loading vae from '..\Flux.1 Q4 F16\ae.safetensors'
[INFO ] model.cpp:381 - load ..\Flux.1 Q4 F16\ae.safetensors using safetensors format
[DEBUG] model.cpp:511 - init from '..\Flux.1 Q4 F16\ae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:318 - Version: Z-Image
[INFO ] stable-diffusion.cpp:346 - Weight type stat: f32: 640 | q8_0: 22 | q3_K: 324 | q4_K: 104 | q5_K: 4 | q6_K: 1
[INFO ] stable-diffusion.cpp:347 - Conditioner weight type stat: f32: 145 | q3_K: 144 | q4_K: 104 | q5_K: 4 | q6_K: 1
[INFO ] stable-diffusion.cpp:348 - Diffusion model weight type stat: f32: 251 | q8_0: 22 | q3_K: 180
[INFO ] stable-diffusion.cpp:349 - VAE weight type stat: f32: 244
[DEBUG] stable-diffusion.cpp:351 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151665
[DEBUG] ggml_extend.hpp:1877 - qwen3 params backend buffer size = 3153.25 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1877 - z_image params backend buffer size = 2997.90 MB(RAM) (453 tensors)
[INFO ] stable-diffusion.cpp:555 - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1877 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:683 - loading weights
[DEBUG] model.cpp:1359 - using 20 threads for model loading
[DEBUG] model.cpp:1381 - loading tensors from ..\ZImage\z_image_turbo-Q3_K.gguf
|====================> | 453/1095 - 556.51it/s
[DEBUG] model.cpp:1381 - loading tensors from ..\ZImage\Qwen3-4B-Instruct-2507-Q3_K_M.gguf
|======================================> | 851/1095 - 418.39it/s
[DEBUG] model.cpp:1381 - loading tensors from ..\Flux.1 Q4 F16\ae.safetensors
|==================================================| 1095/1095 - 486.67it/s
[INFO ] model.cpp:1590 - loading tensors completed, taking 2.25s (process: 0.00s, read: 1.19s, memcpy: 0.00s, convert: 0.03s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:782 - total params memory size = 6245.72MB (VRAM 6151.15MB, RAM 94.57MB): text_encoders 3153.25MB(VRAM), diffusion_model 2997.90MB(VRAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:883 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:908 - finished loaded file
[DEBUG] stable-diffusion.cpp:3138 - generate_image 512x512
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler method
[INFO ] denoiser.hpp:364 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[DEBUG] conditioner.hpp:1701 - parse '<|im_start|>user A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic<|im_end|> <|im_start|>assistant ' to [['<|im_start|>user ', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic', 1], ['<|im_end|> <|im_start|>assistant ', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user " to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", "ĠThe", "Ġcity", "Ġlights", "Ġare", "Ġa", "Ġchaotic", "Ġblur", "Ġof", "Ġneon", "Ġorange", "Ġand", "Ġcool", "Ġblue", ",", "Ġreflecting", "Ġon", "Ġthe", "Ġwet", "Ġasphalt", ".", "ĠThe", "Ġscene", "Ġev", "okes", "Ġa", "Ġsense", "Ġof", "Ġbeing", "Ġa", "Ġsingle", "Ġcomponent", "Ġin", "Ġa", "Ġvast", "Ġmachine", ".", "ĠSuper", "im", "posed", "Ġover", "Ġthe", "Ġimage", "Ġin", "Ġa", "Ġsleek", ",", "Ġmodern", ",", "Ġslightly", "Ġglitch", "ed", "Ġfont", "Ġis", "Ġthe", "Ġphilosophical", "Ġquote", ":", "Ġ'", "THE", "ĠCITY", "ĠIS", "ĠA", "ĠC", "IR", "CU", "IT", "ĠBOARD", ",", "ĠAND", "ĠI", "ĠAM", "ĠA", "ĠBRO", "KEN", "ĠTRANS", "IST", "OR", ".'", "Ġ--", "Ġmo", "ody", ",", "Ġatmospheric", ",", "Ġprofound", ",", "Ġdark", "Ġacademic", ]
[DEBUG] llm.hpp:259 - split prompt "<|im_end|> <|im_start|>assistant " to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
[INFO ] ggml_extend.hpp:1791 - qwen3 offload params (3153.25 MB, 398 tensors) to runtime backend (Vulkan1), taking 2.45s
[DEBUG] ggml_extend.hpp:1691 - qwen3 compute buffer size: 13.34 MB(VRAM)
[DEBUG] conditioner.hpp:1896 - computing condition graph completed, taking 3508 ms
[INFO ] stable-diffusion.cpp:2917 - get_learned_condition completed, taking 3557 ms
[INFO ] stable-diffusion.cpp:3028 - generating image: 1/1 - seed 123456
[INFO ] ggml_extend.hpp:1791 - z_image offload params (2997.90 MB, 453 tensors) to runtime backend (Vulkan1), taking 1.13s
[DEBUG] ggml_extend.hpp:1691 - z_image compute buffer size: 255.60 MB(VRAM)
|==================================================| 10/10 - 6.12s/it
[INFO ] stable-diffusion.cpp:3069 - sampling completed, taking 61.45s
[INFO ] stable-diffusion.cpp:3077 - generating 1 latent images completed, taking 61.93s
[INFO ] stable-diffusion.cpp:3080 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1691 - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:2286 - computing vae decode graph completed, taking 11.13s
[INFO ] stable-diffusion.cpp:3090 - latent 1 decoded, taking 11.13s
[INFO ] stable-diffusion.cpp:3094 - decode_first_stage completed, taking 11.13s
[INFO ] stable-diffusion.cpp:3390 - generate_image completed in 76.66s
save result PNG image to 'output.png' (success)

C:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\sd VULKAN 2025-12-01>REM --vae-conv-direct --diffusion-conv-direct

C:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\sd VULKAN 2025-12-01>REM --diffusion-fa

C:\Users\braultoli\Desktop\sd-master-9578fdc-bin-win-avx2-x64\sd VULKAN 2025-12-01>PAUSE
```
I get mostly the same output. I also enabled the preview: for the first 5 or 6 steps it showed black, dotted garbage, and at the 6th or 7th step it changed to red dots. (This was on Ubuntu 24.04.)
I got the same blank image, but it seems to happen only on the Intel iGPU; the AMD iGPU and the NVIDIA GPU do not have this problem. Try this to run on the discrete GPU: https://github.com/leejet/stable-diffusion.cpp/issues/650#issuecomment-2781463710
I've tried both the integrated GPU and the NVIDIA GPU, and it's the same: on the integrated GPU the image is black and on the NVIDIA GPU it is a solid color (pink or something like that), or the other way round, but I always have the problem.
This could be caused by a badly quantized file. Are you able to compare these results with Q4 quants? With --offload-to-cpu, your 6 GB card should be able to handle it.
I was having the same issue, but while doing my due diligence before adding to these reports, I was pleasantly surprised to find that the Q4_K_M quant from jayn7's HF repo worked, while everything I tried from leejet's HF repo didn't.
I was thinking there's no way it's a quant issue, but apparently it is, at least to some degree. With a CPU-only build I can use the leejet quants just fine (though I've only done 3-step images since the speed is glacial), but on Vulkan the output is mostly light yellow, with some other colors right at the edges. With --vae-conv-direct, the output is completely black.
What works for me:
- the quants made available by jayn7 at https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/tree/main (so far I've tried Q4_K_M and Q6_K, and both work with Vulkan)
- the official VAE: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo/blob/main/vae/diffusion_pytorch_model.safetensors
- Qwen3-4B-Instruct-2507-Q8_0.gguf from Unsloth: https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q8_0.gguf
Using the following command:
```
sd --diffusion-model jayn7-z_image_turbo-Q4_K_M.gguf --vae zimg-diffusion_pytorch_model.safetensors --llm Qwen3-4B-Instruct-2507-Q8_0.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 1.0 -v -H 1024 -W 512 --steps 10 --offload-to-cpu -v --output zimg_q4km.png
```
The output:
However, with the exact same command but using leejet's Q4_0 quant from here, the output looks like this:
Here's the terminal output of the command that produces the bad image: blank4.txt
The system is running Linux Mint 22.1 and an RX 6600 with the default FOSS drivers.
I'm unable to reproduce the issue above. It might be related to differences between Vulkan driver implementations on specific platforms. During inference with Z-Image and Qwen-Image, extremely large intermediate values can be generated, which makes NaN errors more likely and can result in black images.
Interesting. I suspected something related to the lower quants, but q4_0 at least should work. I can't check leejet's quants right now, but my own q4_0 works for me (https://huggingface.co/wbruna/Z-Image-Turbo-sdcpp-GGUF, on amdgpu with RADV).
Changing the default `scale` parameter of `Linear` in ggml_extend.hpp from `1.f` to `1.f / 256.f` might work as a temporary solution:
```diff
diff --git a/ggml_extend.hpp b/ggml_extend.hpp
index 92dd3b8b..23112d6c 100644
--- a/ggml_extend.hpp
+++ b/ggml_extend.hpp
@@ -2134,7 +2134,7 @@ public:
            bool bias = true,
            bool force_f32 = false,
            bool force_prec_f32 = false,
-           float scale = 1.f)
+           float scale = 1.f / 256.f)
         : in_features(in_features),
           out_features(out_features),
           bias(bias),
```
This is the command I used to generate the q4_0 weights.
```
.\bin\Release\sd.exe -M convert -m z_image_turbo_bf16.safetensors -o z_image_turbo-Q4_0.gguf --tensor-type-rules "^layers.*adaLN_modulation.*weight=q4_0,layers.*attention.out.*weight=q4_0,layers.*attention.qkv.*weight=q4_0,layers.*feed_forward.*weight=q4_0,context_refiner.*attention.out.*weight=q8_0,context_refiner.*attention.qkv.*weight=q8_0,context_refiner.*feed_forward.*weight=q8_0,noise_refiner.*adaLN_modulation.*weight=q4_0,noise_refiner.*attention.out.*weight=q4_0,noise_refiner.*attention.qkv.*weight=q4_0,noise_refiner.*feed_forward.*weight=q4_0" -v
```
I confirm this quantization also works for me, both on Vulkan+RADV and ROCm (and, as expected, with better quality than my direct q4_0 conversion).
Hi, with the quantized files from https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/tree/main it works fine for me! But those GGUF files are much bigger than the ones provided by leejet (+1 GB). Is there a good way to create quantized files that avoid this error and are the same size as leejet's? Thanks in advance.
amdgpu RADV / Vulkan / deb (old drivers, not current): the jayn7 Q4_K_M works here; I used it for tests earlier.
Converting the bf16 safetensors to q8_0 with sd convert works.
Converting the bf16 safetensors locally to q4_0 the way leejet did also runs fine here.
olivbrau, you could grab the original bf16 safetensors file and convert it yourself with sd, as shown above, trying different parameters. The conversion needs RAM, not VRAM.
RDNA2 RX 6600M on Win11 with yesterday's Vulkan exe: confirming that leejet's Q4_0 quant creates solid-color images while jayn7's Q4_K_S works, same as olivbrau.