stable-diffusion.cpp

[Bug] Qwen-Image, Strix Halo gfx1151 result in black image

Open linuzel opened this issue 1 month ago • 7 comments

Git commit

694f0d923578262c4f12ae93ded8e56c116085fe

Operating System & Version

Debian GNU/Linux 13 (trixie)

GGML backends

HIP

Command-line arguments used

sd --diffusion-model Qwen_Image-Q5_0.gguf --vae Qwen_Image-VAE.safetensors --qwen2vl Qwen2.5-VL-7B-Instruct-Q4_0.gguf -p "A fabulous flying cat in the style of a modern animated movie" --steps 8 --diffusion-fa --cfg-scale 2.5 -o /tmp/out.png -v -W 768 -H 512

Steps to reproduce

I followed the guide for Qwen and tried a lot of different combinations.

Flux schnell works properly with the same build.

What you expected to happen

A picture to be generated

What actually happened

It generates only a completely black image. Qwen-Image-Edit also fails: it only distorts and oversaturates the input image and does nothing else.

Logs / error messages / stack trace

Option: n_threads: 16 mode: img_gen model_path:
wtype: unspecified clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
qwen2vl_path: /mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf qwen2vl_vision_path:
diffusion_model_path: /mnt/encrypted/models/Qwen_Image-Q5_0.gguf high_noise_diffusion_model_path:
vae_path: /mnt/encrypted/models/Qwen_Image-VAE.safetensors taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00 output_path: /tmp/out.png init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths: control_video_path:
auto_resize_ref_image: true increase_ref_index: false offload_params_to_cpu: false clip_on_cpu: false control_net_cpu: false vae_on_cpu: false diffusion flash attention: true diffusion Conv2d direct: false vae_conv_direct: false control_strength: 0.90 prompt: A fabulous flying cat in the style of a modern animated movie negative_prompt:
clip_skip: -1 width: 768 height: 512 sample_params: (txt_cfg: 2.50, img_cfg: 2.50, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 8, eta: 0.00, shifted_timestep: 0) high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0) moe_boundary: 0.875 prediction: default flow_shift: inf strength(img2img): 0.75 rng: cuda seed: 42 batch_count: 1 vae_tiling: false force_sdxl_vae_conv_scale: false upscale_repeats: 1 chroma_use_dit_mask: true chroma_use_t5_mask: false chroma_t5_mask_pad: 1 video_frames: 1 vace_strength: 1.00 fps: 16 preview_mode: none (denoised) preview_interval: 1 System Info: SSE3 = 1 AVX = 1 AVX2 = 1 AVX512 = 1 AVX512_VBMI = 1 AVX512_VNNI = 1 FMA = 1 NEON = 0 ARM_FMA = 0 F16C = 1 FP16_VA = 0 WASM_SIMD = 0 VSX = 0 [DEBUG] stable-diffusion.cpp:150 - Using CUDA backend [INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no [INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no [INFO ] ggml_extend.hpp:69 - ggml_cuda_init: found 1 ROCm devices: [INFO ] ggml_extend.hpp:69 - Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32 [INFO ] stable-diffusion.cpp:214 - loading diffusion model from '/mnt/encrypted/models/Qwen_Image-Q5_0.gguf' [INFO ] model.cpp:376 - load /mnt/encrypted/models/Qwen_Image-Q5_0.gguf using gguf format [DEBUG] model.cpp:418 - init from '/mnt/encrypted/models/Qwen_Image-Q5_0.gguf' [INFO ] stable-diffusion.cpp:261 - loading qwen2vl from '/mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf' [INFO ] model.cpp:376 - load /mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf using gguf format [DEBUG] model.cpp:418 - init from '/mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf' [INFO 
] stable-diffusion.cpp:275 - loading vae from '/mnt/encrypted/models/Qwen_Image-VAE.safetensors' [INFO ] model.cpp:379 - load /mnt/encrypted/models/Qwen_Image-VAE.safetensors using safetensors format [DEBUG] model.cpp:509 - init from '/mnt/encrypted/models/Qwen_Image-VAE.safetensors', prefix = 'vae.' [INFO ] stable-diffusion.cpp:298 - Version: Qwen Image [INFO ] stable-diffusion.cpp:325 - Weight type stat: f32: 1422 | q4_0: 194 | q4_1: 3 | q5_0: 720 | q5_1: 120 | bf16: 6
[INFO ] stable-diffusion.cpp:326 - Conditioner weight type stat: f32: 141 | q4_0: 194 | q4_1: 3
[INFO ] stable-diffusion.cpp:327 - Diffusion model weight type stat: f32: 1087 | q5_0: 720 | q5_1: 120 | bf16: 6
[INFO ] stable-diffusion.cpp:328 - VAE weight type stat: f32: 194
[DEBUG] stable-diffusion.cpp:330 - ggml tensor size = 400 bytes [DEBUG] qwenvl.hpp:141 - merges size 151387 [DEBUG] qwenvl.hpp:163 - vocab size: 151665 [INFO ] qwen_image.hpp:527 - qwen_image_params.num_layers: 60 [INFO ] stable-diffusion.cpp:464 - Using flash attention in the diffusion model [DEBUG] ggml_extend.hpp:1785 - qwenvl2.5 params backend buffer size = 3806.21 MB(VRAM) (338 tensors) [DEBUG] ggml_extend.hpp:1785 - qwen_image params backend buffer size = 13733.54 MB(VRAM) (1933 tensors) [DEBUG] ggml_extend.hpp:1785 - wan_vae params backend buffer size = 139.84 MB(VRAM) (108 tensors) [DEBUG] stable-diffusion.cpp:606 - loading weights [DEBUG] model.cpp:1297 - using 16 threads for model loading [DEBUG] model.cpp:1319 - loading tensors from /mnt/encrypted/models/Qwen_Image-Q5_0.gguf |=======================================> | 1933/2465 - 343.71it/s [DEBUG] model.cpp:1319 - loading tensors from /mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf |==============================================> | 2271/2465 - 388.34it/s [DEBUG] model.cpp:1319 - loading tensors from /mnt/encrypted/models/Qwen_Image-VAE.safetensors |==================================================| 2465/2465 - 407.30it/s [INFO ] model.cpp:1528 - loading tensors completed, taking 6.05s (process: 0.00s, read: 5.23s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.47s) [INFO ] stable-diffusion.cpp:705 - total params memory size = 17679.59MB (VRAM 17679.59MB, RAM 0.00MB): text_encoders 3806.21MB(VRAM), diffusion_model 13733.55MB(VRAM), vae 139.84MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM) [INFO ] stable-diffusion.cpp:791 - running in FLOW mode [DEBUG] stable-diffusion.cpp:816 - finished loaded file [DEBUG] stable-diffusion.cpp:2714 - generate_image 768x512 [INFO ] stable-diffusion.cpp:2850 - TXT2IMG [INFO ] stable-diffusion.cpp:963 - attempting to apply 0 LoRAs [INFO ] stable-diffusion.cpp:983 - apply_loras completed, taking 0.00s [DEBUG] stable-diffusion.cpp:984 - prompt after extract 
and remove lora: "A fabulous flying cat in the style of a modern animated movie" [DEBUG] conditioner.hpp:1614 - parse '<|im_start|>system Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|> <|im_start|>user A fabulous flying cat in the style of a modern animated movie<|im_end|> <|im_start|>assistant ' to [['<|im_start|>system Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|> <|im_start|>user A fabulous flying cat in the style of a modern animated movie<|im_end|> <|im_start|>assistant ', 1], ] [DEBUG] ggml_extend.hpp:1600 - qwenvl2.5 compute buffer size: 9.46 MB(VRAM) [DEBUG] conditioner.hpp:1754 - computing condition graph completed, taking 112 ms [DEBUG] conditioner.hpp:1614 - parse '<|im_start|>system Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|> <|im_start|>user <|im_end|> <|im_start|>assistant ' to [['<|im_start|>system Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|> <|im_start|>user <|im_end|> <|im_start|>assistant ', 1], ] [DEBUG] ggml_extend.hpp:1600 - qwenvl2.5 compute buffer size: 7.24 MB(VRAM) [DEBUG] conditioner.hpp:1754 - computing condition graph completed, taking 61 ms [INFO ] stable-diffusion.cpp:2499 - get_learned_condition completed, taking 175 ms [INFO ] stable-diffusion.cpp:2517 - sampling using Euler method [INFO ] stable-diffusion.cpp:2611 - generating image: 1/1 - seed 42 [DEBUG] ggml_extend.hpp:1600 - qwen_image compute buffer size: 200.70 MB(VRAM) |==================================================| 8/8 - 4.82s/it [INFO ] stable-diffusion.cpp:2648 - sampling completed, taking 38.54s [INFO ] stable-diffusion.cpp:2656 - generating 1 latent images completed, taking 38.58s 
[INFO ] stable-diffusion.cpp:2659 - decoding 1 latents [DEBUG] ggml_extend.hpp:1600 - wan_vae compute buffer size: 2811.19 MB(VRAM) [DEBUG] stable-diffusion.cpp:1928 - computing vae decode graph completed, taking 1.93s [INFO ] stable-diffusion.cpp:2669 - latent 1 decoded, taking 1.93s [INFO ] stable-diffusion.cpp:2673 - decode_first_stage completed, taking 1.93s [INFO ] stable-diffusion.cpp:2962 - generate_image completed in 40.69s save result PNG image to '/tmp/out.png'

Additional context / environment details

It works with the Vulkan backend, but that seems to be much slower.

linuzel, Nov 11 '25 13:11

Can you run it with --preview proj and see if preview.png is also black from the start or if it becomes black at one point?

stduhpf, Nov 11 '25 13:11

> Can you run it with --preview proj and see if preview.png is also black from the start or if it becomes black at one point?

Interesting result. The first previews are fine, then the preview turns almost black, then completely black: [8 preview images]

I ran a few more tests with various numbers of steps. It always fails within the first half of the run. For 12 steps, the first 5 are OK, the 6th is partially black, and the rest are all black.

linuzel, Nov 11 '25 13:11
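
One way to pin down the step at which the output breaks is to scan each step's latent values for NaN or Inf instead of judging the previews by eye. The following is a minimal sketch, assuming the per-step latents have already been dumped as flat lists of floats; stable-diffusion.cpp is not known to provide such a dump, and the helper names and sample data are hypothetical.

```python
import math


def count_nonfinite(values):
    """Return how many entries are NaN or +/-Inf."""
    return sum(1 for v in values if not math.isfinite(v))


def first_bad_step(latents_per_step):
    """Return the 1-based index of the first step that contains a
    non-finite value, or None if every step is clean."""
    for step, latent in enumerate(latents_per_step, start=1):
        if count_nonfinite(latent) > 0:
            return step
    return None


# Hypothetical data: step 1 is clean, step 2 is partially broken,
# step 3 is entirely NaN.
nan = float("nan")
steps = [[0.1, -0.2, 0.3], [0.4, nan, 0.1], [nan, nan, nan]]
print(first_bad_step(steps))                 # 2
print([count_nonfinite(s) for s in steps])   # [0, 1, 3]
```

A count that grows from a few values to the whole latent would match the "partially black, then all black" pattern described above, but that interpretation is an assumption, not something the logs confirm.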

Tried to reproduce it here (gfx1102 (RX 7600 XT) on Linux), and it's even worse: black image from start with any Qwen variant (Pruned, Edit, etc).

Same command line works on Vulkan, although Edit seems kind of oversaturated.

wbruna, Nov 12 '25 11:11

Do you notice any test failures when running llama.cpp's test-backend-ops tool?

stduhpf, Nov 12 '25 13:11

No failures with my card: all tests either pass, or are not supported.

wbruna, Nov 13 '25 10:11

I can confirm the black image on my env too (7900 XTX, gfx1100). I didn't test with Vulkan, but for ROCm all test-backend-ops pass or are not supported; Flux Schnell does produce an image correctly. Testing with Qwen-Image, the preview is black right from the start.

Nindaleth, Nov 16 '25 00:11

> Tried to reproduce it here (gfx1102 (RX 7600 XT) on Linux), and it's even worse: black image from start with any Qwen variant (Pruned, Edit, etc).

I was able to get similar behavior by changing the default scale on the Linear layers to 1/128, but I still get a black image creeping in at the 5th step (even with a tiny value like 1/65536), so I believe there are NaN issues at more than one point.

wbruna, Dec 03 '25 00:12
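
For readers unfamiliar with why a scale on a linear layer can matter, here is a toy illustration of the general mechanism. It assumes the intermediate products and the accumulator are stored in half precision (fp16), which this thread does not confirm for the HIP backend. The functions `to_fp16` and `dot_fp16` are hypothetical helpers written for this sketch, not stable-diffusion.cpp or ggml code.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.
    Values beyond the fp16 range become +/-inf."""
    if math.isnan(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def dot_fp16(weights, activations, scale=1.0):
    """Dot product with fp16 products and an fp16 accumulator.

    Activations are multiplied by `scale` before the products are formed,
    and the final sum is divided by `scale` in full precision.
    """
    acc = 0.0
    for w, a in zip(weights, activations):
        acc = to_fp16(acc + to_fp16(w * to_fp16(a * scale)))
    return acc / scale


weights = [400.0, -400.0, 2.0]
activations = [300.0, 300.0, 64.0]

# Unscaled: 400 * 300 overflows fp16 to +inf, the next term is -inf,
# and inf + (-inf) is NaN.
print(dot_fp16(weights, activations))               # nan
# Scaled by 1/128: every intermediate value stays in range.
print(dot_fp16(weights, activations, scale=1/128))  # 128.0
```

Scaling only helps where the overflow happens inside the scaled operation. If values overflow elsewhere (in a different layer, say, or in attention), NaNs would still appear, which would be consistent with the observation that even 1/65536 does not remove the black image at the 5th step.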