[Bug] Qwen-Image on Strix Halo (gfx1151) results in a black image
Git commit
694f0d923578262c4f12ae93ded8e56c116085fe
Operating System & Version
Debian GNU/Linux 13 (trixie)
GGML backends
HIP
Command-line arguments used
sd --diffusion-model Qwen_Image-Q5_0.gguf --vae Qwen_Image-VAE.safetensors --qwen2vl Qwen2.5-VL-7B-Instruct-Q4_0.gguf -p "A fabulous flying cat in the style of a modern animated movie" --steps 8 --diffusion-fa --cfg-scale 2.5 -o /tmp/out.png -v -W 768 -H 512
Steps to reproduce
I followed the guide for Qwen and tried a lot of different combinations.
Flux schnell works properly with the same build.
What you expected to happen
A picture to be generated
What actually happened
It generates only a completely black image. Qwen-Image-Edit also fails: it only distorts and oversaturates the input image and does nothing else.
Logs / error messages / stack trace
Option:
n_threads: 16
mode: img_gen
model_path:
wtype: unspecified
clip_l_path:
clip_g_path:
clip_vision_path:
t5xxl_path:
qwen2vl_path: /mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf
qwen2vl_vision_path:
diffusion_model_path: /mnt/encrypted/models/Qwen_Image-Q5_0.gguf
high_noise_diffusion_model_path:
vae_path: /mnt/encrypted/models/Qwen_Image-VAE.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: /tmp/out.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
control_video_path:
auto_resize_ref_image: true
increase_ref_index: false
offload_params_to_cpu: false
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: true
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: A fabulous flying cat in the style of a modern animated movie
negative_prompt:
clip_skip: -1
width: 768
height: 512
sample_params: (txt_cfg: 2.50, img_cfg: 2.50, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: 8, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
prediction: default
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: false
force_sdxl_vae_conv_scale: false
upscale_repeats: 1
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
preview_mode: none (denoised)
preview_interval: 1
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 1
AVX512_VBMI = 1
AVX512_VNNI = 1
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:150 - Using CUDA backend
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: found 1 ROCm devices:
[INFO ] ggml_extend.hpp:69 - Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
[INFO ] stable-diffusion.cpp:214 - loading diffusion model from '/mnt/encrypted/models/Qwen_Image-Q5_0.gguf'
[INFO ] model.cpp:376 - load /mnt/encrypted/models/Qwen_Image-Q5_0.gguf using gguf format
[DEBUG] model.cpp:418 - init from '/mnt/encrypted/models/Qwen_Image-Q5_0.gguf'
[INFO ] stable-diffusion.cpp:261 - loading qwen2vl from '/mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf'
[INFO ] model.cpp:376 - load /mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf using gguf format
[DEBUG] model.cpp:418 - init from '/mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf'
[INFO ] stable-diffusion.cpp:275 - loading vae from '/mnt/encrypted/models/Qwen_Image-VAE.safetensors'
[INFO ] model.cpp:379 - load /mnt/encrypted/models/Qwen_Image-VAE.safetensors using safetensors format
[DEBUG] model.cpp:509 - init from '/mnt/encrypted/models/Qwen_Image-VAE.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:298 - Version: Qwen Image
[INFO ] stable-diffusion.cpp:325 - Weight type stat: f32: 1422 | q4_0: 194 | q4_1: 3 | q5_0: 720 | q5_1: 120 | bf16: 6
[INFO ] stable-diffusion.cpp:326 - Conditioner weight type stat: f32: 141 | q4_0: 194 | q4_1: 3
[INFO ] stable-diffusion.cpp:327 - Diffusion model weight type stat: f32: 1087 | q5_0: 720 | q5_1: 120 | bf16: 6
[INFO ] stable-diffusion.cpp:328 - VAE weight type stat: f32: 194
[DEBUG] stable-diffusion.cpp:330 - ggml tensor size = 400 bytes
[DEBUG] qwenvl.hpp:141 - merges size 151387
[DEBUG] qwenvl.hpp:163 - vocab size: 151665
[INFO ] qwen_image.hpp:527 - qwen_image_params.num_layers: 60
[INFO ] stable-diffusion.cpp:464 - Using flash attention in the diffusion model
[DEBUG] ggml_extend.hpp:1785 - qwenvl2.5 params backend buffer size = 3806.21 MB(VRAM) (338 tensors)
[DEBUG] ggml_extend.hpp:1785 - qwen_image params backend buffer size = 13733.54 MB(VRAM) (1933 tensors)
[DEBUG] ggml_extend.hpp:1785 - wan_vae params backend buffer size = 139.84 MB(VRAM) (108 tensors)
[DEBUG] stable-diffusion.cpp:606 - loading weights
[DEBUG] model.cpp:1297 - using 16 threads for model loading
[DEBUG] model.cpp:1319 - loading tensors from /mnt/encrypted/models/Qwen_Image-Q5_0.gguf
|=======================================> | 1933/2465 - 343.71it/s
[DEBUG] model.cpp:1319 - loading tensors from /mnt/encrypted/models/Qwen2.5-VL-7B-Instruct-Q4_0.gguf
|==============================================> | 2271/2465 - 388.34it/s
[DEBUG] model.cpp:1319 - loading tensors from /mnt/encrypted/models/Qwen_Image-VAE.safetensors
|==================================================| 2465/2465 - 407.30it/s
[INFO ] model.cpp:1528 - loading tensors completed, taking 6.05s (process: 0.00s, read: 5.23s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.47s)
[INFO ] stable-diffusion.cpp:705 - total params memory size = 17679.59MB (VRAM 17679.59MB, RAM 0.00MB): text_encoders 3806.21MB(VRAM), diffusion_model 13733.55MB(VRAM), vae 139.84MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:791 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:816 - finished loaded file
[DEBUG] stable-diffusion.cpp:2714 - generate_image 768x512
[INFO ] stable-diffusion.cpp:2850 - TXT2IMG
[INFO ] stable-diffusion.cpp:963 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:983 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:984 - prompt after extract and remove lora: "A fabulous flying cat in the style of a modern animated movie"
[DEBUG] conditioner.hpp:1614 - parse '<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
A fabulous flying cat in the style of a modern animated movie<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
A fabulous flying cat in the style of a modern animated movie<|im_end|>
<|im_start|>assistant
', 1], ]
[DEBUG] ggml_extend.hpp:1600 - qwenvl2.5 compute buffer size: 9.46 MB(VRAM)
[DEBUG] conditioner.hpp:1754 - computing condition graph completed, taking 112 ms
[DEBUG] conditioner.hpp:1614 - parse '<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
<|im_end|>
<|im_start|>assistant
' to [['<|im_start|>system
Describe the image by detailing the color, shape, size, texture, quantity, text, spatial relationships of the objects and background:<|im_end|>
<|im_start|>user
<|im_end|>
<|im_start|>assistant
', 1], ]
[DEBUG] ggml_extend.hpp:1600 - qwenvl2.5 compute buffer size: 7.24 MB(VRAM)
[DEBUG] conditioner.hpp:1754 - computing condition graph completed, taking 61 ms
[INFO ] stable-diffusion.cpp:2499 - get_learned_condition completed, taking 175 ms
[INFO ] stable-diffusion.cpp:2517 - sampling using Euler method
[INFO ] stable-diffusion.cpp:2611 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1600 - qwen_image compute buffer size: 200.70 MB(VRAM)
|==================================================| 8/8 - 4.82s/it
[INFO ] stable-diffusion.cpp:2648 - sampling completed, taking 38.54s
[INFO ] stable-diffusion.cpp:2656 - generating 1 latent images completed, taking 38.58s
[INFO ] stable-diffusion.cpp:2659 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1600 - wan_vae compute buffer size: 2811.19 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1928 - computing vae decode graph completed, taking 1.93s
[INFO ] stable-diffusion.cpp:2669 - latent 1 decoded, taking 1.93s
[INFO ] stable-diffusion.cpp:2673 - decode_first_stage completed, taking 1.93s
[INFO ] stable-diffusion.cpp:2962 - generate_image completed in 40.69s
save result PNG image to '/tmp/out.png'
Additional context / environment details
It works with the Vulkan backend, but that seems to be much slower.
Can you run it with --preview proj and see if preview.png is also black from the start or if it becomes black at one point?
Interesting result.
The first previews are fine, then the image turns almost black, then fully black.
I ran a few more tests with various step counts. It always fails within the first half. With 12 steps, the first 5 are OK, the 6th is partially black, and the rest are all black.
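To pin down the step at which the previews degrade, each preview frame can be classified by its share of near-black pixels. This is a minimal Python sketch, not part of stable-diffusion.cpp: the function names, the thresholds, and the synthetic frames are all hypothetical, and decoding each preview into raw 8-bit samples is assumed to happen elsewhere.

```python
from typing import List, Optional, Sequence


def black_fraction(pixels: Sequence[int], dark_level: int = 8) -> float:
    """Return the fraction of 8-bit samples at or below dark_level."""
    if not pixels:
        raise ValueError("empty frame")
    dark = sum(1 for p in pixels if p <= dark_level)
    return dark / len(pixels)


def classify_frame(pixels: Sequence[int]) -> str:
    """Label a frame 'ok', 'partially black', or 'black' (thresholds are guesses)."""
    frac = black_fraction(pixels)
    if frac >= 0.99:
        return "black"
    if frac >= 0.30:
        return "partially black"
    return "ok"


def first_bad_step(frames: List[Sequence[int]]) -> Optional[int]:
    """Return the 1-based index of the first frame that is not 'ok', if any."""
    for step, pixels in enumerate(frames, start=1):
        if classify_frame(pixels) != "ok":
            return step
    return None


# Synthetic stand-ins for 12 preview frames: 5 normal, 1 half black, 6 black.
normal = bytes([120, 200, 64, 180] * 16)
half_black = bytes([0] * 32 + [150] * 32)
black = bytes(64)
frames = [normal] * 5 + [half_black] + [black] * 6

labels = [classify_frame(f) for f in frames]
print(first_bad_step(frames))  # prints 6
```

With real previews, saving one frame per step and running the same classification would show whether the transition always lands at the same relative position in the schedule.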
Tried to reproduce it here (gfx1102 (RX 7600 XT) on Linux), and it's even worse: black image from start with any Qwen variant (Pruned, Edit, etc).
Same command line works on Vulkan, although Edit seems kind of oversaturated.
Do you notice any test failures when running llama.cpp's test-backend-ops tool?
No failures with my card: all tests either pass, or are not supported.
I can confirm the black image on my env too (7900 XTX, gfx1100). I didn't test with Vulkan, but for ROCm all test-backend-ops pass or are not supported; Flux Schnell does produce an image correctly. Testing with Qwen-Image, the preview is black right from the start.
Tried to reproduce it here (gfx1102 (RX 7600 XT) on Linux), and it's even worse: black image from start with any Qwen variant (Pruned, Edit, etc).
I was able to get similar behavior by changing the default scale on the Linear layers to 1/128, but a black image still creeps in at the 5th step (even with a tiny value like 1/65536), so I believe there are NaN issues at more than one point.
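One plausible reading of the scaling experiment is half-precision overflow: if an intermediate sum exceeds the largest finite fp16 value (65504), it becomes infinity, and a later `inf - inf` yields NaN. The Python sketch below only illustrates that mechanism and why pre-scaling the input can avoid it. It assumes an fp16 accumulator, which has not been confirmed for the HIP kernels involved, and `to_fp16` and `fp16_dot` are hypothetical helpers, not stable-diffusion.cpp or ggml functions.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (overflow gives +/-inf)."""
    if math.isnan(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values that do not fit in fp16; hardware
        # would saturate to infinity instead, so emulate that here.
        return math.copysign(math.inf, x)


def fp16_dot(xs, ws, scale=1.0):
    """Dot product with an fp16 accumulator; the input is pre-scaled and
    the result is rescaled afterwards in higher precision."""
    acc = 0.0
    for x, w in zip(xs, ws):
        product = to_fp16(to_fp16(x * scale) * w)
        acc = to_fp16(acc + product)
    return acc / scale


# Made-up activations and weights, chosen so the true sum is 120000.
xs = [300.0] * 4
ws = [100.0] * 4

unscaled = fp16_dot(xs, ws)                 # accumulator overflows to inf
scaled = fp16_dot(xs, ws, scale=1.0 / 128)  # stays finite
poisoned = unscaled - unscaled              # inf - inf is NaN

print(unscaled, scaled, math.isnan(poisoned))
```

If overflow occurs at several places in the network, fixing the scale of one layer type would not be enough, which would be consistent with the black image still appearing at the 5th step.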