
Problem using the Vulkan version: ~8 KB black image (sometimes not), maybe a conditioner bug

Open phil2sat opened this issue 4 months ago • 8 comments

I don't know what it is; I tried almost every option. Here's a good run. EDIT: I know exactly what it is...

While writing this I tried several prompts. Simple ones like "a cat" do work; longer prompts with weights end up weird, like:

..., ..., masterpiece, best quality, amazing quality, realistic, photorealistic, hyper-realistic, photo, high-resolution, 8K ultra-detailed RAW photograph, Fujifilm XT3, soft even lighting, realistic texture, natural shadows, high-end magazine photoshoot', 1], ]

will turn into something like this and generate nothing:

..., ..., masterpiece, best quality, amazing quality, realistic, photorealistic, hyper-realistic, photo, high-resolution, 8K ultra-detailed RAW photograph, Fujifilm XT3, soft even lighting, realistic texture, natural shadows, high-end magazine photoshoot qual <--- HERE IS SOMETHING WRONG --> ', 1], ] <-- MISSING

If I remove just the word "quality", it generates a picture as it should.
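For context on what the conditioner is doing here: debug lines like `parse 'a cat' to [['a cat', 1], ]` come from an A1111-style prompt-attention parser that splits the prompt into (text, weight) pairs. A toy sketch of that idea (an illustration only, not the actual sd.cpp code; `parse_prompt_attention` is a hypothetical name):

```python
import re

def parse_prompt_attention(text):
    """Toy A1111-style prompt weighting parser (sketch, not sd.cpp's code).

    '(word:1.2)' -> ['word', 1.2]; plain text gets weight 1.0.
    """
    res = []
    # match '(chunk:weight)' groups, or any run of plain text between them
    for m in re.finditer(r"\(([^():]+):([0-9.]+)\)|([^()]+)", text):
        if m.group(1) is not None:
            res.append([m.group(1), float(m.group(2))])
        else:
            chunk = m.group(3)
            if chunk.strip():
                res.append([chunk, 1.0])
    return res
```

If a parser like this mishandles a closing `', 1], ]` boundary near the end of a long weighted prompt, the tail of the prompt silently disappears, which would match the truncated debug output above.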

Next thing: if I add a negative prompt like:

blurry, low resolution, low quality, bad anatomy, poorly drawn face, missing limbs, extra limbs, duplicate body parts, cropped, jpeg artifacts, bad hands, malformed feet, deformed pu**y, unrealistic proportions, watermark, text, sketch, cartoon, CGI render, anime, painting, 3D render, noise, bad composition, tanlines <-- from forge exactly 77 tokens

or

blurry, low resolution, low quality, bad anatomy, poorly drawn face, missing limbs, extra limbs, duplicate body parts, cropped, jpeg artifacts

it also generates no image,

while "blurry" alone works. Weird...
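The `token length: 77` and `token length: 154` debug lines match CLIP's 77-token context window: long prompts are normally split into 77-token chunks (BOS + up to 75 content tokens + EOS padding) and encoded separately, so 154 means two full chunks. A minimal sketch of that chunking, assuming the standard CLIP special-token ids (not sd.cpp's actual code):

```python
def chunk_tokens(tokens, chunk_len=77):
    """Split a token-id list into CLIP-sized chunks (sketch of the usual
    long-prompt handling, assumed, not sd.cpp's exact implementation).

    Each chunk is BOS + up to 75 content tokens + EOS padding.
    BOS = 49406, EOS = 49407 in the CLIP vocabulary.
    """
    BOS, EOS = 49406, 49407
    content = chunk_len - 2  # 75 content tokens per chunk
    chunks = []
    for i in range(0, max(len(tokens), 1), content):
        part = tokens[i:i + content]
        padded = [BOS] + part + [EOS] * (chunk_len - 1 - len(part))
        chunks.append(padded)
    return chunks
```

An empty prompt still yields one 77-token chunk of padding, which is why the log reports `token length: 77` even for the empty negative prompt.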

At least with this knowledge I can generate images, but getting there took a week.

Thanks for your hard work and effort; this is the only way to get Stable Diffusion running on my rig: 4790K, Radeon R9 290 4 GB, 16 GB RAM.

Image
sd -M img_gen -p "a cat" -n  --sampling-method ddim_trailing --steps 20 --schedule discrete -W 512 -H 512 -b 1 --cfg-scale 7 -s -1 --clip-skip -1 --embd-dir /home/phil2sat/sd.cpp-webui/models/embeddings/ --lora-model-dir /home/phil2sat/sd.cpp-webui/models/loras/ -t 0 --rng cuda -o /home/phil2sat/sd.cpp-webui/outputs/txt2img/14.png --model /home/phil2sat/sd.cpp-webui/models/checkpoints/ponyRealism_V22_q4_k.safetensors --vae /home/phil2sat/sd.cpp-webui/models/vae/fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors --color --diffusion-fa --diffusion-conv-direct --vae-conv-direct -v


Option:
    n_threads:         4
    mode:              img_gen
    model_path:        /home/phil2sat/sd.cpp-webui/models/checkpoints/ponyRealism_V22_q4_k.safetensors
    wtype:             unspecified
    clip_l_path:
    clip_g_path:
    t5xxl_path:
    diffusion_model_path:
    vae_path:          /home/phil2sat/sd.cpp-webui/models/vae/fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:   /home/phil2sat/sd.cpp-webui/models/embeddings/
    stacked_id_embed_dir:
    input_id_images_path:
    style ratio:       20.00
    normalize input image :  false
    output_path:       /home/phil2sat/sd.cpp-webui/outputs/txt2img/14.png
    init_img:
    mask_img:
    control_image:
    ref_images_paths:
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    diffusion flash attention:true
    strength(control): 0.90
    prompt:            a cat
    negative_prompt:
    min_cfg:           1.00
    cfg_scale:         7.00
    img_cfg_scale:     7.00
    slg_scale:         0.00
    guidance:          3.50
    eta:               0.00
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     ddim_trailing
    schedule:          discrete
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              854311355
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
    chroma_use_dit_mask:   true
    chroma_use_t5_mask:    false
    chroma_t5_mask_pad:    1
System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:145  - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon R9 200 Series (RADV HAWAII) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
[INFO ] stable-diffusion.cpp:192  - loading model from '/home/phil2sat/sd.cpp-webui/models/checkpoints/ponyRealism_V22_q4_k.safetensors'
[INFO ] model.cpp:1010 - load /home/phil2sat/sd.cpp-webui/models/checkpoints/ponyRealism_V22_q4_k.safetensors using gguf format
[DEBUG] model.cpp:1027 - init from '/home/phil2sat/sd.cpp-webui/models/checkpoints/ponyRealism_V22_q4_k.safetensors'
[INFO ] stable-diffusion.cpp:231  - loading vae from '/home/phil2sat/sd.cpp-webui/models/vae/fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors'
[INFO ] model.cpp:1013 - load /home/phil2sat/sd.cpp-webui/models/vae/fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors using safetensors format
[DEBUG] model.cpp:1088 - init from '/home/phil2sat/sd.cpp-webui/models/vae/fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors'
[INFO ] stable-diffusion.cpp:243  - Version: SDXL
[INFO ] stable-diffusion.cpp:277  - Weight type:                 q4_K
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     q4_K
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: q4_K
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:282  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:330  - Using flash attention in the diffusion model
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1216 - clip params backend buffer size =  191.00 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1216 - clip params backend buffer size =  587.42 MB(VRAM) (517 tensors)
[DEBUG] ggml_extend.hpp:1216 - unet params backend buffer size =  1960.49 MB(VRAM) (1680 tensors)
[DEBUG] ggml_extend.hpp:1216 - vae params backend buffer size =  94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:459  - loading weights
[DEBUG] model.cpp:1891 - loading tensors from /home/phil2sat/sd.cpp-webui/models/checkpoints/ponyRealism_V22_q4_k.safetensors

  |==================================================| 2641/2641 - 500.00it/s
[DEBUG] model.cpp:1891 - loading tensors from /home/phil2sat/sd.cpp-webui/models/vae/fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors

  |==================>                               | 959/2641 - 0.00it/s
[INFO ] stable-diffusion.cpp:543  - total params memory size = 2833.38MB (VRAM 2833.38MB, RAM 0.00MB): clip 778.42MB(VRAM), unet 1960.49MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:562  - loading model from '/home/phil2sat/sd.cpp-webui/models/checkpoints/ponyRealism_V22_q4_k.safetensors' completed, taking 1.54s
[INFO ] stable-diffusion.cpp:604  - running in eps-prediction mode
[INFO ] stable-diffusion.cpp:610  - running with discrete schedule
[DEBUG] stable-diffusion.cpp:648  - finished loaded file
[DEBUG] stable-diffusion.cpp:1887 - generate_image 512x512
[INFO ] stable-diffusion.cpp:2017 - TXT2IMG
[DEBUG] stable-diffusion.cpp:1557 - prompt after extract and remove lora: "a cat"
[INFO ] stable-diffusion.cpp:738  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1562 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:358  - parse 'a cat' to [['a cat', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] ggml_extend.hpp:1168 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1168 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1168 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:486  - computing condition graph completed, taking 213 ms
[DEBUG] conditioner.hpp:358  - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] ggml_extend.hpp:1168 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] ggml_extend.hpp:1168 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] ggml_extend.hpp:1168 - clip compute buffer size: 2.33 MB(VRAM)
[DEBUG] conditioner.hpp:486  - computing condition graph completed, taking 211 ms
[INFO ] stable-diffusion.cpp:1696 - get_learned_condition completed, taking 426 ms
[INFO ] stable-diffusion.cpp:1719 - sampling using DDIM "trailing" method
[INFO ] stable-diffusion.cpp:1768 - generating image: 1/1 - seed 854311355
[DEBUG] stable-diffusion.cpp:865  - Sample
[DEBUG] ggml_extend.hpp:1168 - unet compute buffer size: 123.46 MB(VRAM)

  |==================================================| 20/20 - 4.93s/it
[INFO ] stable-diffusion.cpp:1806 - sampling completed, taking 75.23s
[INFO ] stable-diffusion.cpp:1814 - generating 1 latent images completed, taking 75.24s
[INFO ] stable-diffusion.cpp:1817 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1168 - vae compute buffer size: 1664.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1182 - computing vae [mode: DECODE] graph completed, taking 42.42s
[INFO ] stable-diffusion.cpp:1827 - latent 1 decoded, taking 42.42s
[INFO ] stable-diffusion.cpp:1831 - decode_first_stage completed, taking 42.42s
[INFO ] stable-diffusion.cpp:2088 - generate_image completed in 118.08s
save result PNG image to '/home/phil2sat/sd.cpp-webui/outputs/txt2img/14.png'

phil2sat avatar Aug 01 '25 14:08 phil2sat

Usually, black images on SDXL are because of the original VAE being broken, but it looks like you're using a supposedly fixed VAE already so it's hard to tell.
A conditioner bug would indeed fit your observations, but it's the first time I've heard about it. Can you check whether you can reproduce the issue with --clip-on-cpu?
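One plausible mechanism for the ~8 KB black outputs mentioned in the title: fp16 overflow in the VAE decode produces NaN/Inf values, the decoded image collapses to a constant, and a constant image compresses to a tiny PNG. A hedged sketch of the kind of sanity checks one could run around the decode step (hypothetical helpers, not part of sd.cpp):

```python
import math

def latents_ok(latents):
    """Return False if any latent value is NaN or Inf; such values
    typically decode to garbage or a solid-color image."""
    return all(math.isfinite(x) for x in latents)

def is_constant_image(pixels, tol=1e-6):
    """A black (or any constant) decode has ~zero pixel range and
    compresses to a very small PNG."""
    return max(pixels) - min(pixels) <= tol
```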

stduhpf avatar Aug 01 '25 15:08 stduhpf

I did hit a weird behavior once, that at first I suspected was from a conditioner bug: a specific weight change on a prompt produced a distorted image, then another change made garbage (just a colorful pattern). I couldn't reproduce it afterwards, so I concluded it could be related to too much VRAM in use, or something like that.

I'm also on Linux + radv, so I can try to reproduce it here. Could you point me to the exact sd version and model files you're running?

wbruna avatar Aug 01 '25 20:08 wbruna

Sorry for the late reply; I tested the whole day and night and fell asleep completely dead. And thanks for your reply.

--clip-on-cpu, --vae-on-cpu, and quantization down to q2_K didn't make a difference. VRAM usage during the run was around 2.8 GB out of 4 GB, so I didn't hit any memory problems.

My main PC from yesterday also shows weird behaviour: in a row of 10 pics it will generate around 3-5, and the rest are black. So it's possible it was the system's fault, not the conditioner, but in the debug log the prompt was definitely cut. Maybe it's also a fault of sd.cpp-webui; I have to test the prompt directly.

Model-wise, I tested around 12 different models (SD 1.5 + SDXL). Mainly I use Pony Realism 2.2: https://civitai.com/models/372465?modelVersionId=914390. After a lot of testing, this model does what I prompt and doesn't give many random pics; eyes, face, hands and feet are nearly perfect.

Today I'm on another PC to replicate. I just did a batch of 10 cats, all fine, so no random black pictures with my settings (maybe it's the R9 290). Long prompt incoming...

Damn, I guess it was random: every time I removed the long prompt and parts of the negative, my PC generated a pic, and when I put them back, it didn't. I tested at least ten times.

Now on my other PC I ran a batch of 10 and all are fine, but in the debug output the cut is there: [', blushing, photorealis <-- here I see the cut [DEBUG] clip.hpp:311 - token length: 154

I copied the sd.cpp-webui-generated command (full prompt there) and ran it directly; I can see the cut in the debug output there too, so it's not a webui fault. Maybe the debug log can't output the whole prompt, maybe the conditioner ignores further input, I don't know.
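Worth noting: `token length: 154` is exactly 2 × 77, which suggests the prompt was actually split into two full CLIP chunks and processed, and the visible "cut" may just be the debug line being truncated for display. A toy sketch of how a fixed-width debug printer can cut a string that is still fully processed (hypothetical behaviour, not sd.cpp's actual logging code):

```python
def debug_log(msg, max_len=120):
    """Display-only truncation of long debug lines (assumed behaviour,
    not sd.cpp's actual logger). The underlying string is untouched."""
    return msg if len(msg) <= max_len else msg[:max_len]

prompt = "a, " * 60  # long prompt: the log line gets cut, the prompt doesn't
shown = debug_log("parse '" + prompt + "' to [['" + prompt + "', 1], ]")
```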

BUT on my laptop I get an image every time, in all cases, with the same settings as on my main PC.

Question now is, why does the R9 290 generate just fine but only every 2nd/3rd image!?

phil2sat avatar Aug 02 '25 06:08 phil2sat

Question now is, why does the R9 290 generate just fine but only every 2nd/3rd image!?

I would guess either broken drivers, or dying VRAM

stduhpf avatar Aug 02 '25 10:08 stduhpf

Question now is, why does the R9 290 generate just fine but only every 2nd/3rd image!?

I would guess either broken drivers, or dying VRAM

Could also be heat related; how much do you fancy repasting your GPU?

Green-Sky avatar Aug 02 '25 10:08 Green-Sky

https://civitai.com/models/372465?modelVersionId=914390

That's not a quantized model, so I'll assume you converted it yourself.

now on my other pc i did run a batch of 10 an all are fine,

Do you mean the -b / --batch-count option? Did the failed generations also use it?

Please give an exact command line that triggers this issue, so I can try to reproduce it here. Look, I'm not being picky: the issue could be related to a detail that's not being clearly mentioned, even if it doesn't seem relevant at first.

wbruna avatar Aug 02 '25 11:08 wbruna

So after rebuilding my dang old GPU it seems fine. I flashed another, slower BIOS and now it generates images; I guess it was a VRAM issue, too high a clock. But the rebuild only just happened, so now I'm going to test, monitoring everything while generating: GFX clocks, VRAM clocks, temps, VRAM usage, GTT usage...

At least the last couple of pics were fine at 640x960.

https://civitai.com/models/372465?modelVersionId=914390

That's not a quantized model, so I'll assume you converted it yourself.

now on my other pc i did run a batch of 10 an all are fine,

Do you mean the -b / --batch-count option? Did the failed generations also use it?

Please give an exact command line that triggers this issue, so I can try to reproduce it here. Look, I'm not being picky: the issue could be related to a detail that's not being clearly mentioned, even if it doesn't seem relevant at first.

Yes, I used the sd command (or rather the webui) to convert the model, so I don't have to wait after every generation. I ran single generations and sometimes a batch with -b, but this didn't make a difference.

For the exact prompt, I don't know if it matters for now; I first have to prove whether it really works stably now:

sd -M img_gen -p "very long prompt with weights" -n "blurry, low resolution, low quality, bad anatomy, poorly drawn face, missing limbs, extra limbs, duplicate body parts, cropped, jpeg artifacts, bad hands, malformed feet, deformed pussy, unrealistic proportions, watermark, text, sketch, cartoon, CGI render, anime, painting, 3D render, noise, bad composition, tanlines" --sampling-method dpm++2mv2 --steps 30 --schedule karras -W 640 -H 960 -b 1 --cfg-scale 7 -s -1 --clip-skip 2 --embd-dir /home/phil2sat/sd.cpp-webui/models/embeddings/ --lora-model-dir /home/phil2sat/sd.cpp-webui/models/loras/ -t 0 --rng cuda -o /home/phil2sat/sd.cpp-webui/outputs/txt2img/2.png --model /home/phil2sat/sd.cpp-webui/models/checkpoints/ponyRealism_V22_q4_k.safetensors --vae /home/phil2sat/sd.cpp-webui/models/vae/fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors --vae-on-cpu --color --diffusion-fa --diffusion-conv-direct --vae-conv-direct

But as I said, for now it works, and 4.3 s/it on that old iron is not bad, I guess. Testing...

phil2sat avatar Aug 02 '25 14:08 phil2sat

I did hit a weird behavior once, that at first I suspected was from a conditioner bug: a specific weight change on a prompt produced a distorted image, then another change made garbage (just a colorful pattern). I couldn't reproduce it afterwards, so I concluded it could be related to too much VRAM in use, or something like that.

Just documenting this, in case anyone else hits the same bug. It's likely a model bug, but there is something funny going on with the conditioner.

I was only able to reproduce it with versions of this checkpoint: https://civitai.com/models/709404?modelVersionId=1871475 (caveat: it's a Pony model. Not overtly NSFW, but be aware this test messes with prompt adherence...). It doesn't seem to depend on resolution, sampler, flash attention or acceleration LoRAs, only on the exact prompt string:

| prompt | image |
| --- | --- |
| car, (vintage, opened door, green :0.5) | Image |
| car, (vintage, opened door, green :0.1) | Image |
| ,,,,,, car, (vintage, opened door, green :0.1) | Image |
| ,,,,,, car, (vintage, opened door, green :0.1) ,,,,,,,,, | Image |
| ,,,,,, car, (vintage, door, green :0.1) | Image |
| ,,,,,, (green :0.1) car, (vintage, opened door:0.1) | Image |

(these were generated on Koboldcpp, but they behave the same on plain sd.cpp)

At first, I got a broken result just by changing a specific word in a much larger prompt. The commas were an attempt to find out if I was near the 77 token limit; then I noticed adding commas seemed to weaken prompt adherence to specific words, up to a point where it destroyed the image. Sometimes, adding more commas, changing weights or just reordering a few words is enough to fix the generation.
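A possible explanation for why a tiny weight like `:0.1` can destroy the image: after scaling token embeddings by their weights, implementations commonly rescale the whole sequence to restore the original mean, so an extreme down-weight on one span ends up amplifying every other token. A toy sketch of that rescaling trick (assumed A1111-style behaviour, not necessarily what sd.cpp does):

```python
def apply_weights(embeddings, weights):
    """Scale per-token embedding vectors by their prompt weights, then
    rescale so the overall mean is preserved (the common A1111-style
    trick; an assumption here, not necessarily sd.cpp's approach)."""
    orig_mean = sum(sum(v) / len(v) for v in embeddings) / len(embeddings)
    weighted = [[x * w for x in vec] for vec, w in zip(embeddings, weights)]
    new_mean = sum(sum(v) / len(v) for v in weighted) / len(weighted)
    if new_mean != 0:
        scale = orig_mean / new_mean
        weighted = [[x * scale for x in vec] for vec in weighted]
    return weighted
```

With one token at weight 0.1, the compensating scale factor pushes the other tokens well above their original magnitudes, which would also fit the observation that padding with commas (more tokens sharing the rescale) changes the outcome.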

wbruna avatar Sep 05 '25 23:09 wbruna