
Gibberish/noisy image with model converted to Q8_0 GGUF

Open fractal-fumbler opened this issue 1 year ago • 3 comments

Hello :) I am using sd.cpp compiled with the SYCL backend, since I have an Intel Arc GPU.

I started by converting the model from https://civitai.com/models/141592/pixelwave (which is 22+ GB), and the conversion finished without any errors:

  sd -M convert -m /home/models_test/pixelwave_flux1Dev02.safetensors -o /home/unet/pixelwave_flux1Dev02_Q8_0.gguf -v --type q8_0
Log of the conversion process:
Option:     
    n_threads:         8               
    mode:              convert         
    model_path:        /home/models_test/pixelwave_flux1Dev02.safetensors             
    wtype:             q8_0            
    clip_l_path:                       
    t5xxl_path:                        
    diffusion_model_path:              
    vae_path:                          
    taesd_path:                        
    esrgan_path:                       
    controlnet_path:                   
    embeddings_path:                   
    stacked_id_embeddings_path:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:        /home/unet/pixelwave_flux1Dev02_Q8_0.gguf
    init_img:          
    control_image:     
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt:            
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         7.00
    guidance:          3.50
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     euler_a 
    schedule:          default 
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info: 
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[INFO ] model.cpp:793  - load  /home/models_test/pixelwave_flux1Dev02.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from ' /home/models_test/pixelwave_flux1Dev02.safetensors'
[INFO ] model.cpp:1776 - model tensors mem size: 12248.99MB
[DEBUG] model.cpp:1530 - loading tensors from  /home/models_test/pixelwave_flux1Dev02.safetensors
[INFO ] model.cpp:1811 - load tensors done
[INFO ] model.cpp:1812 - trying to save tensors to  /home/unet/pixelwave_flux1Dev02_Q8_0.gguf
convert ' /home/models_test/pixelwave_flux1Dev02.safetensors'/'' to ' /home/unet/pixelwave_flux1Dev02_Q8_0.gguf' success

Then I try to generate an image with sd.cpp using this command; again, no errors while the image is being generated:

  sd --diffusion-model /home/unet/pixelwave_flux1Dev02_Q8_0.gguf --vae /home/vae/flux_vae.safetensors --clip_l /home/clip/clip_l.safetensors --t5xxl /home/clip/clip_t5xxl_fp16.safetensors  -p "The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within" --cfg-scale 1.0 --sampling-method euler --schedule discrete -v -H 1024 -W 1024 --steps 16 --vae-on-cpu -o /tmp/output.png 
Log of the image generation process:
Option: 
    n_threads:         8
    mode:              txt2img
    model_path:        
    wtype:             unspecified
    clip_l_path:        /home/clip/clip_l.safetensors
    t5xxl_path:         /home/clip/clip_t5xxl_fp16.safetensors
    diffusion_model_path:    /home/unet/pixelwave_flux1Dev02_Q8_0.gguf
    vae_path:           /home/vae/flux_vae.safetensors
    taesd_path:        
    esrgan_path:       
    controlnet_path:   
    embeddings_path:   
    stacked_id_embeddings_path:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:       /tmp/output.png
    init_img:          
    control_image:     
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:true
    strength(control): 0.90
    prompt:            The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         1.00
    guidance:          3.50
    clip_skip:         -1
    width:             1024
    height:            1024
    sample_method:     euler
    schedule:          discrete
    sample_steps:      16
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info: 
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:175  - Using SYCL backend
[SYCL] call ggml_check_sycl
ggml_check_sycl: GGML_SYCL_DEBUG: 0
ggml_check_sycl: GGML_SYCL_F16: no
ZE_LOADER_DEBUG_TRACE:Using Loader Library Path: 
ZE_LOADER_DEBUG_TRACE:Tracing Layer Library Path: libze_tracing_layer.so.1
found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|    1.5|    512|    1024|   32| 16225M|            1.3.30872|
ggml_sycl_init: GGML_SYCL_FORCE_MMQ:   no
ggml_sycl_init: SYCL_USE_XMX: yes
ggml_sycl_init: found 1 SYCL devices:
[WARN ] stable-diffusion.cpp:185  - Flash Attention not supported with GPU Backend
[INFO ] stable-diffusion.cpp:202  - loading clip_l from ' /home/clip/clip_l.safetensors'
[INFO ] model.cpp:793  - load  /home/clip/clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from ' /home/clip/clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209  - loading t5xxl from ' /home/clip/clip_t5xxl_fp16.safetensors'
[INFO ] model.cpp:793  - load  /home/clip/clip_t5xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from ' /home/clip/clip_t5xxl_fp16.safetensors'
[INFO ] stable-diffusion.cpp:216  - loading diffusion model from ' /home/unet/pixelwave_flux1Dev02_Q8_0.gguf'
[INFO ] model.cpp:790  - load  /home/unet/pixelwave_flux1Dev02_Q8_0.gguf using gguf format
[DEBUG] model.cpp:807  - init from ' /home/unet/pixelwave_flux1Dev02_Q8_0.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:223  - loading vae from ' /home/vae/flux_vae.safetensors'
[INFO ] model.cpp:793  - load  /home/vae/flux_vae.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from ' /home/vae/flux_vae.safetensors'
[INFO ] stable-diffusion.cpp:235  - Version: Flux Dev 
[INFO ] stable-diffusion.cpp:266  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:267  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:268  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:269  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:271  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:310  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:313  - CLIP: Using CPU backend
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1050 - clip params backend buffer size =  235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1050 - t5 params backend buffer size =  9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1050 - flux params backend buffer size =  12068.09 MB(VRAM) (780 tensors)
[INFO ] stable-diffusion.cpp:334  - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1050 - vae params backend buffer size =  94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398  - loading weights
[DEBUG] model.cpp:1530 - loading tensors from  /home/clip/clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from  /home/clip/clip_t5xxl_fp16.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from  /home/unet/pixelwave_flux1Dev02_Q8_0.gguf
[DEBUG] model.cpp:1530 - loading tensors from  /home/vae/flux_vae.safetensors
[INFO ] stable-diffusion.cpp:497  - total params memory size = 21481.50MB (VRAM 12068.09MB, RAM 9413.41MB): clip 9318.83MB(RAM), unet 12068.09MB(VRAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501  - loading model from '' completed, taking 6.08s
[INFO ] stable-diffusion.cpp:518  - running in Flux FLOW mode
[INFO ] stable-diffusion.cpp:534  - running with discrete schedule
[DEBUG] stable-diffusion.cpp:572  - finished loaded file
[DEBUG] stable-diffusion.cpp:1378 - txt2img 1024x1024
[DEBUG] stable-diffusion.cpp:1127 - prompt after extract and remove lora: "The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within"
[INFO ] stable-diffusion.cpp:655  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1132 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1036 - parse 'The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within' to [['The transparent orb creates an intriguing, otherworldly atmosphere and allows viewers to peer into the fantasy world within', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] t5.hpp:397  - token length: 256
[DEBUG] ggml_extend.hpp:1001 - t5 compute buffer size: 68.25 MB(RAM)
[DEBUG] conditioner.hpp:1155 - computing condition graph completed, taking 6029 ms
[INFO ] stable-diffusion.cpp:1256 - get_learned_condition completed, taking 6030 ms
[INFO ] stable-diffusion.cpp:1279 - sampling using Euler method
[INFO ] stable-diffusion.cpp:1283 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1001 - flux compute buffer size: 2577.25 MB(VRAM)

  |===>                                              | 1/16 - 6.89s/it
  |======>                                           | 2/16 - 6.44s/it
  |=========>                                        | 3/16 - 6.48s/it
  |============>                                     | 4/16 - 6.55s/it
  |===============>                                  | 5/16 - 6.48s/it
  |==================>                               | 6/16 - 6.43s/it
  |=====================>                            | 7/16 - 6.41s/it
  |=========================>                        | 8/16 - 6.44s/it
  |============================>                     | 9/16 - 6.49s/it
  |===============================>                  | 10/16 - 6.53s/it
  |==================================>               | 11/16 - 6.52s/it
  |=====================================>            | 12/16 - 6.52s/it
  |========================================>         | 13/16 - 6.58s/it
  |===========================================>      | 14/16 - 6.45s/it
  |==============================================>   | 15/16 - 6.48s/it
  |==================================================| 16/16 - 6.42s/it
[INFO ] stable-diffusion.cpp:1315 - sampling completed, taking 104.18s
[INFO ] stable-diffusion.cpp:1323 - generating 1 latent images completed, taking 104.20s
[INFO ] stable-diffusion.cpp:1326 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1001 - vae compute buffer size: 6656.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:987  - computing vae [mode: DECODE] graph completed, taking 46.52s
[INFO ] stable-diffusion.cpp:1336 - latent 1 decoded, taking 46.52s
[INFO ] stable-diffusion.cpp:1340 - decode_first_stage completed, taking 46.52s
[INFO ] stable-diffusion.cpp:1449 - txt2img completed in 156.75s
save result image to '/tmp/output.png'
And in the end I am getting this kind of gibberish image:

Result image (attachment: "output")

How can I fix the image generation process? Or do I need to change something in the conversion? If I try to generate with ComfyUI, the result is the same.

fractal-fumbler · Oct 01 '24 15:10

Not sure this is the issue you are running into, but for instance Flux apparently needs the shift value set, otherwise it gets blurry... which it appears sd.cpp does not currently have a parameter for. It might make sense to have this as both a parameter to set a custom shift and a flag to auto-set the shift for Flux models that need it; if both are enabled, the custom parameter could increase or decrease the auto-calculated shift value.

  function calcShift(h, w) {
    const step1 = (h * w) / 256;
    const step2 = (1.15 - 0.5) / (4096 - 256);
    const step3 = (step1 - 256) * step2 + 0.5;
    const result = Math.exp(step3);
    return Math.round(result * 100) / 100;
  }

https://www.reddit.com/r/drawthingsapp/comments/1erjvur/flux1_dev_8bit_generation_is_blurry/
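
For a rough sense of the numbers, here is a back-of-the-envelope check of the formula above (my own arithmetic, not anything sd.cpp computes today); the shift grows with resolution:

  // rough check of calcShift() from the snippet above
  calcShift(512, 512);    // (1024 - 256) * (0.65 / 3840) + 0.5 = 0.63  ->  exp(0.63) ≈ 1.88
  calcShift(1024, 1024);  // (4096 - 256) * (0.65 / 3840) + 0.5 = 1.15  ->  exp(1.15) ≈ 3.16

So at the 1024x1024 resolution used in the report above, this formula suggests a shift of roughly 3.16.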

Example blurry output at 30 steps with Q8 Flux dev: (image attached)

cb88 · Oct 01 '24 18:10

Note: I did some more testing and get the same noise as you on a Radeon W7800 with Vulkan (I wasn't able to load HIP; it says rocblas is missing). The CPU implementation works fine with the same models, though.

  .\sd.exe -t 8 -v --cfg-scale 1 --rng std_default --vae-tiling --diffusion-model ..\models\Flux\flux1-schnell-Q8_0.gguf --clip_l ..\models\Flux\clip_l.safetensors --vae ..\models\Flux\ae.safetensors --t5xxl ..\models\Flux\t5xxl_fp16.safetensors -H 640 -W 448 --steps 1 -p "a corgi dog sitting on a mossy spot in a lush forest" -b 1 -o corgi.png

Option: 
    n_threads:         8
    mode:              txt2img
    model_path:        
    wtype:             unspecified
    clip_l_path:       ..\models\Flux\clip_l.safetensors
    t5xxl_path:        ..\models\Flux\t5xxl_fp16.safetensors
    diffusion_model_path:   ..\models\Flux\flux1-schnell-Q8_0.gguf
    vae_path:          ..\models\Flux\ae.safetensors
    taesd_path:        
    esrgan_path:       
    controlnet_path:   
    embeddings_path:   
    stacked_id_embeddings_path:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:       corgi.png
    init_img:          
    control_image:     
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt:            a corgi dog sitting on a mossy spot in a lush forest
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         1.00
    guidance:          3.50
    clip_skip:         -1
    width:             448
    height:            640
    sample_method:     euler_a
    schedule:          default
    sample_steps:      1
    strength(img2img): 0.75
    rng:               std_default
    seed:              42
    batch_count:       1
    vae_tiling:        true
    upscale_repeats:   1
System Info: 
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:166  - Using Vulkan backend
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: AMD Radeon PRO W7800 (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64
[INFO ] stable-diffusion.cpp:202  - loading clip_l from '..\models\Flux\clip_l.safetensors'
[INFO ] model.cpp:793  - load ..\models\Flux\clip_l.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from '..\models\Flux\clip_l.safetensors'
[INFO ] stable-diffusion.cpp:209  - loading t5xxl from '..\models\Flux\t5xxl_fp16.safetensors'
[INFO ] model.cpp:793  - load ..\models\Flux\t5xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from '..\models\Flux\t5xxl_fp16.safetensors'
[INFO ] stable-diffusion.cpp:216  - loading diffusion model from '..\models\Flux\flux1-schnell-Q8_0.gguf'
[INFO ] model.cpp:790  - load ..\models\Flux\flux1-schnell-Q8_0.gguf using gguf format
[DEBUG] model.cpp:807  - init from '..\models\Flux\flux1-schnell-Q8_0.gguf'
[INFO ] stable-diffusion.cpp:223  - loading vae from '..\models\Flux\ae.safetensors'
[INFO ] model.cpp:793  - load ..\models\Flux\ae.safetensors using safetensors format
[DEBUG] model.cpp:861  - init from '..\models\Flux\ae.safetensors'
[INFO ] stable-diffusion.cpp:235  - Version: Flux Schnell
[INFO ] stable-diffusion.cpp:266  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:267  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:268  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:269  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:271  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:310  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:313  - CLIP: Using CPU backend
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1050 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1050 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1050 - flux params backend buffer size = 12057.71 MB(VRAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1050 - vae params backend buffer size = 94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:398  - loading weights
[DEBUG] model.cpp:1530 - loading tensors from ..\models\Flux\clip_l.safetensors
[DEBUG] model.cpp:1530 - loading tensors from ..\models\Flux\t5xxl_fp16.safetensors
[INFO ] model.cpp:1685 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1530 - loading tensors from ..\models\Flux\flux1-schnell-Q8_0.gguf
[DEBUG] model.cpp:1530 - loading tensors from ..\models\Flux\ae.safetensors
[INFO ] stable-diffusion.cpp:497  - total params memory size = 21471.11MB (VRAM 12152.28MB, RAM 9318.83MB): clip 9318.83MB(RAM), unet 12057.71MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:501  - loading model from '' completed, taking 12.72s

cb88 · Oct 04 '24 22:10

Note: my Vega FE with Vulkan works fine... so maybe a driver issue?

cb88 · Oct 04 '24 22:10

I tried hard-coding corrected shift values to pass to the denoiser, but did not get the expected improvement.
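
For reference, my understanding from other Flux frontends (not from sd.cpp's own code, so treat this as an assumption) is that the shift from calcShift() above gets folded into each flow-matching timestep/sigma roughly like this:

  // Hypothetical sketch, not sd.cpp's actual denoiser code.
  // t is the flow-matching timestep/sigma in [0, 1]; shift comes from calcShift(h, w).
  function shiftTimestep(t, shift) {
    return (shift * t) / (1 + (shift - 1) * t);
  }

With shift = 1 this is the identity, so a shift near 1 leaves the schedule essentially unchanged.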

cb88 · Oct 23 '24 21:10