stable-diffusion.cpp [Feature Request]: SD XL support

Aug 17 '23 03:08 zhongwei

I'm willing to implement SDXL once I've improved the support for SD 1.x and added support for SD 2.x.

Aug 17 '23 13:08 leejet

Besides LCM being available for XL models, stability.ai released SDXL-Turbo, a distilled (fine-tuned?) model that can generate good images in a single step.

https://huggingface.co/stabilityai/sdxl-turbo

Nov 29 '23 15:11 Green-Sky

Is it compatible with this repo?

Dec 13 '23 19:12 shaharhi

@leejet this can be closed

Dec 29 '23 02:12 FSSRepo

@zhongwei Support for SDXL has been added. You can try pulling the latest code from the master branch.

Dec 30 '23 05:12 leejet

> @leejet this can be closed

Generally, I don't proactively close issues unless they've been resolved for an extended period without any response from the person who opened the issue. I prefer the individuals who opened the issue to confirm its resolution and close it themselves.

Dec 30 '23 06:12 leejet

Did anyone try running sd_xl? For some reason it's generating an empty image (it's pitch black). Here is the command I used and its output:

$ ./bin/sd  -m ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors -p "a lovely cat"
[INFO]  stable-diffusion.cpp:5386 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors'
[INFO]  model.cpp:638  - load ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors using safetensors format
[INFO]  stable-diffusion.cpp:5412 - Stable Diffusion XL
[INFO]  stable-diffusion.cpp:5418 - Stable Diffusion weight type: f16
[INFO]  stable-diffusion.cpp:5573 - total memory buffer size = 6570.56MB (clip 1565.66MB, unet 4909.43MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:5579 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors' completed, taking 1.78s
[INFO]  stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
[INFO]  stable-diffusion.cpp:6525 - get_learned_condition completed, taking 1547 ms
[INFO]  stable-diffusion.cpp:6535 - sampling using Euler A method
[INFO]  stable-diffusion.cpp:6539 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 18.15s/it
[INFO]  stable-diffusion.cpp:6551 - sampling completed, taking 353.73s
[INFO]  stable-diffusion.cpp:6559 - generating 1 latent images completed, taking 353.89s
[INFO]  stable-diffusion.cpp:6561 - decoding 1 latents
[INFO]  stable-diffusion.cpp:6571 - latent 1 decoded, taking 17.36s
[INFO]  stable-diffusion.cpp:6575 - decode_first_stage completed, taking 17.36s
[INFO]  stable-diffusion.cpp:6590 - txt2img completed in 372.80s
[INFO]  main.cpp:538  - save result image to 'output.png'

I also tried downloading the UNet, VAE, etc. and passing the directory as an argument (along with some minor code changes so that it loads the f16 .safetensors files instead of the plain .safetensors ones; the line in question is std::string unet_path = path_join(file_path, "unet/diffusion_pytorch_model.safetensors");)

$ ./bin/sd  -m ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors --vae ~/downloaded_models/sdxl-turbo/ -p "a lovely cat"
[INFO]  stable-diffusion.cpp:5386 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors'
[INFO]  model.cpp:638  - load ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors using safetensors format
[INFO]  stable-diffusion.cpp:5395 - loading vae from '~/downloaded_models/sdxl-turbo/'
[INFO]  model.cpp:632  - load ~/downloaded_models/sdxl-turbo/ using diffusers format
[INFO]  stable-diffusion.cpp:5412 - Stable Diffusion XL
[INFO]  stable-diffusion.cpp:5418 - Stable Diffusion weight type: f16
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_1.bias' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_1.weight' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_2.bias' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_2.weight' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'model.diffusion_model.output_blocks.2.1.conv.bias' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'model.diffusion_model.output_blocks.2.1.conv.weight' in model file
[INFO]  stable-diffusion.cpp:5573 - total memory buffer size = 6570.56MB (clip 1565.66MB, unet 4909.43MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:5579 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors' completed, taking 2.61s
[INFO]  stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
[INFO]  stable-diffusion.cpp:6525 - get_learned_condition completed, taking 1592 ms
[INFO]  stable-diffusion.cpp:6535 - sampling using Euler A method
[INFO]  stable-diffusion.cpp:6539 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 18.09s/it
[INFO]  stable-diffusion.cpp:6551 - sampling completed, taking 353.85s
[INFO]  stable-diffusion.cpp:6559 - generating 1 latent images completed, taking 353.85s
[INFO]  stable-diffusion.cpp:6561 - decoding 1 latents
[INFO]  stable-diffusion.cpp:6571 - latent 1 decoded, taking 17.08s
[INFO]  stable-diffusion.cpp:6575 - decode_first_stage completed, taking 17.08s
[INFO]  stable-diffusion.cpp:6590 - txt2img completed in 372.51s
[INFO]  main.cpp:538  - save result image to 'output.png'

But it's the same result. I have also tried the older Stable Diffusion model, stable-diffusion-2-1/v2-1_768-nonema-pruned.safetensors, and it works. I'm running on Ubuntu 22.03.

Dec 31 '23 03:12 ranjithum

@ranjithum The VAE in SDXL runs into NaN issues under FP16, and unfortunately ggml_conv_2d only operates in FP16. So you need to use the --vae parameter to specify a VAE in which the FP16 NaN issue has been fixed. You can find it here: SDXL VAE FP16 Fix.

./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v

Dec 31 '23 03:12 leejet
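
The FP16 NaN problem described above can be illustrated without any model. This is a minimal sketch, assuming the NaNs come from intermediate values exceeding the largest finite half-precision value (65504). `to_fp16` is a hypothetical helper that emulates FP16 rounding with Python's `struct` module; it is not part of stable-diffusion.cpp or ggml.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; FP16 arithmetic
        # would produce +/- infinity here instead.
        return math.copysign(math.inf, x)


FP16_MAX = 65504.0  # largest finite half-precision value

in_range = to_fp16(FP16_MAX)          # still representable
overflowed = to_fp16(FP16_MAX * 2.0)  # too large: becomes infinity
poisoned = overflowed - overflowed    # inf - inf is NaN

print(in_range, overflowed, poisoned)
```

Once a NaN appears it propagates through every later operation, which would be consistent with the all-black output reported earlier in the thread.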

@leejet Perfect, thanks. It worked.

Dec 31 '23 09:12 ranjithum

@leejet We should probably put up a warning in the program when an f32 VAE is used (until it's fixed).

Dec 31 '23 10:12 Green-Sky

Works for me, but colors are weirdly off with SD XL plus the fp16 fix (two output images attached).

Jan 14 '24 11:01 niansa

> Works for me, but colors are weirdly off with SD XL plus fp16 fix:

Try changing the image size to 1024x1024. SDXL is not suitable for generating images of size 512x512.

Jan 15 '24 15:01 leejet

> Try changing the image size to 1024x1024. SDXL is not suitable for generating images of size 512x512.

Nope, still just as broken for me.

Jan 29 '24 19:01 niansa

stable-diffusion.cpp/build/bin/sd -m stable-diffusion.cpp/models/sd_xl_turbo_1.0_fp16.safetensors --vae stable-diffusion.cpp/models/sdxl_vae.safetensors --steps 1 --cfg-scale 1 -s -1 -p "a lovely cat"

Works perfectly for me.

Feb 10 '24 07:02 ServeurpersoCom
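
The two working invocations reported so far differ mainly in their sampling flags. The sketch below collects them in one place. It is only an illustration: `build_sd_args` is a hypothetical helper, the `./bin/sd` path is an assumption, and the flag values simply mirror the commands posted in this thread (1024x1024 for the base model, `--steps 1 --cfg-scale 1` for Turbo).

```python
import shlex
from typing import List, Optional


def build_sd_args(model: str, prompt: str, vae: Optional[str] = None,
                  turbo: bool = False, size: int = 1024) -> List[str]:
    """Build an argv list for the sd CLI (hypothetical helper)."""
    args = ["./bin/sd", "-m", model]  # binary path is an assumption
    if vae is not None:
        args += ["--vae", vae]
    if turbo:
        # Settings taken from the SDXL-Turbo command posted above.
        args += ["--steps", "1", "--cfg-scale", "1"]
    else:
        # Settings taken from the SDXL base command posted earlier.
        args += ["-H", str(size), "-W", str(size)]
    args += ["-p", prompt]
    return args


base = build_sd_args("models/sd_xl_base_1.0.safetensors", "a lovely cat",
                     vae="models/sdxl_vae-fp16-fix.safetensors")
turbo = build_sd_args("models/sd_xl_turbo_1.0_fp16.safetensors", "a lovely cat",
                      vae="models/sdxl_vae.safetensors", turbo=True)
print(shlex.join(base))
print(shlex.join(turbo))
```

The resulting lists can be passed to `subprocess.run` directly, which avoids quoting problems with prompts that contain spaces.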

LoRAs don't work for me for some reason. Maybe I'm doing something incorrectly.

I'm using the following command:

for m in models/SDXL/*.safetensors; do ./stable-diffusion.cpp/dist/bin/sd -m "${m}" -p "a cute cat <lora:SCRATCHBOARD ILLUSTRATION:0.8>" -W 1024 -H 1024 --steps 30 --sampling-method dpm++2m --schedule karras --embd-dir models/embeddings/ --vae models/SDXL/vae/sdxl_vae.safetensors -s $RANDOM -b 2 --lora-model-dir models/SDXL/lora/ -v -o images/$(basename -- "$m" ".${m##*.}"| tr " " "-").png -v ; done;

and this LoRA: https://civitai.com/models/279729/wizards-scratchboard-illustration

The relevant (abbreviated) portion of the output:

[INFO ] model.cpp:645  - load models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors using safetensors format
[DEBUG] model.cpp:711  - init from 'models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors'
[DEBUG] ggml_extend.hpp:555  - lora params backend buffer size =  874.24 MB (10240 tensors)
[INFO ] lora.hpp:35   - loading LoRA from 'models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors'
[DEBUG] model.cpp:1262 - loading tensors from models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors
[DEBUG] lora.hpp:58   - finished loaded lora
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.alpha
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.lora_up.weight
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.alpha
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.lora_down.weight
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.lora_up.weight
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_self_attn_k_proj.alpha
...

Update: it now works as of at least commit 48bcce493f45a11d9d5a4c69943d03ff919d748f.

Feb 18 '24 08:02 scientism
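
For reference, the prompt tag `<lora:NAME:WEIGHT>` and the `--lora-model-dir` option in the command above combine into the file path shown in the log (`models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors`). The sketch below reproduces that mapping. It is an illustration of the behavior observed in the log, not the project's actual parser; `split_lora_tags` and the regular expression are hypothetical.

```python
import posixpath
import re
from typing import List, Tuple

# Matches tags such as <lora:SCRATCHBOARD ILLUSTRATION:0.8>
LORA_TAG = re.compile(r"<lora:([^:<>]+):([0-9]*\.?[0-9]+)>")


def split_lora_tags(prompt: str, lora_dir: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Return the prompt without LoRA tags plus (file path, weight) pairs."""
    loras = [
        (posixpath.join(lora_dir, name + ".safetensors"), float(weight))
        for name, weight in LORA_TAG.findall(prompt)
    ]
    # Remove the tags and collapse the whitespace they leave behind.
    cleaned = " ".join(LORA_TAG.sub(" ", prompt).split())
    return cleaned, loras


cleaned, loras = split_lora_tags(
    "a cute cat <lora:SCRATCHBOARD ILLUSTRATION:0.8>", "models/SDXL/lora/"
)
print(cleaned)
print(loras)
```

A quick check like this helps rule out a misspelled tag or a wrong directory before suspecting the LoRA loader itself.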

The official example LoRA is failing for me too (from https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main).

(base) tnunamak@pc:~/stable-diffusion.cpp$ ./build/bin/sd -m models/sd_xl_base_1.0.safetensors --vae models/sdxl_vae.safetensors -H 1024 -W 768 --cfg-scale 1 --steps 35 -p "A lovely cat <lora:sd_xl_offset_example-lora_1.0:0.8>" --lora-model-dir models
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:165  - loading model from 'models/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:705  - load models/sd_xl_base_1.0.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:176  - loading vae from 'models/sdxl_vae.safetensors'
[INFO ] model.cpp:705  - load models/sdxl_vae.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:188  - Stable Diffusion XL 
[INFO ] stable-diffusion.cpp:194  - Stable Diffusion weight type: f16
[INFO ] stable-diffusion.cpp:400  - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:419  - loading model from 'models/sd_xl_base_1.0.safetensors' completed, taking 3.72s
[INFO ] stable-diffusion.cpp:436  - running in eps-prediction mode
[INFO ] model.cpp:705  - load models/sd_xl_offset_example-lora_1.0.safetensors using safetensors format
[INFO ] lora.hpp:38   - loading LoRA from 'models/sd_xl_offset_example-lora_1.0.safetensors'
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.alpha
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.lora_down.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.lora_up.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.alpha
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.lora_down.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.lora_up.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.alpha
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.lora_down.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.lora_up.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_2_0_emb_layers_1.alpha
...
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_output_blocks_8_0_skip_connection.lora_up.weight
[INFO ] stable-diffusion.cpp:524  - lora 'sd_xl_offset_example-lora_1.0' applied, taking 1.01s
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 1.01s
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 93 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 42
  |==================================================| 35/35 - 2.90it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 12.61s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 12.61s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 0.99s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 0.99s
[INFO ] stable-diffusion.cpp:1810 - txt2img completed in 13.70s
save result image to 'output.png'
double free or corruption (fasttop)
Aborted (core dumped)

Apr 11 '24 01:04 tnunamak
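
When a log contains many `unused lora tensor` warnings like the ones above, it can help to group them by module to see which parts of the LoRA were skipped. This is a rough sketch that assumes the log line format shown in this thread; `unused_modules` is a hypothetical helper, and the suffix list covers only the three suffixes that appear in these logs.

```python
import re
from collections import Counter
from typing import Iterable

UNUSED = re.compile(r"unused lora tensor (\S+)")
# Suffixes seen in the warnings quoted in this thread.
SUFFIXES = (".alpha", ".lora_down.weight", ".lora_up.weight")


def unused_modules(log_lines: Iterable[str]) -> Counter:
    """Count unused LoRA tensors per module name (suffix stripped)."""
    counts: Counter = Counter()
    for line in log_lines:
        match = UNUSED.search(line)
        if match is None:
            continue  # not an "unused lora tensor" warning
        name = match.group(1)
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        counts[name] += 1
    return counts


sample = [
    "[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.alpha",
    "[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.lora_down.weight",
    "[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.lora_up.weight",
    "[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.alpha",
    "[INFO ] stable-diffusion.cpp:524  - lora 'sd_xl_offset_example-lora_1.0' applied, taking 1.01s",
]
summary = unused_modules(sample)
for module, count in summary.most_common():
    print(module, count)
```

In a real run the lines would come from the saved output of `sd`; the `sample` list here just reuses a few lines from the log above.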