stable-diffusion.cpp

[Feature Request]: SD XL support

Open zhongwei opened this issue 2 years ago • 16 comments

zhongwei avatar Aug 17 '23 03:08 zhongwei

I'm willing to implement SDXL once I've improved the support for SD 1.x and added support for SD 2.x.

leejet avatar Aug 17 '23 13:08 leejet

Besides LCM being available for XL models, Stability AI released SDXL-Turbo, a distillation (fine-tune?) that can generate good images in a single step.

https://huggingface.co/stabilityai/sdxl-turbo

Green-Sky avatar Nov 29 '23 15:11 Green-Sky

Is it compatible with this repo?

shaharhi avatar Dec 13 '23 19:12 shaharhi

@leejet this can be closed

FSSRepo avatar Dec 29 '23 02:12 FSSRepo

@zhongwei Support for SDXL has been added. You can try pulling the latest code from the master branch.

leejet avatar Dec 30 '23 05:12 leejet

> @leejet this can be closed

Generally, I don't proactively close issues unless they've been resolved for an extended period without any response from the person who opened the issue. I prefer the individuals who opened the issue to confirm its resolution and close it themselves.

leejet avatar Dec 30 '23 06:12 leejet

Did anyone try running sd_xl? For some reason it's generating an empty image (it's pitch black). Here is the command I used and its output:

$ ./bin/sd  -m ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors -p "a lovely cat"
[INFO]  stable-diffusion.cpp:5386 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors'
[INFO]  model.cpp:638  - load ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors using safetensors format
[INFO]  stable-diffusion.cpp:5412 - Stable Diffusion XL
[INFO]  stable-diffusion.cpp:5418 - Stable Diffusion weight type: f16
[INFO]  stable-diffusion.cpp:5573 - total memory buffer size = 6570.56MB (clip 1565.66MB, unet 4909.43MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:5579 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors' completed, taking 1.78s
[INFO]  stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
[INFO]  stable-diffusion.cpp:6525 - get_learned_condition completed, taking 1547 ms
[INFO]  stable-diffusion.cpp:6535 - sampling using Euler A method
[INFO]  stable-diffusion.cpp:6539 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 18.15s/it
[INFO]  stable-diffusion.cpp:6551 - sampling completed, taking 353.73s
[INFO]  stable-diffusion.cpp:6559 - generating 1 latent images completed, taking 353.89s
[INFO]  stable-diffusion.cpp:6561 - decoding 1 latents
[INFO]  stable-diffusion.cpp:6571 - latent 1 decoded, taking 17.36s
[INFO]  stable-diffusion.cpp:6575 - decode_first_stage completed, taking 17.36s
[INFO]  stable-diffusion.cpp:6590 - txt2img completed in 372.80s
[INFO]  main.cpp:538  - save result image to 'output.png'

I also tried downloading the UNet, VAE, etc. and passing that directory as an argument (along with a minor code change to load the f16 .safetensors files instead of the plain .safetensors ones, e.g. in std::string unet_path = path_join(file_path, "unet/diffusion_pytorch_model.safetensors");):

$ ./bin/sd  -m ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors --vae ~/downloaded_models/sdxl-turbo/ -p "a lovely cat"
[INFO]  stable-diffusion.cpp:5386 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors'
[INFO]  model.cpp:638  - load ~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors using safetensors format
[INFO]  stable-diffusion.cpp:5395 - loading vae from '~/downloaded_models/sdxl-turbo/'
[INFO]  model.cpp:632  - load ~/downloaded_models/sdxl-turbo/ using diffusers format
[INFO]  stable-diffusion.cpp:5412 - Stable Diffusion XL
[INFO]  stable-diffusion.cpp:5418 - Stable Diffusion weight type: f16
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_1.bias' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_1.weight' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_2.bias' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'unet.add_embedding.linear_2.weight' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'model.diffusion_model.output_blocks.2.1.conv.bias' in model file
[WARN]  stable-diffusion.cpp:5503 - unknown tensor 'model.diffusion_model.output_blocks.2.1.conv.weight' in model file
[INFO]  stable-diffusion.cpp:5573 - total memory buffer size = 6570.56MB (clip 1565.66MB, unet 4909.43MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:5579 - loading model from '~/downloaded_models/sdxl-turbo/sd_xl_turbo_1.0_fp16.safetensors' completed, taking 2.61s
[INFO]  stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO]  stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
[INFO]  stable-diffusion.cpp:6525 - get_learned_condition completed, taking 1592 ms
[INFO]  stable-diffusion.cpp:6535 - sampling using Euler A method
[INFO]  stable-diffusion.cpp:6539 - generating image: 1/1 - seed 42
  |==================================================| 20/20 - 18.09s/it
[INFO]  stable-diffusion.cpp:6551 - sampling completed, taking 353.85s
[INFO]  stable-diffusion.cpp:6559 - generating 1 latent images completed, taking 353.85s
[INFO]  stable-diffusion.cpp:6561 - decoding 1 latents
[INFO]  stable-diffusion.cpp:6571 - latent 1 decoded, taking 17.08s
[INFO]  stable-diffusion.cpp:6575 - decode_first_stage completed, taking 17.08s
[INFO]  stable-diffusion.cpp:6590 - txt2img completed in 372.51s
[INFO]  main.cpp:538  - save result image to 'output.png'

But it's the same result. I have tried the older Stable Diffusion model, stable-diffusion-2-1/v2-1_768-nonema-pruned.safetensors, and it works. I'm running on Ubuntu 22.03.

ranjithum avatar Dec 31 '23 03:12 ranjithum

@ranjithum The VAE in SDXL encounters NaN issues under FP16, but unfortunately, the ggml_conv_2d only operates under FP16. Hence, a parameter is needed to specify the VAE that has fixed the FP16 NaN issue. You can find it here: SDXL VAE FP16 Fix.

./bin/sd -m ../models/sd_xl_base_1.0.safetensors --vae ../models/sdxl_vae-fp16-fix.safetensors -H 1024 -W 1024 -p "a lovely cat" -v
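
To illustrate the general failure mode described above (not the ggml code itself), here is a minimal Python sketch of how half precision can turn large intermediate values into NaN. The helper names `to_fp16`, `fp16_mul`, and `fp16_sub` are hypothetical and the input values are arbitrary; the only facts relied on are that FP16 cannot represent values above 65504 and that inf - inf is NaN.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to half precision; values too large to represent become +/-inf."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        # "e" is the struct format code for IEEE 754 binary16 (half precision).
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; emulate overflow to infinity.
        return math.copysign(math.inf, x)


def fp16_mul(a: float, b: float) -> float:
    """Multiply two values, rounding inputs and result to half precision."""
    return to_fp16(to_fp16(a) * to_fp16(b))


def fp16_sub(a: float, b: float) -> float:
    """Subtract two values, rounding inputs and result to half precision."""
    return to_fp16(to_fp16(a) - to_fp16(b))


big = fp16_mul(300.0, 300.0)  # 90000 exceeds the FP16 maximum of 65504
diff = fp16_sub(big, big)     # inf - inf has no defined value, so it is NaN
print(big, diff)
```

Once a single NaN appears in an intermediate tensor, it propagates through every later arithmetic operation, which is why a VAE variant whose activations stay within the FP16 range avoids the problem.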

leejet avatar Dec 31 '23 03:12 leejet

@leejet Perfect, thanks. It worked.

ranjithum avatar Dec 31 '23 09:12 ranjithum

@leejet we should probably print a warning in the program when an f32 VAE is used (until it's fixed).
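
As a rough illustration of how such a check could be prototyped outside the program, the sketch below reads the header of a .safetensors file and counts tensors per dtype. It relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names and the `prefix` filter are hypothetical, and which tensor-name prefix identifies the VAE in a given checkpoint is an assumption you would need to verify.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file, without the metadata entry."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def dtype_counts(path: str, prefix: str = "") -> Counter:
    """Count tensors per dtype, optionally restricted to names starting with prefix."""
    header = read_safetensors_header(path)
    return Counter(
        info["dtype"] for name, info in header.items() if name.startswith(prefix)
    )
```

For example, `dtype_counts("models/sdxl_vae.safetensors")` would return something like a `Counter` keyed by `"F16"` or `"F32"`, which a wrapper script could use to decide whether to print a warning.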

Green-Sky avatar Dec 31 '23 10:12 Green-Sky

Works for me, but colors are weirdly off with SD XL plus fp16 fix: [image: output] [image: output]

niansa avatar Jan 14 '24 11:01 niansa

> Works for me, but colors are weirdly off with SD XL plus fp16 fix: [image: output] [image: output]

Try changing the image size to 1024x1024. SDXL is not suitable for generating images of size 512x512.

leejet avatar Jan 15 '24 15:01 leejet

> Try changing the image size to 1024x1024. SDXL is not suitable for generating images of size 512x512.

Nope, still just as broken for me.

niansa avatar Jan 29 '24 19:01 niansa

stable-diffusion.cpp/build/bin/sd -m stable-diffusion.cpp/models/sd_xl_turbo_1.0_fp16.safetensors --vae stable-diffusion.cpp/models/sdxl_vae.safetensors --steps 1 --cfg-scale 1 -s -1 -p "a lovely cat"

Works perfectly for me.

ServeurpersoCom avatar Feb 10 '24 07:02 ServeurpersoCom

LoRAs don't work for me for some reason. Maybe I'm doing something incorrectly.

I'm using the following command:

for m in models/SDXL/*.safetensors; do
  ./stable-diffusion.cpp/dist/bin/sd -m "${m}" \
    -p "a cute cat <lora:SCRATCHBOARD ILLUSTRATION:0.8>" \
    -W 1024 -H 1024 --steps 30 --sampling-method dpm++2m --schedule karras \
    --embd-dir models/embeddings/ --vae models/SDXL/vae/sdxl_vae.safetensors \
    -s $RANDOM -b 2 --lora-model-dir models/SDXL/lora/ -v \
    -o "images/$(basename -- "$m" ".${m##*.}" | tr " " "-").png"
done

and this LoRA: https://civitai.com/models/279729/wizards-scratchboard-illustration

The relevant (abbreviated) portion of the output:

[INFO ] model.cpp:645  - load models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors using safetensors format
[DEBUG] model.cpp:711  - init from 'models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors'
[DEBUG] ggml_extend.hpp:555  - lora params backend buffer size =  874.24 MB (10240 tensors)
[INFO ] lora.hpp:35   - loading LoRA from 'models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors'
[DEBUG] model.cpp:1262 - loading tensors from models/SDXL/lora/SCRATCHBOARD ILLUSTRATION.safetensors
[DEBUG] lora.hpp:58   - finished loaded lora
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.alpha
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.lora_up.weight
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.alpha
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.lora_down.weight
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc2.lora_up.weight
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_self_attn_k_proj.alpha
...

UPD: it now works as of at least commit 48bcce493f45a11d9d5a4c69943d03ff919d748f.
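
When triaging logs like the one above, it can help to see which part of the model the unused tensors belong to. The following is a small sketch, assuming the warning text looks exactly like the `unused lora tensor lora.<name>` lines shown in this thread; the function name `summarize_unused` and the idea of grouping by the first underscore-separated token (for example `te1` or `unet`) are illustrative choices, not part of stable-diffusion.cpp.

```python
import re
from collections import Counter

# Matches the tensor name after "unused lora tensor lora." in a log line.
UNUSED_RE = re.compile(r"unused lora tensor lora\.(\S+)")


def summarize_unused(log_text: str) -> Counter:
    """Count unused LoRA tensors per component prefix (text before the first '_')."""
    counts = Counter()
    for match in UNUSED_RE.finditer(log_text):
        component = match.group(1).split("_", 1)[0]
        counts[component] += 1
    return counts


sample = """\
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.alpha
[WARN ] lora.hpp:154  - unused lora tensor lora.te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.alpha
"""
print(summarize_unused(sample))
```

If every warning falls under a single prefix, that points to a naming mismatch for that component rather than a problem with the LoRA file as a whole.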

scientism avatar Feb 18 '24 08:02 scientism

The official example LoRA is failing for me too (from https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main).

(base) tnunamak@pc:~/stable-diffusion.cpp$ ./build/bin/sd -m models/sd_xl_base_1.0.safetensors --vae models/sdxl_vae.safetensors -H 1024 -W 768 --cfg-scale 1 --steps 35 -p "A lovely cat <lora:sd_xl_offset_example-lora_1.0:0.8>" --lora-model-dir models
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
[INFO ] stable-diffusion.cpp:165  - loading model from 'models/sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:705  - load models/sd_xl_base_1.0.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:176  - loading vae from 'models/sdxl_vae.safetensors'
[INFO ] model.cpp:705  - load models/sdxl_vae.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:188  - Stable Diffusion XL 
[INFO ] stable-diffusion.cpp:194  - Stable Diffusion weight type: f16
[INFO ] stable-diffusion.cpp:400  - total params memory size = 6558.89MB (VRAM 6558.89MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:419  - loading model from 'models/sd_xl_base_1.0.safetensors' completed, taking 3.72s
[INFO ] stable-diffusion.cpp:436  - running in eps-prediction mode
[INFO ] model.cpp:705  - load models/sd_xl_offset_example-lora_1.0.safetensors using safetensors format
[INFO ] lora.hpp:38   - loading LoRA from 'models/sd_xl_offset_example-lora_1.0.safetensors'
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.alpha
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.lora_down.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_emb_layers_1.lora_up.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.alpha
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.lora_down.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_in_layers_2.lora_up.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.alpha
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.lora_down.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_1_0_out_layers_3.lora_up.weight
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_input_blocks_2_0_emb_layers_1.alpha
...
[WARN ] lora.hpp:160  - unused lora tensor lora.unet_output_blocks_8_0_skip_connection.lora_up.weight
[INFO ] stable-diffusion.cpp:524  - lora 'sd_xl_offset_example-lora_1.0' applied, taking 1.01s
[INFO ] stable-diffusion.cpp:1602 - apply_loras completed, taking 1.01s
[INFO ] stable-diffusion.cpp:1712 - get_learned_condition completed, taking 93 ms
[INFO ] stable-diffusion.cpp:1728 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1732 - generating image: 1/1 - seed 42
  |==================================================| 35/35 - 2.90it/s
[INFO ] stable-diffusion.cpp:1769 - sampling completed, taking 12.61s
[INFO ] stable-diffusion.cpp:1777 - generating 1 latent images completed, taking 12.61s
[INFO ] stable-diffusion.cpp:1779 - decoding 1 latents
[INFO ] stable-diffusion.cpp:1789 - latent 1 decoded, taking 0.99s
[INFO ] stable-diffusion.cpp:1793 - decode_first_stage completed, taking 0.99s
[INFO ] stable-diffusion.cpp:1810 - txt2img completed in 13.70s
save result image to 'output.png'
double free or corruption (fasttop)
Aborted (core dumped)

tnunamak avatar Apr 11 '24 01:04 tnunamak