I think something is broken for us CPU users [AVX2]
Hi,
No matter what, it always crashes and closes the app itself after loading the model or starting an operation, whether it's img2img, txt2img, or video. It worked well in builds from a couple of weeks ago, but now it's broken and no error is shown. With -v debug it says: [DEBUG] stable-diffusion.cpp:174 - Using CPU backend, and then a window pops up saying "sd.exe has stopped working". Thanks.
@LostRuins could you please take a look into it? I have no problem generating images with your koboldcpp, as it uses my AVX2 very well, but sd.cpp doesn't work for me anymore.
Could you please give more details, so I can try to reproduce it here? Which version, model type, and parameters? Are you using a binary release, and on which platform?
Hi, thank you for your attention. I use Windows 11, and for the test I tried a WAN 2.1 GGUF with the official parameters for text and image generation. I also use the Photon LCM model, which worked for image generation in older versions of sd.cpp. Even model conversion doesn't work; I mean, no operation works at all like before. I've known how to use sd.cpp for a long time, so I can tell something is wrong here, unfortunately. Yes, I use a binary release from this repo, the latest version.
I don't have Windows here, but the sd-master-abb115c-bin-win-avx2-x64.zip binary seems to work fine on Wine (I don't have the merged model, so I'm using the LCM LoRA):
wine ./sd.exe -p 'forestlora:lcm-lora-sdv1-5:1.0' --sampling-method lcm --model ./photon_v1.safetensors --lora-model-dir . --steps 6 --cfg-scale 1 -H 320
Perhaps the specific sd.exe you downloaded got corrupted?
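One quick way to rule out a corrupted download is to compare SHA-256 hashes of two independently downloaded copies of the archive (`sha256sum` on Linux/Wine hosts, `certutil -hashfile <file> SHA256` on Windows). A toy sketch with hypothetical filenames standing in for the real release archive:

```shell
# Toy demo: a corrupted download would show a hash mismatch between two copies.
printf 'fake archive contents' > sd-release.zip
cp sd-release.zip sd-release-redownload.zip
h1=$(sha256sum sd-release.zip | cut -d' ' -f1)
h2=$(sha256sum sd-release-redownload.zip | cut -d' ' -f1)
if [ "$h1" = "$h2" ]; then echo "checksums match"; else echo "corrupted download"; fi
```

If the hashes differ, re-downloading (or trying another network) would be the next step.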
Thank you for your answer. I downloaded your version; still no luck...
Could you run that command with -v, and paste the result here (including the command itself)?
Also, what is the last version that does work for you?
This is the last version that works: https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-b1fc16b
After that, nothing else works.
The command I use, which works with the version linked above:
PS C:\Users\Amin\Desktop\AI\Koboldcpp> ./sd.exe -v -m plcm.gguf -H 320 -W 320 -o ./pic.png --steps 4 -p "a hamster"
Option:
    n_threads: 4
    mode: txt2img
    model_path: plcm.gguf
    wtype: unspecified
    clip_l_path:
    clip_g_path:
    t5xxl_path:
    diffusion_model_path:
    vae_path:
    taesd_path:
    esrgan_path:
    controlnet_path:
    embeddings_path:
    stacked_id_embeddings_path:
    input_id_images_path:
    style ratio: 20.00
    normalize input image: false
    output_path: ./pic.png
    init_img:
    mask_img:
    control_image:
    ref_images_paths:
    clip on cpu: false
    controlnet cpu: false
    vae decoder on cpu: false
    diffusion flash attention: false
    strength(control): 0.90
    prompt: a hamster
    negative_prompt:
    min_cfg: 1.00
    cfg_scale: 7.00
    slg_scale: 0.00
    guidance: 3.50
    eta: 0.00
    clip_skip: -1
    width: 320
    height: 320
    sample_method: euler_a
    schedule: default
    sample_steps: 4
    strength(img2img): 0.75
    rng: cuda
    seed: 42
    batch_count: 1
    vae_tiling: false
    upscale_repeats: 1
    chroma_use_dit_mask: true
    chroma_use_t5_mask: false
    chroma_t5_mask_pad: 1
System Info:
    SSE3 = 1 AVX = 1 AVX2 = 1 AVX512 = 0 AVX512_VBMI = 0 AVX512_VNNI = 0 FMA = 1 NEON = 0 ARM_FMA = 0 F16C = 1 FP16_VA = 0 WASM_SIMD = 0 VSX = 0
[DEBUG] stable-diffusion.cpp:190 - Using CPU backend
[INFO ] stable-diffusion.cpp:199 - loading model from 'plcm.gguf'
[INFO ] model.cpp:905 - load plcm.gguf using gguf format
[DEBUG] model.cpp:922 - init from 'plcm.gguf'
[INFO ] stable-diffusion.cpp:246 - Version: SD 1.x
[INFO ] stable-diffusion.cpp:279 - Weight type: q4_0
[INFO ] stable-diffusion.cpp:280 - Conditioner weight type: q4_0
[INFO ] stable-diffusion.cpp:281 - Diffusion model weight type: q4_0
[INFO ] stable-diffusion.cpp:282 - VAE weight type: q4_0
[DEBUG] stable-diffusion.cpp:284 - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1190 - clip params backend buffer size = 191.00 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1190 - unet params backend buffer size = 1272.85 MB(RAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1190 - vae params backend buffer size = 94.47 MB(RAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:432 - loading weights
[DEBUG] model.cpp:1727 - loading tensors from plcm.gguf
[INFO ] model.cpp:1897 - unknown tensor 'cond_stage_model.logit_scale | f16 | 1 [1, 1, 1, 1, 1]' in model file
[INFO ] model.cpp:1897 - unknown tensor 'cond_stage_model.text_projection | q4_0 | 2 [768, 768, 1, 1, 1]' in model file
|==================================================| 1132/1132 - 31.25it/s
[INFO ] stable-diffusion.cpp:531 - total params memory size = 1558.32MB (VRAM 0.00MB, RAM 1558.32MB): clip 191.00MB(RAM), unet 1272.85MB(RAM), vae 94.47MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:535 - loading model from 'plcm.gguf' completed, taking 34.27s
[INFO ] stable-diffusion.cpp:569 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:613 - finished loaded file
[DEBUG] stable-diffusion.cpp:1574 - txt2img 320x320
[DEBUG] stable-diffusion.cpp:1266 - prompt after extract and remove lora: "a hamster"
[INFO ] stable-diffusion.cpp:703 - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1271 - apply_loras completed, taking 0.01s
[DEBUG] conditioner.hpp:358 - parse 'a hamster' to [['a hamster', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1141 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] conditioner.hpp:486 - computing condition graph completed, taking 530 ms
[DEBUG] conditioner.hpp:358 - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] ggml_extend.hpp:1141 - clip compute buffer size: 1.40 MB(RAM)
[DEBUG] conditioner.hpp:486 - computing condition graph completed, taking 296 ms
[INFO ] stable-diffusion.cpp:1404 - get_learned_condition completed, taking 928 ms
[INFO ] stable-diffusion.cpp:1427 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1464 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:822 - Sample
[DEBUG] ggml_extend.hpp:1141 - unet compute buffer size: 101.29 MB(RAM)
|==================================================| 4/4 - 18.24s/it
[INFO ] stable-diffusion.cpp:1504 - sampling completed, taking 73.46s
[INFO ] stable-diffusion.cpp:1512 - generating 1 latent images completed, taking 73.61s
[INFO ] stable-diffusion.cpp:1515 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1141 - vae compute buffer size: 650.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:1108 - computing vae [mode: DECODE] graph completed, taking 36.33s
[INFO ] stable-diffusion.cpp:1525 - latent 1 decoded, taking 36.33s
[INFO ] stable-diffusion.cpp:1529 - decode_first_stage completed, taking 36.33s
[INFO ] stable-diffusion.cpp:1655 - txt2img completed in 111.00s
save result PNG image to './pic.png'
=========
The same command with the new builds:
PS C:\Users\Amin\Desktop\AI\Koboldcpp> ./sd.exe -v -m plcm.gguf -H 320 -W 320 -o ./pic.png --steps 4 -p "a hamster"
Option:
    n_threads: 4
    mode: img_gen
    model_path: plcm.gguf
    wtype: unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:
    diffusion_model_path:
    high_noise_diffusion_model_path:
    vae_path:
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    stacked_id_embed_dir:
    input_id_images_path:
    style ratio: 20.00
    normalize input image: false
    output_path: ./pic.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    increase_ref_index: false
    offload_params_to_cpu: false
    clip_on_cpu: false
    control_net_cpu: false
    vae_on_cpu: false
    diffusion flash attention: false
    diffusion Conv2d direct: false
    vae_conv_direct: false
    control_strength: 0.90
    prompt: a hamster
    negative_prompt:
    clip_skip: -1
    width: 320
    height: 320
    sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler_a, sample_steps: 4, eta: 0.00)
    high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler_a, sample_steps: -1, eta: 0.00)
    moe_boundary: 0.875
    flow_shift: inf
    strength(img2img): 0.75
    rng: cuda
    seed: 42
    batch_count: 1
    vae_tiling: false
    upscale_repeats: 1
    chroma_use_dit_mask: true
    chroma_use_t5_mask: false
    chroma_t5_mask_pad: 1
    video_frames: 1
    fps: 16
System Info:
    SSE3 = 1 AVX = 1 AVX2 = 1 AVX512 = 0 AVX512_VBMI = 0 AVX512_VNNI = 0 FMA = 1 NEON = 0 ARM_FMA = 0 F16C = 1 FP16_VA = 0 WASM_SIMD = 0 VSX = 0
[DEBUG] stable-diffusion.cpp:174 - Using CPU backend
PS C:\Users\Amin\Desktop\AI\Koboldcpp>
So this release already crashes in that way: https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-0d8b39f ? Please check it carefully, because I can't see how those code changes could cause such behavior (unless master-0d8b39f works, but master-ecf5db9 doesn't?).
I could also try that exact model, since it looks like the crash happens right before it is loaded. I see it's a q4_0 quant; did you convert it yourself? Could you point me to where I could download it, or the corresponding .safetensors file?
No, please check the link instead of the number.
Newer builds are not working
It doesn't matter which model; I tried WAN and other stuff too. It's not the model I use: it just crashes with any model, even Chroma. The GGUF I have works great with Local Diffusion on Android and koboldcpp too.
I edited my -v post above so you can see what I mean.
The build that added OpenCL is where the crashes all started. Maybe that is the culprit.
> No please check the link instead of the number
It is master-b1fc16b, isn't it? That's why I asked about the release master-0d8b39f: that's the first release after master-b1fc16b (the order on the release page is wrong; I'm following the code history).
> The build that has added opencl is all started the crashes. Maybe that is the culprit
master-d42fd59, which added OpenCL support, is the one after master-0d8b39f. It is more likely to cause a crash, because the ggml library was updated too; but please, just confirm: is master-0d8b39f crashing, or not?
Following the code order, we have:
- b1fc16b504297a03aeb2f0678078dfcad322243e - resetting clip_skip - definitely works
- 539b5b9374b1289ae040f2b8dc83b26dc7372140 - musa docker build (no binary release)
- 0d8b39f0ba202e407cc6908dac371ff005bc8ab0 - avoid crash on sdxl loras - ???
- d42fd59464fef92934a082bab24855fd6f6177cb - OpenCL - crashes
- 23de7fc44a9a93beff02ff8382307baacbffff1e - build warnings on Linux - crashes
- ea46fd6948799c517e0346af2eae11ef0ddb8db4 - zero-initialize output of tiling (no binary release)
- ecf5db97aeea4630c547f45e2262b2dc1f866a22 - fix windows build - crashes
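The list above is essentially a manual bisect over binary releases. For anyone able to build from source, `git bisect run` automates the same search. A self-contained toy sketch (it builds a synthetic repo, with a marker file standing in for the crash, rather than the real stable-diffusion.cpp history):

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo && cd repo
git config user.email bisect@example.com
git config user.name bisect-demo
# Six commits; the "bug" (a marker file standing in for the crash)
# is introduced at commit 4.
for i in 1 2 3 4 5 6; do
    if [ "$i" -ge 4 ]; then touch bug; fi
    git add -A
    git commit -q --allow-empty -m "commit $i"
done
# Mark HEAD (commit 6) bad and commit 1 good, then let git drive the search.
git bisect start HEAD HEAD~5 >/dev/null
# The test command exits 0 (good) when the marker file is absent.
git bisect run sh -c '! test -f bug' >/dev/null
# refs/bisect/bad now points at the first bad commit.
git show -s --format='first bad: %s' refs/bisect/bad
```

With the real repo, the test command would be a build-and-generate script, and the good/bad endpoints would be b1fc16b and ecf5db9.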
> It doesn't matter which model. I tried wan and the other stuff too. It's not the model I use. It just crash with any model even chroma. The gguf I have works great with local diffusion on Android and koboldpp too
Just a wild guess: could you check whether the AVX binaries crash in the same way?
Yes, master-0d8b39f works great with no problems; after that, everything else just crashes. So I think OpenCL is doing something. Can we disable it, or maybe have an option to force CPU, to test with the latest version, please?
Yes, I tested every build, all the AVX and no-AVX variants too; they all crash.
I don't think OpenCL support itself could cause this, because it's a build-time option, disabled by default (and it'd appear as "Using OpenCL backend" if it were enabled by mistake: d42fd59 ).
But the backend initialization is done by ggml, and that same revision updated the ggml submodule, which had a ton of pending changes: https://github.com/ggerganov/ggml/compare/ff9052988b76e137bcf92bb335733933ca196ac0...9e4bee1c5afc2d677a5b32ecb90cbdb483e81fff
So one possibility would be a change in ggml. Except... you mentioned koboldcpp works. Do recent koboldcpp releases work for you? Because it also uses the same ggml library, and actually updates it far more frequently.
By the way, any chance you are running sd.exe with a mismatched stable-diffusion.dll (from another release, for instance)? Even that shouldn't cause problems between these two releases, but...
Yes, I'm using the latest koboldcpp version with no problem at all, though kobold forces you to use the CPU if you want, so I think that helps; it's like it bypasses everything to go straight for the CPU. @LostRuins please see if you can help.
No, I always use a fresh install; I deleted the DLLs and EXEs and extracted the new ones.
Thank you for your time, it means a lot. I wanted to give up and close this topic, but maybe it is better to solve it for future users so it won't be a problem.
FYI, I have merged wbruna's PR with the latest sd.cpp changes into the concedo_experimental branch. It's not released yet, but do let me know if anything breaks.