stable-diffusion.cpp: Unable to run inference using Segmind Tiny SD model

Hi there,

I am trying to run Segmind's Distilled Diffusion model (segmind/tiny-sd) from Hugging Face on my machine with the following specifications:

Processor: 12th Gen Intel(R) Core(TM) i5-1235U 1.30 GHz
Installed RAM: 16.0 GB (15.7 GB usable)
System Type: 64-bit operating system, x64-based processor

I successfully converted the model to a single checkpoint file (.safetensors, .ckpt) as shown below:

Then, I used the convert function in stable-diffusion.cpp to convert the .safetensors file to the GGUF format. While the conversion completed without any errors, the resulting GGUF file is significantly smaller than the original model, as shown here:

When I attempt to run inference with the converted GGUF file, I encounter the following error:

It appears that the issue may lie in the conversion process to GGUF, possibly due to the fact that the model in question is a distilled version (Tiny-SD). I am wondering if anyone has worked with distilled models in this context and found a fix for this issue.

Any insights or suggestions would be greatly appreciated.

Thank you for your time and assistance!

Feb 26 '25 10:02 Rehaman1429

You don't need to convert the .safetensors to gguf, sdcpp supports loading safetensors files directly. You could try loading it and see if you get the same error? (I'm pretty sure you would)

I think there are two problems here:

1. The model file doesn't contain the text encoder, which would explain why it's unable to detect the version. If it were included, I believe it would be detected as an sd1.x model.
2. Even if the text encoder were included, there would be another problem causing a crash when loading, because it would try loading it as a standard sd1.x model, even though the architecture is actually fairly different.

So to "fix" this you would first need to make sure the text_encoder is included in the model file, and then you'd still have to add support for this model (I think this would mostly require changes to model.h, model.cpp, and unet.cpp).

Feb 26 '25 16:02 stduhpf

Thanks for your reply @stduhpf ,

The tiny-sd model (https://huggingface.co/segmind/tiny-sd/tree/main) has a text encoder:

So the error is not caused by a missing text encoder; it must be something else.

Feb 27 '25 06:02 Rehaman1429

Yes, there is a text encoder on the hf repo. I already knew that (and that's how I knew it should be detected as an sd1.x model). I was only saying the text encoder might be missing from the safetensors file. It's also possible it has been included under a different name than the one expected. Anyway, this model has an unsupported architecture, so it won't work regardless.
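
One way to check whether the text encoder actually made it into the single-file checkpoint is to list the tensor names stored in the .safetensors header. The sketch below uses only the Python standard library and relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header). The prefixes in ASSUMED_PREFIXES are an assumption based on the usual SD1.x single-file naming, not something taken from sdcpp's source, and the helper names are made up for illustration.

```python
import json
import struct
import sys

# Assumed key prefixes for an SD1.x-style single-file checkpoint.
# These are illustrative; adjust them if your file uses other names.
ASSUMED_PREFIXES = {
    "text_encoder": "cond_stage_model.",
    "unet": "model.diffusion_model.",
    "vae": "first_stage_model.",
}


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def count_components(header):
    """Count tensors per assumed component prefix."""
    names = [k for k in header if k != "__metadata__"]
    return {
        component: sum(1 for n in names if n.startswith(prefix))
        for component, prefix in ASSUMED_PREFIXES.items()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = count_components(read_safetensors_header(sys.argv[1]))
    for component, n in counts.items():
        status = "present" if n else "MISSING (or stored under another name)"
        print(f"{component}: {n} tensors, {status}")
```

Saved under any file name and run with the path to the converted .safetensors file as its only argument, it prints a tensor count per component. A count of zero for text_encoder would suggest the encoder is either absent or stored under different key names.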

Feb 27 '25 09:02 stduhpf

@stduhpf what does "unsupported architecture" mean? Is there any alternative way to run it?

Mar 27 '25 10:03 Naveen7217

@Naveen7217 Sdcpp expects very specific model architectures, with the right number of layers and tensor sizes. The Tiny SD model has a different architecture that doesn't match the SD1.x, SD2.x or SDXL architectures, so the model cannot be loaded. The alternative is to use something other than sdcpp to run it, or to try to add support for this specific model with a PR.

Mar 27 '25 12:03 stduhpf

Another thing that might be worth trying is something like https://github.com/leejet/stable-diffusion.cpp/pull/490 but for unet models.

Mar 27 '25 13:03 stduhpf

I have just committed my work on this kind of tiny SD 1.x model:

https://github.com/akleine/stable-diffusion.cpp/tree/tiny-unet-in-sd-cpp

A pull request will be added later. Perhaps I'll add some documentation about that and about creating ckpt files. All in all, this opens up some new perspectives for us on low-power hardware like Raspberry Pi and some Android phones.

Some output:

[INFO ] model.cpp:1016 - load models/bk-sdm-tiny.ckpt using checkpoint format
[DEBUG] model.cpp:1593 - init from 'models/bk-sdm-tiny.ckpt'
[INFO ] stable-diffusion.cpp:244 - Version: SD 1.x tiny UNet <---------------- NEW
....
[DEBUG] ggml_extend.hpp:1193 - unet params backend buffer size = 617.20 MB(RAM) (358 tensors)
....
[DEBUG] stable-diffusion.cpp:1879 - generate_image 512x512
[INFO ] stable-diffusion.cpp:2009 - TXT2IMG
......

One step takes approximately 32.37s on a Raspberry Pi 5, and the picture quality is quite okay.

For comparison, SD1.5 needs:

[DEBUG] ggml_extend.hpp:1193 - unet params backend buffer size = 2155.33 MB(RAM) (686 tensors)

Finally, you can read more about this kind of tiny SD model here: https://arxiv.org/pdf/2305.15798.pdf
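
To put the two UNet figures side by side, here is a small calculation using only the numbers from the log lines above (617.20 MB and 358 tensors for the tiny UNet, 2155.33 MB and 686 tensors for SD1.5). The variable names are just for illustration.

```python
# Figures copied from the two log lines quoted in this post.
TINY_MB, TINY_TENSORS = 617.20, 358
SD15_MB, SD15_TENSORS = 2155.33, 686

# Relative size of the tiny UNet compared to the SD1.5 UNet.
size_ratio = TINY_MB / SD15_MB
tensor_ratio = TINY_TENSORS / SD15_TENSORS

print(f"buffer size: {size_ratio:.1%} of SD1.5 ({SD15_MB - TINY_MB:.2f} MB saved)")
print(f"tensor count: {tensor_ratio:.1%} of SD1.5")
```

So the tiny UNet needs a bit under 30% of the SD1.5 buffer size while keeping a little over half of the tensors.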

Jul 28 '25 09:07 akleine