
[WIP] Add Deepcache

Open rmatif opened this issue 6 months ago • 7 comments

This PR is still a work in progress and far from complete. It adds DeepCache, a method for U-Net architectures that caches the output of the deeper blocks on some steps and reuses it on later steps, skipping those blocks to save compute time.

I have been inspired by this ComfyUI implementation.

It adds --deepcache interval,depth,start,stop arguments.
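For reference, the caching schedule these parameters imply could be sketched as follows. This is a minimal sketch, not the PR's actual code; the function name is hypothetical, and whether `stop` is inclusive is an assumption here:

```cpp
#include <cassert>

// Hypothetical helper: decide whether step `i` is a "cache step" that must
// recompute the deep U-Net blocks, or a step that may reuse cached features.
// Caching is assumed active only for steps in [start, stop); outside that
// window every step runs the full network.
bool is_cache_step(int i, int interval, int start, int stop) {
    if (i < start || i >= stop) {
        return true;  // outside the DeepCache window: always full compute
    }
    // Inside the window, recompute every `interval` steps and reuse the
    // cached deep features on the steps in between.
    return (i - start) % interval == 0;
}
```

With `--deepcache 2,3,0,8` this would recompute on steps 0, 2, 4, 6 and reuse the cache on steps 1, 3, 5, 7.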

Currently, it's not working well and I can't figure out why or how to achieve better results. I have been debugging the cache step and counter logic for a week, but the issue seems to be more subtle than that.

Command example:

./build/bin/sd -m ../models/realisticVisionV60B1_v51HyperVAE.safetensors -v -p "cute cat" --cfg-scale 2.5 --steps 8 --deepcache 2,3,0,8
[Image comparison: without DeepCache | --deepcache 2,3,0,8 | --deepcache 3,3,0,8]

If someone could help by taking a look or continuing the work, I would be grateful. Otherwise, I don't think I'll spend more time on it.

rmatif avatar Jun 18 '25 12:06 rmatif

I'm very interested in this PR; I wish I had time to test DeepCache in ComfyUI and compare the results with your PR.

FSSRepo avatar Jul 04 '25 18:07 FSSRepo

@FSSRepo Thanks for your interest! Here's a comparison with ComfyUI, using the same model and parameters as above.

[Image comparison: without DeepCache | interval = 2, depth = 3, start = 0, stop = 8 | interval = 3, depth = 3, start = 0, stop = 8]

The results are so much better in ComfyUI compared to what I’m getting. My implementation doesn’t seem to work without CFG, which is really odd since DeepCache is supposed to be CFG-agnostic. I’m definitely doing something wrong. I'd love to continue working on this, but I’ve run out of ideas. It would be great if you could take a look and share any feedback!

rmatif avatar Jul 05 '25 12:07 rmatif

My implementation doesn’t seem to work without CFG, which is really odd since DeepCache is supposed to be CFG-agnostic.

My guess is that it's sharing the same cache between uncond and conditioned pass and it's probably not supposed to.
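The suggestion above could be sketched as keeping one cache slot per pass so the two forward passes never overwrite each other's features. All names here are hypothetical, and this is only an illustration of the idea, not the PR's implementation:

```cpp
#include <map>
#include <vector>
#include <cassert>

// Which forward pass is currently running.
enum class Pass { Cond, Uncond };

// Hypothetical sketch: instead of one shared feature cache, key the cache
// by pass, so the conditional and unconditional passes each reuse only
// their own cached deep features.
struct DeepCacheState {
    std::map<Pass, std::vector<float>> feature_cache;  // one slot per pass

    void store(Pass p, const std::vector<float>& features) {
        feature_cache[p] = features;
    }
    // Returns nullptr when this pass has no cached features yet.
    const std::vector<float>* lookup(Pass p) const {
        auto it = feature_cache.find(p);
        return it == feature_cache.end() ? nullptr : &it->second;
    }
};
```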

stduhpf avatar Jul 05 '25 12:07 stduhpf

My implementation doesn’t seem to work without CFG, which is really odd since DeepCache is supposed to be CFG-agnostic.

My guess is that it's sharing the same cache between uncond and conditioned pass and it's probably not supposed to.

I tried to create a separate cache for the conditional and unconditional passes, but it broke things even more. In any case, I think we should fix things with CFG first before addressing the CFG-free issue; I don't think the two are related.

rmatif avatar Jul 06 '25 08:07 rmatif

What is CFG?

According to my understanding, it's when we pass the --cfg-scale parameter. Why do they refer to it as something that's missing in this project?

Or is it a deepcache configuration?

FSSRepo avatar Jul 12 '25 15:07 FSSRepo

What is CFG?

CFG means Classifier-Free Guidance. It's basically a way to control how much effect the prompt has on conditional generation, by linearly extrapolating away from the prediction without text conditioning (or with a negative prompt) toward the conditioned prediction. So it needs 2 forward passes at each step: one with the positive prompt and one with an empty/negative prompt.

stduhpf avatar Jul 12 '25 16:07 stduhpf

What is CFG?

According to my understanding, it's when we pass the --cfg-scale parameter. Why do they refer to it as something that's missing in this project?

Or is it a deepcache configuration?

It's just the --cfg-scale. When you try to run inference with a CFG of 1, the results are significantly worse, almost garbage. It seems that the more steps it takes, the further off the output gets, as if some error is accumulating at each step.

I did try separating the cache between the conditional and unconditional passes, but that didn't help; in fact, it broke the case where we run with CFG > 1. From my understanding, DeepCache operates at a higher level and shouldn't be affected by the conditional/unconditional distinction. Something is seriously wrong here, but I can't quite put my finger on it.

EDIT: I may have wrongly assumed that you're familiar with the concept of CFG, but @stduhpf already explained it well. Basically, during inference, you're doing:

final_prediction = prediction_unconditional + w * (prediction_conditional - prediction_unconditional)

When w = 1, you're effectively running only the conditional pass. That’s useful because it means you can double your inference speed, and distilled models support this approach. However, you do trade off some prompt fidelity when doing so.

I recently read a paper that concluded CFG might actually be useless; it only appears to work because we end up using twice the compute.

rmatif avatar Jul 12 '25 16:07 rmatif