Please add TeaCache support for Wan models
Will you be adding TeaCache support for Wan? It helps speed up generation, and it takes forever to generate a video on my device. I really need TeaCache.
There is a WIP EasyCache PR: https://github.com/leejet/stable-diffusion.cpp/pull/940
How do I use it? I tried it with sd.cpp, but it said ctx not supported. How can I use it with Wan?
I applied the patch (now with merging conflicts) and I can use it with Wan2.2-TI2V-5B.
[DEBUG] stable-diffusion.cpp:3649 - sample 36x22x9
[INFO ] stable-diffusion.cpp:1597 - EasyCache enabled - threshold: 0.025, start_percent: 0.15, end_percent: 0.95
[INFO ] ggml_extend.hpp:1696 - Wan2.2-TI2V-5B offload params (5153.43 MB, 825 tensors) to runtime backend (CUDA0), taking 0.45s
[DEBUG] ggml_extend.hpp:1598 - Wan2.2-TI2V-5B compute buffer size: 193.34 MB(VRAM)
|==================================================| 48/48 - 2.08s/it
[INFO ] stable-diffusion.cpp:1884 - EasyCache skipped 13/48 steps (1.37x estimated speedup)
[INFO ] stable-diffusion.cpp:3677 - sampling completed, taking 100.07s
vs.
[DEBUG] stable-diffusion.cpp:3649 - sample 36x22x9
[INFO ] ggml_extend.hpp:1696 - Wan2.2-TI2V-5B offload params (5153.43 MB, 825 tensors) to runtime backend (CUDA0), taking 0.45s
[DEBUG] ggml_extend.hpp:1598 - Wan2.2-TI2V-5B compute buffer size: 193.34 MB(VRAM)
|==================================================| 48/48 - 2.40s/it
[INFO ] stable-diffusion.cpp:3677 - sampling completed, taking 115.43s
(It says 13/48, but with CFG each step runs two model evaluations, so it is more like 13/96 -> ~1.15x)
no cache
https://github.com/user-attachments/assets/0047e5c3-46ed-4b11-af42-46bc05f88523
easycache 0.025,0.15,0.95
https://github.com/user-attachments/assets/0fca90ef-3ed7-4b24-a39b-8c0cb8ffed6e
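The gap between the reported and the effective speedup can be checked with a quick sketch (assuming, as the comparison above suggests, that with CFG each sampling step costs two model evaluations and the skipped count applies to evaluations, not steps):

```python
# Estimated vs. effective EasyCache speedup for the 48-step run above.
# Assumption: with classifier-free guidance each step runs two model
# evaluations (conditional + unconditional), so 48 steps = 96 forward passes.
steps = 48
evals = 2 * steps          # CFG doubles the forward passes
skipped = 13               # as reported in the log

reported = steps / (steps - skipped)   # what the log estimates: ~1.37x
effective = evals / (evals - skipped)  # over all forward passes: ~1.16x
wall_clock = 115.43 / 100.07           # measured from the timings: ~1.15x

print(f"reported {reported:.2f}x, effective {effective:.2f}x, "
      f"measured {wall_clock:.2f}x")
```

The measured wall-clock ratio (115.43s / 100.07s) lines up with the per-evaluation estimate, not the per-step one.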
Test it with Wan 2.1 1.3B please, I need to know if it works. I can't use Wan 2.2, not enough memory. I just found a small Wan 2.1 1.3B Q6 model, but I tried everything and still can't get good results. I need a lot of steps, but I don't want to wait hours just to get bad results. What is your config for Wan 1.3B? I can only go from 256x256 up to 448x320 or 384x384.
Hey, did you add it to sd.cpp, or do we have to wait for the merge?
Yes, you'll have to wait; the PR I linked is not ready yet.
Wan 2.1 1.3B q8_0, 20 steps, 560x320, 33 frames
https://github.com/user-attachments/assets/6e70de96-bb23-46ee-a2f2-2ee2be5d9c84
[DEBUG] stable-diffusion.cpp:3649 - sample 70x40x9
[INFO ] stable-diffusion.cpp:1597 - EasyCache enabled - threshold: 0.025, start_percent: 0.15, end_percent: 0.95
[INFO ] ggml_extend.hpp:1696 - Wan2.1-T2V-1.3B offload params (1473.40 MB, 825 tensors) to runtime backend (CUDA0), taking 0.14s
[DEBUG] ggml_extend.hpp:1598 - Wan2.1-T2V-1.3B compute buffer size: 606.62 MB(VRAM)
|==================================================| 20/20 - 3.86s/it
[INFO ] stable-diffusion.cpp:1884 - EasyCache skipped 2/20 steps (1.11x estimated speedup)
[INFO ] stable-diffusion.cpp:3677 - sampling completed, taking 77.15s
40/(40-2) -> 1.05x speedup (2 of 40 model evaluations skipped)
More steps usually lead to more skips.
One interesting thing is that this 1.3B model is about as fast as the 5B model I tested, probably because the Wan2.2 VAE compresses width and height by another factor of 2. Sure, the diffusion step takes far less memory, but the VAE still consumes heaps.
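The latent-grid arithmetic behind this can be sketched as follows (assumptions: the Wan 2.1 VAE downsamples width/height by 8x and the Wan 2.2 TI2V VAE by 16x; the 576x352 resolution for the 5B run is inferred from its logged `sample 36x22x9` grid, since that command isn't shown):

```python
# Rough latent-grid comparison between the two runs.
# Assumption: Wan2.1 VAE downsamples width/height by 8x,
# the Wan2.2 TI2V VAE by 16x.
def latent_grid(width, height, spatial_factor):
    return (width // spatial_factor, height // spatial_factor)

# 1.3B run: 560x320 with an 8x VAE, matching the logged "sample 70x40x9"
print(latent_grid(560, 320, 8))    # (70, 40)

# 5B run: a 16x VAE at 576x352 matches the logged "sample 36x22x9"
print(latent_grid(576, 352, 16))   # (36, 22)

# So the 5B model processes roughly 3.5x fewer latent positions per frame:
print((70 * 40) / (36 * 22))       # ~3.54
```

That factor of ~3.5 in latent size goes a long way toward cancelling the ~4x difference in parameter count, consistent with the similar per-iteration times in the logs.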
$ result/bin/sd -M vid_gen --diffusion-model models/wan/wan2.1_t2v_1.3B-q8_0.gguf --vae models/wan/Wan2.1_VAE-f16.gguf --t5xxl models/wan/umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat strolling down a wooden plank. everything in focus and sharp. stable camera." --cfg-scale 6 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止, 整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 560 -H 320 --diffusion-fa --offload-to-cpu --video-frames 33 --flow-shift 3.0 --steps 20 -s 42 --easycache 0.025,0.15,0.95
I can't wait! This, combined with the diffsynth-studio aesthetics v1 LoRA, will make amazing videos 🥹