Please add TeaCache support for Wan models
Will you be adding TeaCache support for Wan? It helps speed up generation, and it takes forever to generate a video on my device. I really need TeaCache.
There is a WIP EasyCache PR: https://github.com/leejet/stable-diffusion.cpp/pull/940
How do I use it? I tried it with sd.cpp, but it said ctx not supported. How can I use it with Wan?
I applied the patch (now with merging conflicts) and I can use it with Wan2.2-TI2V-5B.
[DEBUG] stable-diffusion.cpp:3649 - sample 36x22x9
[INFO ] stable-diffusion.cpp:1597 - EasyCache enabled - threshold: 0.025, start_percent: 0.15, end_percent: 0.95
[INFO ] ggml_extend.hpp:1696 - Wan2.2-TI2V-5B offload params (5153.43 MB, 825 tensors) to runtime backend (CUDA0), taking 0.45s
[DEBUG] ggml_extend.hpp:1598 - Wan2.2-TI2V-5B compute buffer size: 193.34 MB(VRAM)
|==================================================| 48/48 - 2.08s/it
[INFO ] stable-diffusion.cpp:1884 - EasyCache skipped 13/48 steps (1.37x estimated speedup)
[INFO ] stable-diffusion.cpp:3677 - sampling completed, taking 100.07s
vs.
[DEBUG] stable-diffusion.cpp:3649 - sample 36x22x9
[INFO ] ggml_extend.hpp:1696 - Wan2.2-TI2V-5B offload params (5153.43 MB, 825 tensors) to runtime backend (CUDA0), taking 0.45s
[DEBUG] ggml_extend.hpp:1598 - Wan2.2-TI2V-5B compute buffer size: 193.34 MB(VRAM)
|==================================================| 48/48 - 2.40s/it
[INFO ] stable-diffusion.cpp:3677 - sampling completed, taking 115.43s
(It says 13/48, but with CFG each step runs two model evaluations, so it is more like 13/96 -> ~1.15x)
no cache
https://github.com/user-attachments/assets/0047e5c3-46ed-4b11-af42-46bc05f88523
easycache 0.025,0.15,0.95
https://github.com/user-attachments/assets/0fca90ef-3ed7-4b24-a39b-8c0cb8ffed6e
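The gap between the reported and the effective speedup can be checked with a quick sketch (assuming, as the comparison above suggests, that with CFG each sampling step costs two model evaluations and the skipped count applies to evaluations, not steps):

```python
# Estimated vs. effective EasyCache speedup for the 48-step run above.
# Assumption: with classifier-free guidance each step runs two model
# evaluations (conditional + unconditional), so 48 steps = 96 forward passes.
steps = 48
evals = 2 * steps          # CFG doubles the forward passes
skipped = 13               # as reported in the log

reported = steps / (steps - skipped)   # what the log estimates: ~1.37x
effective = evals / (evals - skipped)  # over all forward passes: ~1.16x
wall_clock = 115.43 / 100.07           # measured from the timings: ~1.15x

print(f"reported {reported:.2f}x, effective {effective:.2f}x, "
      f"measured {wall_clock:.2f}x")
```

The measured wall-clock ratio (115.43s / 100.07s) lines up with the per-evaluation estimate, not the per-step one.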
Test it with Wan 2.1 1.3B please, I need to know if it works. I can't use Wan 2.2, not enough memory. I just found a small Wan 2.1 1.3B Q6 model, but I tried everything and still can't get good results. I need a lot of steps, but I don't want to wait hours just to get bad results. What is your config for Wan 1.3B? I can only go from 256x256 up to 448x320 or 384x384.
Hey, did you add it to sd.cpp, or do we have to wait for the merge?
Yes, you'll have to wait; the PR I linked is not ready yet.
Wan 2.1 1.3B q8_0, 20 steps, 560x320, 33 frames
https://github.com/user-attachments/assets/6e70de96-bb23-46ee-a2f2-2ee2be5d9c84
[DEBUG] stable-diffusion.cpp:3649 - sample 70x40x9
[INFO ] stable-diffusion.cpp:1597 - EasyCache enabled - threshold: 0.025, start_percent: 0.15, end_percent: 0.95
[INFO ] ggml_extend.hpp:1696 - Wan2.1-T2V-1.3B offload params (1473.40 MB, 825 tensors) to runtime backend (CUDA0), taking 0.14s
[DEBUG] ggml_extend.hpp:1598 - Wan2.1-T2V-1.3B compute buffer size: 606.62 MB(VRAM)
|==================================================| 20/20 - 3.86s/it
[INFO ] stable-diffusion.cpp:1884 - EasyCache skipped 2/20 steps (1.11x estimated speedup)
[INFO ] stable-diffusion.cpp:3677 - sampling completed, taking 77.15s
40/(40-2) -> 1.05x speedup (2 of 40 model evaluations skipped)
More steps usually lead to more skips.
One interesting thing is that this 1.3B model is about as fast as the 5B model I tested, probably because the Wan2.2 VAE compresses width and height by another factor of 2. Sure, the diffusion step takes far less memory, but the VAE still consumes heaps.
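The latent-grid arithmetic behind this can be sketched as follows (assumptions: the Wan 2.1 VAE downsamples width/height by 8x and the Wan 2.2 TI2V VAE by 16x; the 576x352 resolution for the 5B run is inferred from its logged `sample 36x22x9` grid, since that command isn't shown):

```python
# Rough latent-grid comparison between the two runs.
# Assumption: Wan2.1 VAE downsamples width/height by 8x,
# the Wan2.2 TI2V VAE by 16x.
def latent_grid(width, height, spatial_factor):
    return (width // spatial_factor, height // spatial_factor)

# 1.3B run: 560x320 with an 8x VAE, matching the logged "sample 70x40x9"
print(latent_grid(560, 320, 8))    # (70, 40)

# 5B run: a 16x VAE at 576x352 matches the logged "sample 36x22x9"
print(latent_grid(576, 352, 16))   # (36, 22)

# So the 5B model processes roughly 3.5x fewer latent positions per frame:
print((70 * 40) / (36 * 22))       # ~3.54
```

That factor of ~3.5 in latent size goes a long way toward cancelling the ~4x difference in parameter count, consistent with the similar per-iteration times in the logs.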
$ result/bin/sd -M vid_gen --diffusion-model models/wan/wan2.1_t2v_1.3B-q8_0.gguf --vae models/wan/Wan2.1_VAE-f16.gguf --t5xxl models/wan/umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat strolling down a wooden plank. everything in focus and sharp. stable camera." --cfg-scale 6 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止, 整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 560 -H 320 --diffusion-fa --offload-to-cpu --video-frames 33 --flow-shift 3.0 --steps 20 -s 42 --easycache 0.025,0.15,0.95
I can't wait! This, combined with the diffsynth-studio aesthetics v1 LoRA, will make amazing videos 🥹