stable-diffusion.cpp video support

(rewriting sloppy request) I was wondering if video support can be added?

At first I came across lucidrains' video-diffusion-pytorch: https://github.com/lucidrains/video-diffusion-pytorch

But after some research, it seems like zeroscope might be the right model to use: https://huggingface.co/cerspense/zeroscope_v2_576w

Aug 19 '23 21:08 ehartford

This model appears to be significantly different from stable-diffusion, no plans to support it currently. If there's time in the future, I will consider providing support for it.

Aug 20 '23 04:08 leejet

I didn't necessarily mean this specific model, more "video" in general.

I think zeroscope would probably be the right place to start.

Sorry for being sloppy.

https://huggingface.co/cerspense/zeroscope_v2_576w

Aug 20 '23 11:08 ehartford

It looks like this needs some work, and there are no plans to support it currently. Maybe in the future?

Aug 21 '23 00:08 leejet

Stable Video Diffusion (SVD) models from Stability AI were released!

SVD was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware deflickering decoder.

https://stability.ai/news/stable-video-diffusion-open-ai-video-model
https://huggingface.co/stabilityai/stable-video-diffusion-img2vid
https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt

Nov 21 '23 20:11 Green-Sky
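
For a rough sense of the tensor sizes involved, here is a minimal sketch of the latent shape for the quoted 14 frames at 576x1024. It assumes the usual SD 2.1 VAE figures (8x spatial downscale, 4 latent channels); the function names are hypothetical and nothing here is taken from sd.cpp or the SVD code.

```python
def svd_latent_shape(num_frames=14, height=576, width=1024,
                     vae_downscale=8, latent_channels=4):
    """Return (frames, channels, latent_h, latent_w) for one generated clip.

    vae_downscale and latent_channels are assumptions based on the SD 2.1 VAE.
    """
    if height % vae_downscale or width % vae_downscale:
        raise ValueError("height and width must be multiples of the VAE downscale")
    return (num_frames, latent_channels,
            height // vae_downscale, width // vae_downscale)


def tensor_bytes(shape, bytes_per_element):
    """Size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


shape = svd_latent_shape()
print(shape)                   # (14, 4, 72, 128)
print(tensor_bytes(shape, 4))  # fp32 size in bytes
print(tensor_bytes(shape, 2))  # fp16 size in bytes (half of fp32)
```

Note that frames, channels, height and width already use four dimensions, so there is no room left for a separate batch axis in a 4D tensor.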

The SVD demo looks quite good. I'll make time in the next few days to study it, starting by running the official code to see its performance.

Nov 22 '23 12:11 leejet

Patiently waiting for SVD to be released and quantized!

Nov 27 '23 17:11 Amin456789

@leejet It seems to have almost the same architecture as SD 2.1 but includes some temporal consistency blocks called "time_stack." We'll need to see how they work and whether new functions need to be added to ggml. The conversion program works with this model; however, please note that we'll need to implement the vision version of CLIP to generate embeddings from images.

Nov 28 '23 03:11 FSSRepo
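
A minimal sketch of how per-frame (spatial) layers and cross-frame (temporal) layers can be interleaved by folding the frame axis into the batch axis. This is a common pattern in video diffusion models; whether the "time_stack" blocks work exactly like this is an assumption, and `spatial_layer` / `temporal_layer` are hypothetical stand-ins, not real SVD or ggml operations.

```python
def fold_time_into_batch(video_batch):
    """[B][T][frame] -> [B*T][frame], so image layers see plain frames."""
    return [frame for clip in video_batch for frame in clip]


def unfold_batch_into_time(frames, num_frames):
    """[B*T][frame] -> [B][T][frame], the inverse of fold_time_into_batch."""
    if len(frames) % num_frames != 0:
        raise ValueError("frame count is not a multiple of num_frames")
    return [frames[i:i + num_frames] for i in range(0, len(frames), num_frames)]


def spatial_layer(frame):
    """Hypothetical stand-in for an SD-style image layer: acts on one frame."""
    return [value * 2 for value in frame]


def temporal_layer(clip):
    """Hypothetical stand-in for a temporal block: blends each position
    with its mean across all frames of the same clip."""
    num_frames = len(clip)
    mean = [sum(values) / num_frames for values in zip(*clip)]
    return [[0.5 * v + 0.5 * m for v, m in zip(frame, mean)] for frame in clip]


def spatio_temporal_block(video_batch):
    """Apply the spatial layer per frame, then the temporal layer per clip."""
    num_frames = len(video_batch[0])
    flat = [spatial_layer(f) for f in fold_time_into_batch(video_batch)]
    clips = unfold_batch_into_time(flat, num_frames)
    return [temporal_layer(clip) for clip in clips]


# One clip, two frames, two values per frame.
print(spatio_temporal_block([[[1.0, 2.0], [3.0, 4.0]]]))
```

The point of the fold/unfold step is that the existing image layers never need to know about the time axis; only the temporal stand-in looks across frames.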

I'm currently reviewing the SVD implementation code in comfyui. Perhaps I can learn how to conveniently implement SVD within sd.cpp from this.

Nov 28 '23 12:11 leejet

> I'm currently reviewing the SVD implementation code in comfyui. Perhaps I can learn how to conveniently implement SVD within sd.cpp from this.

Amazing! Good luck! Unfortunately, my time is limited as I am a student. Otherwise, I would be more than happy to help.

Nov 28 '23 12:11 FSSRepo

Bless u guys! SVD in cpp will be a dream! Good luck to all of u!

Nov 28 '23 13:11 Amin456789

@leejet any update and progress on svd and inpainting? really excited to try them out in cpp!

Dec 21 '23 08:12 Amin456789

I've got a basic understanding of the SVD model architecture. Once I merge https://github.com/leejet/stable-diffusion.cpp/pull/104 and https://github.com/leejet/stable-diffusion.cpp/pull/117, I'll attempt to implement SVD.

Dec 21 '23 13:12 leejet

niceee! so excited, thanks

Dec 21 '23 14:12 Amin456789

Hotshot-XL looks interesting too, and it works with SDXL models: https://huggingface.co/hotshotco/Hotshot-XL

Dec 29 '23 23:12 Jonathhhan

@leejet it would be great if you supported the fp16 version of SVD when it is done: https://huggingface.co/becausecurious/stable-video-diffusion-img2vid-fp16/tree/main

The fp16 files are smaller and probably more RAM-friendly.

Jan 01 '24 15:01 Amin456789

I need this as well.

Apr 15 '24 10:04 engineer1109

@leejet any update on svd please?

Jul 15 '24 09:07 Amin456789

I don't know if this is even remotely related to the SD architecture, but it would be good to support the new kid on the block:

https://huggingface.co/genmo/mochi-1-preview

https://huggingface.co/Kijai/Mochi_preview_comfy/tree/main

Oct 24 '24 16:10 mirix

Any updates on SVD?

Dec 09 '24 18:12 patrickjonesdotca

Any updates on SVD?

Dec 10 '24 13:12 Zctoylm0927

There are more img2vid and txt2vid models coming:
https://github.com/THUDM/CogVideo
https://huggingface.co/IamCreateAI/Ruyi-Mini-7B
https://github.com/Tencent/HunyuanVideo

Dec 18 '24 15:12 bombless

Good luck to you! In my opinion, applying ggml to video AI may be a nightmare, because it does not support tensors with more than 4 dimensions, so conv3d, batchnorm3d, etc. will drive you crazy!

Jan 17 '25 16:01 delldu

> Good luck to you! In my opinion, applying ggml to video AI may be a nightmare, because it does not support tensors with more than 4 dimensions, so conv3d, batchnorm3d, etc. will drive you crazy!

I can confirm

Jan 17 '25 17:01 stduhpf
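
To illustrate why conv3d is awkward under a 4D tensor limit, and one possible workaround, here is a minimal sketch that computes a 3D convolution as a sum of 2D convolutions over the temporal kernel axis. It is plain Python on nested lists, single channel, stride 1, no padding; it is not ggml code, the function names are hypothetical, and it makes no claim about how efficiently ggml could do the same thing.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation (stride 1, no padding) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def conv3d_direct(video, kernel):
    """Reference 3D cross-correlation over (time, height, width)."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t = len(video) - kt + 1
    out_h = len(video[0]) - kh + 1
    out_w = len(video[0][0]) - kw + 1
    return [[[sum(video[t + d][y + i][x + j] * kernel[d][i][j]
                  for d in range(kt) for i in range(kh) for j in range(kw))
              for x in range(out_w)]
             for y in range(out_h)]
            for t in range(out_t)]


def conv3d_via_conv2d(video, kernel):
    """Same result, built only from 2D convolutions: for each output frame,
    sum conv2d(frame t+d, kernel slice d) over the temporal offsets d."""
    kt = len(kernel)
    out_t = len(video) - kt + 1
    result = []
    for t in range(out_t):
        acc = conv2d(video[t], kernel[0])
        for d in range(1, kt):
            plane = conv2d(video[t + d], kernel[d])
            acc = [[a + b for a, b in zip(row_a, row_b)]
                   for row_a, row_b in zip(acc, plane)]
        result.append(acc)
    return result


# Small integer example: 3 frames of 4x4, kernel of size 2x2x2.
video = [[[(t * 7 + y * 3 + x) % 5 for x in range(4)]
          for y in range(4)] for t in range(3)]
kernel = [[[1, 0], [2, -1]], [[0, 3], [1, 1]]]
print(conv3d_via_conv2d(video, kernel) == conv3d_direct(video, kernel))  # True
```

With channels and batch added, the direct version would need a 5D input, whereas each 2D convolution in the decomposed version only ever touches one frame at a time.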