stable-diffusion.cpp

video support

Open ehartford opened this issue 2 years ago • 23 comments

(rewriting sloppy request) I was wondering if video support can be added?

At first I found lucidrains' video-diffusion-pytorch: https://github.com/lucidrains/video-diffusion-pytorch

But after some research, it seems like zeroscope might be the right model to use: https://huggingface.co/cerspense/zeroscope_v2_576w

ehartford avatar Aug 19 '23 21:08 ehartford

This model appears to be significantly different from stable-diffusion, so there are no plans to support it currently. If there's time in the future, I will consider providing support for it.

leejet avatar Aug 20 '23 04:08 leejet

I didn't necessarily mean this specific model, more "video" in general.

I think zeroscope would probably be the right place to start.

Sorry for being sloppy.

https://huggingface.co/cerspense/zeroscope_v2_576w

ehartford avatar Aug 20 '23 11:08 ehartford

It looks like this needs some work, and there are no plans to support it currently. Maybe in the future?

leejet avatar Aug 21 '23 00:08 leejet

Stable Video Diffusion (SVD) models from Stability AI were released!

SVD was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware deflickering decoder

https://stability.ai/news/stable-video-diffusion-open-ai-video-model https://huggingface.co/stabilityai/stable-video-diffusion-img2vid / https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt

Green-Sky avatar Nov 21 '23 20:11 Green-Sky
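
To put the quoted numbers in context, here is a minimal sketch of the latent tensor shape that 14 frames at 576x1024 would imply. It assumes the SD-family VAE downsamples each spatial side by 8 and produces 4 latent channels; those constants, and the names `LatentShape` and `svd_latent_shape`, are illustrative assumptions and are not taken from this thread or from sd.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Shape of a video latent: frames x channels x height x width.
struct LatentShape {
    std::size_t frames, channels, height, width;
    // Total number of latent values across all frames.
    std::size_t elements() const { return frames * channels * height * width; }
};

// Assumption: the VAE shrinks each spatial side by 8 and emits 4 channels.
// These values are not stated in the thread.
constexpr std::size_t kVaeDownscale = 8;
constexpr std::size_t kLatentChannels = 4;

// Hypothetical helper: latent shape for a clip of `frames` images of
// size height x width (in pixels).
LatentShape svd_latent_shape(std::size_t frames, std::size_t height, std::size_t width) {
    assert(height % kVaeDownscale == 0 && width % kVaeDownscale == 0);
    return {frames, kLatentChannels, height / kVaeDownscale, width / kVaeDownscale};
}
```

Under these assumptions, `svd_latent_shape(14, 576, 1024)` gives 14 frames of 4 x 72 x 128 latents. Frames, channels, height, and width already use up four dimensions before any batch axis is added.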

The SVD demo looks quite good. I'll make time in the next few days to study it, starting by running the official code to see its performance.

leejet avatar Nov 22 '23 12:11 leejet

Patiently waiting for SVD to be released and quantized!

Amin456789 avatar Nov 27 '23 17:11 Amin456789

@leejet It seems to have almost the same architecture as SD 2.1 but includes some temporal consistency blocks called "time_stack." We'll need to see how they work and whether new functions need to be added to ggml. The conversion program works with this model; however, please note that we'll need to implement the vision version of CLIP to generate embeddings from images.

FSSRepo avatar Nov 28 '23 03:11 FSSRepo

I'm currently reviewing the SVD implementation code in comfyui. Perhaps I can learn how to conveniently implement SVD within sd.cpp from this.

leejet avatar Nov 28 '23 12:11 leejet

> I'm currently reviewing the SVD implementation code in comfyui. Perhaps I can learn how to conveniently implement SVD within sd.cpp from this.

Amazing! Good luck! Unfortunately, my time is limited as I am a student. Otherwise, I would be more than happy to help.

FSSRepo avatar Nov 28 '23 12:11 FSSRepo

Bless u guys! SVD in cpp will be a dream! Good luck to all of u!

Amin456789 avatar Nov 28 '23 13:11 Amin456789

@leejet Any update on SVD and inpainting? Really excited to try them out in cpp!

Amin456789 avatar Dec 21 '23 08:12 Amin456789

I've got a basic understanding of the SVD model architecture. Once I merge https://github.com/leejet/stable-diffusion.cpp/pull/104 and https://github.com/leejet/stable-diffusion.cpp/pull/117, I'll attempt to implement SVD.

leejet avatar Dec 21 '23 13:12 leejet

niceee! so excited, thanks

Amin456789 avatar Dec 21 '23 14:12 Amin456789

Hotshot-XL looks interesting too, and it works with SDXL models: https://huggingface.co/hotshotco/Hotshot-XL

Jonathhhan avatar Dec 29 '23 23:12 Jonathhhan

@leejet It would be great if you supported the fp16 version of SVD when it is done: https://huggingface.co/becausecurious/stable-video-diffusion-img2vid-fp16/tree/main

They are smaller and probably more RAM-friendly.

Amin456789 avatar Jan 01 '24 15:01 Amin456789
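
The "smaller" claim is plain arithmetic: an fp16 weight takes 2 bytes and an fp32 weight takes 4, so the weights alone halve in size. A minimal sketch follows; the helper names `weight_bytes` and `to_gib` are hypothetical, and the estimate covers stored weights only, not activations or runtime buffers.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store `param_count` weights at `bytes_per_weight` each
// (4 for fp32, 2 for fp16). Weights only; runtime buffers are not counted.
std::uint64_t weight_bytes(std::uint64_t param_count, std::uint64_t bytes_per_weight) {
    assert(bytes_per_weight > 0);
    return param_count * bytes_per_weight;
}

// Convert a byte count to GiB for display.
double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For a placeholder count of 1,000,000,000 parameters (not SVD's actual size, which this thread does not state), `weight_bytes(1000000000ULL, 4)` is 4,000,000,000 bytes and `weight_bytes(1000000000ULL, 2)` is 2,000,000,000 bytes.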

I need this as well.

engineer1109 avatar Apr 15 '24 10:04 engineer1109

@leejet Any update on SVD, please?

Amin456789 avatar Jul 15 '24 09:07 Amin456789

I don't know if this is even remotely related to the SD architecture, but it would be good to support the new kid on the block:

https://huggingface.co/genmo/mochi-1-preview

https://huggingface.co/Kijai/Mochi_preview_comfy/tree/main

mirix avatar Oct 24 '24 16:10 mirix

Any updates on SVD?

patrickjonesdotca avatar Dec 09 '24 18:12 patrickjonesdotca

Any updates on SVD?

Zctoylm0927 avatar Dec 10 '24 13:12 Zctoylm0927

There are more img2vid and txt2vid models coming:

https://github.com/THUDM/CogVideo

https://huggingface.co/IamCreateAI/Ruyi-Mini-7B

https://github.com/Tencent/HunyuanVideo

bombless avatar Dec 18 '24 15:12 bombless

Good luck to you! In my opinion, applying ggml to video models may be a nightmare because it does not support tensors with more than 4 dimensions, so conv3d, batchnorm3d, etc. will drive you crazy!

delldu avatar Jan 17 '25 16:01 delldu

> Good luck to you! In my opinion, applying ggml to video models may be a nightmare because it does not support tensors with more than 4 dimensions, so conv3d, batchnorm3d, etc. will drive you crazy!

I can confirm

stduhpf avatar Jan 17 '25 17:01 stduhpf
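
One common way around a 4D tensor limit, for layers that treat every frame independently, is to fold the frame axis into the batch axis. The sketch below is plain index arithmetic, not ggml API, and the function names are hypothetical. It assumes a contiguous row-major layout [N, T, C, H, W] with the frame axis next to the batch axis, and shows that the folded [N*T, C, H, W] view addresses the same memory offsets, so the fold needs no copy. It does not help with conv3d or other layers that mix information across frames; those would still need a different decomposition.

```cpp
#include <cassert>
#include <cstddef>

// Row-major offset into a 5D tensor laid out as [N, T, C, H, W].
std::size_t offset5d(std::size_t n, std::size_t t, std::size_t c, std::size_t h, std::size_t w,
                     std::size_t T, std::size_t C, std::size_t H, std::size_t W) {
    assert(t < T && c < C && h < H && w < W);
    return (((n * T + t) * C + c) * H + h) * W + w;
}

// Row-major offset into the folded 4D tensor [N*T, C, H, W].
std::size_t offset4d(std::size_t b, std::size_t c, std::size_t h, std::size_t w,
                     std::size_t C, std::size_t H, std::size_t W) {
    assert(c < C && h < H && w < W);
    return ((b * C + c) * H + h) * W + w;
}

// Folded batch index for clip n, frame t, with T frames per clip.
std::size_t fold_batch(std::size_t n, std::size_t t, std::size_t T) {
    assert(t < T);
    return n * T + t;
}
```

For any valid indices, `offset5d(n, t, c, h, w, T, C, H, W)` equals `offset4d(fold_batch(n, t, T), c, h, w, C, H, W)`, which is why per-frame 2D layers can run on the folded tensor unchanged.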