video support
(rewriting my sloppy request) I was wondering whether video support could be added?
At first I came up with lucidrain's video-diffusion-pytorch https://github.com/lucidrains/video-diffusion-pytorch
But, after some research it seems like zeroscope might be the right model to use https://huggingface.co/cerspense/zeroscope_v2_576w
This model appears to be significantly different from stable-diffusion, so there are no plans to support it at the moment. If I have time in the future, I'll consider adding support for it.
I didn't necessarily mean this specific model, more "video" in general.
I think zeroscope would probably be the right place to start.
Sorry for being sloppy.
https://huggingface.co/cerspense/zeroscope_v2_576w
It looks like this needs some work, and there are no plans to support it currently. Maybe in the future?
Stable Video Diffusion (SVD) models from Stability were released!
SVD was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware deflickering decoder
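For a sense of the tensor sizes involved: assuming the standard SD VAE behavior (8x spatial downsampling, 4 latent channels — these are SD 2.1 assumptions, which seems reasonable since SVD reuses the SD 2.1 image encoder), a rough sketch of the latent shape SVD works on:

```python
# Rough sketch of the SVD latent shape, assuming the standard SD VAE
# (8x spatial downsampling, 4 latent channels). These constants are
# assumptions carried over from SD 2.1, not confirmed SVD internals.

def svd_latent_shape(frames=14, height=576, width=1024,
                     latent_channels=4, downsample=8):
    """Return the (frames, channels, h, w) latent shape for SVD."""
    return (frames, latent_channels, height // downsample, width // downsample)

print(svd_latent_shape())  # (14, 4, 72, 128)
```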
https://stability.ai/news/stable-video-diffusion-open-ai-video-model https://huggingface.co/stabilityai/stable-video-diffusion-img2vid / https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
The SVD demo looks quite good. I'll make time in the next few days to study it, starting by running the official code to see its performance.
Patiently waiting for SVD support to be released and quantized!
@leejet It seems to have almost the same architecture as SD 2.1 but includes some temporal consistency blocks called "time_stack." We'll need to see how they work and whether new functions need to be added to ggml. The conversion program works with this model; however, please note that we'll need to implement the vision version of CLIP to generate embeddings from images.
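As a sketch of what the vision-CLIP path would involve, here is the standard CLIP image preprocessing in numpy (224x224 input, per-channel normalization with the published OpenAI CLIP mean/std). The exact resolution and statistics depend on which CLIP vision variant SVD actually ships with, so treat these constants as assumptions:

```python
import numpy as np

# Assumed CLIP preprocessing constants (OpenAI CLIP defaults); the exact
# values depend on the vision encoder SVD actually uses.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess_for_clip(image_u8):
    """image_u8: (H, W, 3) uint8 RGB image, already resized to 224x224.
    Returns a (3, 224, 224) float32 tensor normalized for CLIP."""
    x = image_u8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD           # per-channel normalize
    return x.transpose(2, 0, 1)              # HWC -> CHW

img = np.zeros((224, 224, 3), dtype=np.uint8)
out = preprocess_for_clip(img)
print(out.shape)  # (3, 224, 224)
```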
I'm currently reviewing the SVD implementation code in comfyui. Perhaps I can learn how to conveniently implement SVD within sd.cpp from this.
Amazing! Good luck!! Unfortunately, my time is limited as I am a student; otherwise, I would be more than happy to help.
Bless u guys! SVD in cpp will be a dream! Good luck to all of u!
@leejet any update and progress on svd and inpainting? really excited to try them out in cpp!
I've got a basic understanding of the SVD model architecture. Once I merge the https://github.com/leejet/stable-diffusion.cpp/pull/104 and https://github.com/leejet/stable-diffusion.cpp/pull/117, I'll attempt to implement SVD.
niceee! so excited, thanks
Hotshot-XL looks interesting, too and works with SDXL models: https://huggingface.co/hotshotco/Hotshot-XL
@leejet it would be great if you could support the fp16 version of SVD when it's done: https://huggingface.co/becausecurious/stable-video-diffusion-img2vid-fp16/tree/main
The weights are smaller and probably more RAM-friendly.
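Just to illustrate the saving (the parameter count below is a made-up round number for illustration, not the actual SVD UNet size):

```python
# Back-of-the-envelope weight size: fp16 stores 2 bytes per parameter
# instead of fp32's 4, so the weights are exactly half the size.
def weight_bytes(num_params, bytes_per_param):
    return num_params * bytes_per_param

params = 1_500_000_000  # hypothetical parameter count, for illustration only
fp32 = weight_bytes(params, 4)
fp16 = weight_bytes(params, 2)
print(fp32 / 2**30, fp16 / 2**30)  # rough sizes in GiB
```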
I need this as well.
@leejet any update on svd please?
I don't know if this is even remotely related to the SD architecture, but it would be cool to support the new kid on the block:
https://huggingface.co/genmo/mochi-1-preview
https://huggingface.co/Kijai/Mochi_preview_comfy/tree/main
Any updates on SVD?
There are more img2vid and txt2vid models coming https://github.com/THUDM/CogVideo https://huggingface.co/IamCreateAI/Ruyi-Mini-7B https://github.com/Tencent/HunyuanVideo
Good luck to you!!! In my opinion, applying ggml to video (AI) may be a nightmare, because it does not support tensors with more than 4 dimensions, so conv3d, batchnorm3d, etc. will make you crazy!!!
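One common workaround (used by several video models, and sketched here with numpy purely as an illustration) is to keep everything 4-D by folding the extra dimension into the batch: spatial ops run on a (B*T, C, H, W) view of the video, and temporal ops on a (B*H*W, C, T) view:

```python
import numpy as np

# A 5-D video tensor (B, C, T, H, W) folded into 4-D views so that a
# library limited to 4-D tensors (like ggml) can still apply spatial
# (per-frame conv2d) and temporal (per-pixel conv1d) operations.
B, C, T, H, W = 1, 4, 14, 72, 128
video = np.arange(B * C * T * H * W, dtype=np.float32).reshape(B, C, T, H, W)

# Fold time into batch for per-frame spatial ops (e.g. conv2d):
spatial = video.transpose(0, 2, 1, 3, 4).reshape(B * T, C, H, W)

# Fold space into batch for per-pixel temporal ops (e.g. conv1d over T):
temporal = video.transpose(0, 3, 4, 1, 2).reshape(B * H * W, C, T)

print(spatial.shape, temporal.shape)  # (14, 4, 72, 128) (9216, 4, 14)
```

The reshapes are free (views, no copy in the contiguous case), so the only real cost is the extra transposes between spatial and temporal stages.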
I can confirm