
[Feature request] Add video2video mode (with in-painting and outpainting analogues for making vid from keyframes and AI-continuing vids)

Open kabachuha opened this issue 2 years ago • 8 comments

Just as Stable Diffusion transforms one picture into another (or starts from pure noise if no input is specified), this model should in theory be able to transform a video into another video, guided by a text prompt, if we initialize the latents with the input video frames:

https://github.com/deforum-art/sd-webui-modelscope-text2video/blob/857594d61ea776794296ffa6d256bf93eaa7fcd2/scripts/t2v_pipeline.py#L153


The proposed scheme (like img2img, but for videos); a rough code sketch follows the checklist:

  • [x] Prepare input videos for the input mode (rescaling, trimming to the target length)
  • [x] Encode the videos into the latent representation by running the VAE
  • [x] Configure the DDIM scheduler to use denoising strength
  • [x] Pass the latents to the pipeline and test it
  • [x] Configure how denoising strength influences the result
  • [x] Bonus: in-framing: pass an input video, add a few keyframes and mask them, fill the rest with latent noise (or keep the original frames), then vid2vid-diffuse the unmasked area. Just like in-painting, but for vid2vid
  • [x] Bonus 2: video continuation: extend the video with latent-noise frames and move a 'window' that performs the aforementioned 'in-framing', allowing the video to grow beyond VRAM limits at the cost of some temporal coherence
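For the first few items, a minimal sketch of the latent initialization, assuming a diffusers-style `AutoencoderKL` (`vae`) and `DDIMScheduler` (`scheduler`); the function name, arguments, and the 0.18215 scaling factor are illustrative assumptions, not the extension's actual code:

```python
import torch

def init_vid2vid_latents(frames, vae, scheduler, num_inference_steps=30,
                         denoising_strength=0.6, generator=None):
    """frames: float tensor (num_frames, 3, H, W), already resized and scaled to [-1, 1]."""
    # 1. Encode each frame into the VAE latent space.
    with torch.no_grad():
        latents = vae.encode(frames).latent_dist.sample(generator) * 0.18215

    # 2. Decide how far back in the diffusion process to start:
    #    strength near 1.0 -> mostly noise (close to plain txt2vid),
    #    strength near 0.0 -> keep the input video almost unchanged.
    scheduler.set_timesteps(num_inference_steps)
    init_steps = max(1, min(int(num_inference_steps * denoising_strength), num_inference_steps))
    timesteps = scheduler.timesteps[num_inference_steps - init_steps:]

    # 3. Noise the encoded frames up to the starting timestep.
    noise = torch.randn(latents.shape, generator=generator, device=latents.device,
                        dtype=latents.dtype)
    latents = scheduler.add_noise(latents, noise, timesteps[:1].expand(latents.shape[0]))

    # The denoising loop then runs only over `timesteps` instead of the full schedule.
    return latents, timesteps
```

With `denoising_strength` close to 1.0 this collapses back to ordinary txt2vid, so the existing noise-initialized path would remain a special case of the same code.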

kabachuha avatar Mar 20 '23 00:03 kabachuha

Does adding this line of code allow us to go from video to video?

Pythonpa avatar Mar 21 '23 07:03 Pythonpa

That line only marks the place where the latents can be replaced with the input video encoded via the VAE, so it will still require one extra step.

kabachuha avatar Mar 21 '23 12:03 kabachuha

PR adding denoising strength

https://github.com/deforum-art/sd-webui-modelscope-text2video/pull/34

kabachuha avatar Mar 21 '23 23:03 kabachuha

WIP

https://github.com/deforum-art/sd-webui-modelscope-text2video/pull/37

If anyone knows how to fix it so the results don't look all washed out, that would be super helpful.

nagolinc avatar Mar 22 '23 07:03 nagolinc

Can we get a ControlNet pose video2video?

Basically: analyze the character's pose in each frame with the OpenPose model, save the resulting ControlNet images, load them back with the ControlNet extension enabled (and the OpenPose preprocessor off), and render the rest as vid2vid.

The ControlNet frame processing needs to be done sequentially to limit VRAM usage.
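A rough sketch of that sequential pose-extraction step, assuming OpenCV for frame reading and the `controlnet_aux` OpenPose detector; the model name, file paths, and output location are assumptions for illustration:

```python
import cv2
from PIL import Image
from controlnet_aux import OpenposeDetector

# Extract an OpenPose map for every frame, one frame at a time, so VRAM stays bounded.
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

cap = cv2.VideoCapture("input.mp4")
frame_idx = 0
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    pose_map = pose_detector(Image.fromarray(frame_rgb))  # PIL image of the detected pose
    pose_map.save(f"poses/{frame_idx:05d}.png")           # later fed to ControlNet as conditioning
    frame_idx += 1
cap.release()
```

The saved pose maps would then be loaded frame by frame as ControlNet conditioning while the vid2vid pass denoises the actual video latents.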

Kamekos avatar Mar 24 '23 08:03 Kamekos

@Apatiste sounds like a good idea for the future Deforum/text2vid integration, since Deforum already has ControlNet support

kabachuha avatar Mar 24 '23 10:03 kabachuha

I have inpainting working!

GitHub is acting up atm, but I will push a branch as soon as I can.

(screenshot: inpainting result; output 00128-395674870: "cute redhead standing in a field of wheat, freckles, realistic skin texture")

https://user-images.githubusercontent.com/7775917/227947785-8d1cbc9a-005d-46ba-8e01-3d2c0831314c.mp4

nagolinc avatar Mar 27 '23 13:03 nagolinc

https://github.com/deforum-art/sd-webui-modelscope-text2video/pull/74

nagolinc avatar Mar 27 '23 16:03 nagolinc