First frame conditioning possible?
Great work and paper! Is it possible with the current model to initialize the video with a given first image and generate the following frames based on that image? If not, what modifications would be needed to achieve that?
Thanks for your interest! ControlVideo requires a motion sequence covering all frames (e.g., depth maps) and a text prompt to produce a video. To get first-frame-conditioned generation, I think you could add ControlNet 1.1 Shuffle (https://github.com/lllyasviel/ControlNet-v1-1-nightly#controlnet-11-shuffle) to the ControlVideo pipeline, so that the content of the first image can be integrated into the video.
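As a rough illustration of the idea, here is a minimal sketch of how the inputs could be paired up: every frame keeps its own depth map and additionally gets the same reference image as a second control signal. The names `FrameControls` and `build_frame_controls` are hypothetical and not part of ControlVideo or ControlNet; how the pipeline would actually consume the second control (e.g., through a Shuffle ControlNet) is not shown here.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class FrameControls:
    """Control inputs for one frame (hypothetical container, not a ControlVideo type)."""
    depth: Any    # per-frame motion/structure control, e.g. a depth map
    content: Any  # reference image whose content should carry into the video


def build_frame_controls(first_image: Any, depth_maps: Sequence[Any]) -> List[FrameControls]:
    """Pair each frame's depth map with the same first image.

    The images can be any type (PIL images, arrays, file paths); this helper
    only organizes them and does not run a model.
    """
    if len(depth_maps) == 0:
        raise ValueError("depth_maps must contain at least one frame")
    # The same reference image is reused for every frame, so its content can
    # influence the whole video while the depth maps drive the motion.
    return [FrameControls(depth=d, content=first_image) for d in depth_maps]
```

Note that this only conditions the content of the video on the first image; it does not by itself guarantee that the first generated frame is identical to that image.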
Thanks for the pointer! I'll definitely take a look at that. But what if I want the first image to stay completely unchanged and generate the following frames from there based on the motion sequence, with the first frame of the motion sequence corresponding to the initial image?