[Stable Video Diffusion] first frame is not equal to initial image
I use an image to make a video, and the first frame of the generated sequence differs slightly from my initial image. Is this how video diffusion is supposed to work?
Example:
[Initial image]
[First video frame]
I think the minor difference may come from the reconstruction loss of the autoencoder: the conditioning image is encoded into the latent space and decoded back, and that round trip is lossy.
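You can measure that round-trip loss in isolation. A minimal sketch, assuming the diffusers library; the model ID and image path are placeholder choices (SVD's own per-frame encoder is the same kind of KL VAE):

```python
# Minimal sketch: encode an image with a KL VAE and decode it back,
# then measure the reconstruction error of the round trip.
# "stabilityai/sd-vae-ft-mse" and "input.png" are placeholder choices.
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

image = load_image("input.png").resize((512, 512))  # dims must be divisible by 8
x = to_tensor(image).unsqueeze(0) * 2 - 1           # [1, 3, H, W], scaled to [-1, 1]

with torch.no_grad():
    latents = vae.encode(x).latent_dist.mode()  # deterministic latent
    recon = vae.decode(latents).sample          # back to pixel space

mse = torch.mean((x - recon) ** 2).item()
psnr = 10 * torch.log10(torch.tensor(4.0 / mse)).item()  # peak-to-peak range is 2
print(f"reconstruction MSE: {mse:.6f}, PSNR: {psnr:.2f} dB")  # finite, never lossless
```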
@Little-Podi thank you for your answer. I also think the autoencoder loss is the reason. I wanted to continue the video from the last generated frame, but this leads to degradation. What I tried looks roughly like the sketch below.
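(The loop is my own construction, not an API from this repo; each pass through the pipeline's VAE compounds the reconstruction error, which is presumably the degradation I'm seeing.)

```python
# Assumed autoregressive continuation (sketch, not an official API):
# condition each new clip on the last frame of the previous one.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

image = load_image("input.png").resize((1024, 576))  # placeholder input

clips = []
for _ in range(3):                  # chain three clips back to back
    frames = pipe(image).frames[0]  # list of PIL images
    clips.extend(frames)
    image = frames[-1]              # last frame becomes the next condition
```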
Yes! I also think it would be amazing if its generation quality could be extended to much longer sequences. BTW, I guess the temporal-aware deflickering decoder may also affect the identity of the first frame, since it aligns the first frame with the subsequent frames. Changing back to the original frame-independent image decoder could help keep the first frame identical to the given frame.
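If anyone wants to try that, here is a rough sketch of bypassing the temporal decoder; the `output_type="latent"` path and the manual un-scaling are my assumptions about the diffusers implementation, not this repo's official method:

```python
# Sketch: take raw latents from the SVD pipeline and decode them
# frame by frame with a plain image VAE instead of the temporal decoder.
import torch
from diffusers import AutoencoderKL, StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")
image = load_image("input.png").resize((1024, 576))  # placeholder input

# Skip the built-in temporal decoder and keep the latents.
latents = pipe(image, output_type="latent").frames  # [1, T, 4, h, w]

# Plain frame-wise SD image VAE (no temporal layers); placeholder model ID.
image_vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda")

with torch.no_grad():
    frames = [
        image_vae.decode(z.unsqueeze(0) / image_vae.config.scaling_factor).sample
        for z in latents[0]  # decode each frame independently
    ]
```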
@KyriaAnnwyn and @Little-Podi do you have any updates on this? I switched from the temporal VAE to the frame-wise VAE, and I still cannot generate a video sequence whose first frame exactly matches the conditioning image I provide as input to the pipeline. Thanks!
Yes, I have also tried the image decoder, but it doesn't help. The temporal-aware decoder largely eliminates the jittering but doesn't change the content. To preserve the first frame exactly, some fine-tuning of the UNet would be necessary.