[Stable Video Diffusion] first frame is not equal to initial image
I use an image to make a video, and the first frame of the generated sequence differs slightly from my initial image. Is this how video diffusion is supposed to work?
Example:
[Initial image]
[First video frame]
I think the minor difference may come from the reconstruction loss of the autoencoder: the conditioning image is encoded into the latent space and decoded back, and that round trip is lossy.
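You can measure that round-trip loss in isolation. A minimal sketch, assuming the diffusers library; the model ID and image path are placeholder choices (SVD's own per-frame encoder is the same kind of KL VAE):

```python
# Minimal sketch: encode an image with a KL VAE and decode it back,
# then measure the reconstruction error of the round trip.
# "stabilityai/sd-vae-ft-mse" and "input.png" are placeholder choices.
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

image = load_image("input.png").resize((512, 512))  # dims must be divisible by 8
x = to_tensor(image).unsqueeze(0) * 2 - 1           # [1, 3, H, W], scaled to [-1, 1]

with torch.no_grad():
    latents = vae.encode(x).latent_dist.mode()  # deterministic latent
    recon = vae.decode(latents).sample          # back to pixel space

mse = torch.mean((x - recon) ** 2).item()
psnr = 10 * torch.log10(torch.tensor(4.0 / mse)).item()  # peak-to-peak range is 2
print(f"reconstruction MSE: {mse:.6f}, PSNR: {psnr:.2f} dB")  # finite, never lossless
```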
@Little-Podi thank you for your answer. I also think the autoencoder loss is the reason. I wanted to continue the video from the last generated frame, but this leads to degradation. What I tried looks roughly like the sketch below.
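(The loop is my own construction, not an API from this repo; each pass through the pipeline's VAE compounds the reconstruction error, which is presumably the degradation I'm seeing.)

```python
# Assumed autoregressive continuation (sketch, not an official API):
# condition each new clip on the last frame of the previous one.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

image = load_image("input.png").resize((1024, 576))  # placeholder input

clips = []
for _ in range(3):                  # chain three clips back to back
    frames = pipe(image).frames[0]  # list of PIL images
    clips.extend(frames)
    image = frames[-1]              # last frame becomes the next condition
```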
Yes! I also think it would be amazing if its generation quality could be extended to much longer sequences. BTW, I guess the temporal-aware deflickering decoder may also affect the identity of the first frame, since it aligns the first frame with the subsequent frames. Changing back to the original frame-independent image decoder could help keep the first frame identical to the given frame.
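If anyone wants to try that, here is a rough sketch of bypassing the temporal decoder; the `output_type="latent"` path and the manual un-scaling are my assumptions about the diffusers implementation, not this repo's official method:

```python
# Sketch: take raw latents from the SVD pipeline and decode them
# frame by frame with a plain image VAE instead of the temporal decoder.
import torch
from diffusers import AutoencoderKL, StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")
image = load_image("input.png").resize((1024, 576))  # placeholder input

# Skip the built-in temporal decoder and keep the latents.
latents = pipe(image, output_type="latent").frames  # [1, T, 4, h, w]

# Plain frame-wise SD image VAE (no temporal layers); placeholder model ID.
image_vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
).to("cuda")

with torch.no_grad():
    frames = [
        image_vae.decode(z.unsqueeze(0) / image_vae.config.scaling_factor).sample
        for z in latents[0]  # decode each frame independently
    ]
```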
@KyriaAnnwyn and @Little-Podi do you have any updates on this? I switched from the temporal VAE to the frame-wise VAE, and I still cannot generate a video sequence whose first frame exactly matches the conditioning image I provide as input to the pipeline. Thanks!
Yes, I have also tried the image decoder, but it doesn't help. The temporal-aware decoder largely eliminates the jittering but doesn't change the content. To preserve the first frame exactly, some fine-tuning of the UNet would be necessary.