
Finetune img2video based on T2V model

Open Robertwyq opened this issue 1 year ago • 6 comments

Thank you very much for your work. We have attempted to finetune the img2video model on our own dataset, but we found that most of the generated scenes tend to be static. Specifically, when we use driving videos, the output is often a video in which the ego-vehicle perspective remains stationary.

Did you encounter a similar issue during your finetuning process, or could this be because most current models are trained on videos with fixed camera views?

Robertwyq avatar Sep 09 '24 05:09 Robertwyq

@Robertwyq May I ask whether your fine-tuning method for image-to-video is the same as the one described in the paper?

trouble-maker007 avatar Sep 09 '24 06:09 trouble-maker007

Using condition augmentation may help you generate more dynamic videos.
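For readers unfamiliar with the suggestion above: condition augmentation means perturbing the conditioning image with Gaussian noise before it is encoded, so the model cannot simply copy static content from the first frame. A minimal sketch in PyTorch, assuming the conditioning frame is already a tensor (the function name and the 0.02 default are illustrative, not the repository's actual API):

```python
import torch

def augment_condition(cond_image: torch.Tensor,
                      noise_strength: float = 0.02) -> torch.Tensor:
    """Add Gaussian noise to the conditioning image before encoding.

    Weakening the conditioning signal this way discourages the model
    from reproducing the input frame verbatim, which tends to yield
    more motion in the generated video.
    """
    noise = torch.randn_like(cond_image)
    return cond_image + noise_strength * noise
```

The noise strength trades off fidelity to the input frame against motion: larger values give the model more freedom to deviate from the conditioning image.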

tengjiayan20 avatar Sep 09 '24 14:09 tengjiayan20

@Robertwyq If you feel like uploading your fine-tuning method to a repo here on GitHub, I'm certain you will get plenty of good suggestions on how to improve it, including how to avoid static behaviour.

Nojahhh avatar Sep 09 '24 14:09 Nojahhh

What range of noise strength works well here? With 0.02, the first half of the video has some motion, but the second half tends to be static.

QingQingS avatar Sep 10 '24 06:09 QingQingS

CogVideoX uses the same noise level as Stable Video Diffusion, so dynamics shouldn't be a problem.
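As context for the comparison above: Stable Video Diffusion does not fix the augmentation noise level during training, but samples it per example from a log-normal distribution. A hypothetical sketch of that sampling scheme (the `loc`/`scale` values are illustrative placeholders, not confirmed CogVideoX or SVD settings):

```python
import torch

def sample_noise_strength(batch_size: int,
                          loc: float = -3.0,
                          scale: float = 0.5) -> torch.Tensor:
    # sigma ~ LogNormal(loc, scale): exponentiate a normal draw,
    # so every sampled strength is strictly positive.
    return torch.exp(loc + scale * torch.randn(batch_size))
```

Sampling a distribution of strengths during training, rather than one fixed value, lets a single inference-time knob (e.g. the 0.02 mentioned earlier in the thread) trade off fidelity against motion.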

yzy-thu avatar Sep 10 '24 07:09 yzy-thu

@yzy-thu @tengjiayan20 But I found that after training for a while with the same strategy as SVD, the dynamics are similar to SVD's, with relatively small amplitudes of motion in both cases. Could this be a pattern learned from the training data?

trouble-maker007 avatar Sep 10 '24 10:09 trouble-maker007