Finetune img2video based on T2V model
Thank you very much for your work. We have attempted to finetune the img2video model on our own dataset, but we found that most of the generated scenes tend to be static. Specifically, when we use driving videos, the output is often a video where the ego-vehicle perspective remains stationary.
Did you encounter a similar issue during your fine-tuning process, or could this be because most current models are trained on videos with a fixed camera view?
@Robertwyq May I ask whether your image-to-video fine-tuning method is the same as the one described in the paper?
Using condition augmentation may help generate more dynamic videos.
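For context, condition augmentation here typically means corrupting the encoded conditioning frame with Gaussian noise before it is combined with the video latents, with the noise strength sampled per example during training so the model cannot simply copy the first frame (copying is what tends to produce static outputs). Below is a minimal sketch assuming a PyTorch setup; the function name and the log-normal parameters are illustrative placeholders, not the exact values used by any particular model:

```python
import torch

def augment_condition_latent(cond_latent, log_sigma_mean=-3.0, log_sigma_std=0.5):
    """Condition augmentation: add Gaussian noise to the conditioning-frame latent.

    cond_latent: latent of the conditioning frame, shape (B, C, ...).
    Returns the noised latent and the per-sample sigma, which is usually also
    fed to the model (e.g. as an embedding) so it knows the corruption level.
    """
    batch = cond_latent.shape[0]
    # Per-sample noise strength drawn from a log-normal distribution
    # (the distribution parameters here are assumptions for illustration).
    sigma = torch.exp(torch.randn(batch, device=cond_latent.device) * log_sigma_std
                      + log_sigma_mean)
    sigma = sigma.view(batch, *([1] * (cond_latent.dim() - 1)))  # broadcastable shape
    noisy_cond = cond_latent + sigma * torch.randn_like(cond_latent)
    return noisy_cond, sigma
```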
@Robertwyq If you feel like uploading your fine-tuning method to a repo here on GitHub, I'm certain you will get tons of good suggestions on how to improve it, including how to avoid static behaviour.
What range of noise strength works well here? With 0.02, the first half of my generated videos has some motion, but the latter half tends to be static.
CogVideoX uses the same noise level as Stable Video Diffusion, so dynamics shouldn't be a problem.
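For anyone who wants to probe the effect of that noise level at inference time, SVD exposes it directly as a pipeline argument. A sketch using the diffusers `StableVideoDiffusionPipeline` (the input image path and the swept values are placeholders; 0.02 is the commonly used default for `noise_aug_strength`, and larger values generally give more motion but drift further from the conditioning frame):

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

image = load_image("driving_frame.png").resize((1024, 576))  # placeholder frame

# Sweep the conditioning-noise strength to see how it changes dynamics.
for strength in (0.02, 0.1, 0.3):
    frames = pipe(image, noise_aug_strength=strength, num_frames=25,
                  decode_chunk_size=8).frames[0]
    export_to_video(frames, f"svd_aug_{strength}.mp4", fps=7)
```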
@yzy-thu @tengjiayan20 But I found that after training for a while with the same strategy as SVD, the dynamics are similar to SVD's, both with relatively small amplitudes of motion. Could this simply reflect the patterns in the training data?