
Sample of training on WebVid-2M

yy-victory opened this issue 1 year ago · 4 comments

I randomly selected 500 videos from the WebVid-2M data for training and set max_train_steps to 10000. Below is a sampling result produced with the last motion-module checkpoint. I did not change any other configuration, but the results were very bad. (Attached sample: 0-masterpiece,-best-quality,-1girl,-solo,-cherry-blossoms,-hanami,-pink-flower)

Can someone tell me why? Thank you very much.

train: torchrun --nnodes=1 --nproc_per_node=3 --master_port 20456 train.py --config configs/training/v1/training.yaml (attached: training.yaml.txt)

inference: python -m scripts.animate --config /nvme-ssd/yanyi/code/AnimateDiff-main/configs/prompts/my_config.yaml (attached: my_config.yaml.txt)
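For reference, the fields I touched in configs/training/v1/training.yaml look roughly like this (a sketch only: the paths are placeholders, the sample settings are the repo defaults as far as I remember, and key names should be checked against the attached training.yaml.txt):

```yaml
# Sketch of the relevant training.yaml fields (placeholders for paths;
# verify key names against configs/training/v1/training.yaml in your checkout).
pretrained_model_path: "models/StableDiffusion/stable-diffusion-v1-5"
unet_checkpoint_path:  "/path/to/image_finetune/checkpoints/checkpoint.ckpt"

train_data:
  csv_path:        "/path/to/webvid_500_subset.csv"   # the 500 randomly selected WebVid-2M clips
  video_folder:    "/path/to/webvid_videos"
  sample_size:     256
  sample_n_frames: 16

trainable_modules:
  - "motion_modules."        # only the motion module should receive gradients

max_train_steps: 10000
```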

yy-victory · Apr 08 '24

Hello, is the unet_checkpoint_path: "/nvme-ssd/yanyi/code/AnimateDiff-main/output_dir/image_finetune/image_finetune-2024-04-07T10-18-11/checkpoints/checkpoint.ckpt" in training.yaml trained on WebVid-2M? In fact, the image-finetune stage in the paper is supposed to train a domain adapter, but the setting in the config trains the complete UNet. Have you noticed this problem?
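To make the distinction concrete: a domain adapter in the LoRA sense keeps the pretrained UNet frozen and trains only small low-rank layers injected into the image (spatial) blocks, whereas the shipped image-finetune config updates the full UNet. A minimal sketch of such an adapter layer (just an illustration of the idea, not the repo's actual implementation):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a pretrained nn.Linear: freeze it and train only a low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # pretrained weight stays frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A
        self.up   = nn.Linear(rank, base.out_features, bias=False)  # B
        nn.init.zeros_(self.up.weight)                  # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```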

J-Wu97 · Apr 19 '24

Thanks for answering my question. The unet_checkpoint_path was trained on webvid_2m_val, and the config I trained that checkpoint with is image_finetune.yaml. The reason the config here is training.yaml is that I want to train video motion on top of the fine-tuned checkpoint. Using the same dataset and training steps in another AnimateDiff-based project, I was able to train and get working image animation, so the problem seems to be with xformers == 0.0.16; however, pip installing xformers > 0.0.16 requires torch >= 2.0, so I used another Anaconda virtual environment with xformers == 0.0.22.
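For anyone hitting the same thing, the environment change this implies is roughly the following (the torch/torchvision versions are my reading of the xformers 0.0.22 compatibility requirements, so double-check them):

```bash
# fresh environment with torch >= 2.0 so a newer xformers can be installed
conda create -n animatediff-torch2 python=3.10 -y
conda activate animatediff-torch2
pip install torch==2.0.1 torchvision==0.15.2
pip install xformers==0.0.22
```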

yy-victory · Apr 19 '24

Hello, I met a similar problem. I used the validation split of WebVid-2M, about 5k videos, to fine-tune the motion LoRA. At 50 steps, the results are similar to yours (different images that change quickly from frame to frame). At 2.5k steps, the results become stable, but there is no difference between frames. I think the dataset we use is too small, or there are some configurations I ignored. I also hope for a solution to this problem.
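For a rough sense of scale: with a per-GPU batch size of 4 (just an example figure), 2.5k steps only cover about 10k samples, i.e. roughly two passes over a 5k-video set, so the motion weights see very little data in either run.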

aulaywang · Jun 12 '24

Thank you for your reply. Increasing the number of training steps can indeed improve the results.

yy-victory · Jun 13 '24