
Sample of training on WebVid-2M

yy-victory opened this issue 1 year ago · 4 comments

I randomly selected 500 videos from the WebVid-2M data for training and set max_train_steps to 10000. Below is a sampling result produced with the last motion-module checkpoint. I did not change any other configuration, but the results were very bad. (Attached sample: 0-masterpiece,-best-quality,-1girl,-solo,-cherry-blossoms,-hanami,-pink-flower)

Can someone tell me why? Thank you very much.

train: torchrun --nnodes=1 --nproc_per_node=3 --master_port 20456 train.py --config configs/training/v1/training.yaml (attached: training.yaml.txt)

inference: python -m scripts.animate --config /nvme-ssd/yanyi/code/AnimateDiff-main/configs/prompts/my_config.yaml (attached: my_config.yaml.txt)
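For reference, the fields I touched in configs/training/v1/training.yaml look roughly like this (a sketch only: the paths are placeholders, the sample settings are the repo defaults as far as I remember, and key names should be checked against the attached training.yaml.txt):

```yaml
# Sketch of the relevant training.yaml fields (placeholders for paths;
# verify key names against configs/training/v1/training.yaml in your checkout).
pretrained_model_path: "models/StableDiffusion/stable-diffusion-v1-5"
unet_checkpoint_path:  "/path/to/image_finetune/checkpoints/checkpoint.ckpt"

train_data:
  csv_path:        "/path/to/webvid_500_subset.csv"   # the 500 randomly selected WebVid-2M clips
  video_folder:    "/path/to/webvid_videos"
  sample_size:     256
  sample_n_frames: 16

trainable_modules:
  - "motion_modules."        # only the motion module should receive gradients

max_train_steps: 10000
```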

yy-victory · Apr 08 '24

Hello, is the unet_checkpoint_path: "/nvme-ssd/yanyi/code/AnimateDiff-main/output_dir/image_finetune/image_finetune-2024-04-07T10-18-11/checkpoints/checkpoint.ckpt" in training.yaml trained on WebVid-2M? In fact, the image-finetune stage in the paper is supposed to train a domain adapter, but the setting in the config trains the complete UNet. Have you noticed this problem?
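To make the distinction concrete: a domain adapter in the LoRA sense keeps the pretrained UNet frozen and trains only small low-rank layers injected into the image (spatial) blocks, whereas the shipped image-finetune config updates the full UNet. A minimal sketch of such an adapter layer (just an illustration of the idea, not the repo's actual implementation):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a pretrained nn.Linear: freeze it and train only a low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # pretrained weight stays frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A
        self.up   = nn.Linear(rank, base.out_features, bias=False)  # B
        nn.init.zeros_(self.up.weight)                  # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```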

J-Wu97 · Apr 19 '24

Thanks for answering my question. The unet_checkpoint_path was trained on webvid_2m_val, and the config I trained that checkpoint with is image_finetune.yaml. The reason the config here is training.yaml is that I want to train video motion on top of the fine-tuned checkpoint. Using the same dataset and training steps in another AnimateDiff-based project, I was able to train and get working image animation, so the problem seems to be with xformers == 0.0.16; however, pip installing xformers > 0.0.16 requires torch >= 2.0, so I used another Anaconda virtual environment with xformers == 0.0.22.
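For anyone hitting the same thing, the environment change this implies is roughly the following (the torch/torchvision versions are my reading of the xformers 0.0.22 compatibility requirements, so double-check them):

```bash
# fresh environment with torch >= 2.0 so a newer xformers can be installed
conda create -n animatediff-torch2 python=3.10 -y
conda activate animatediff-torch2
pip install torch==2.0.1 torchvision==0.15.2
pip install xformers==0.0.22
```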

yy-victory · Apr 19 '24

Hello, I met a similar problem. I used the validation split of WebVid-2M, about 5k videos, to fine-tune the motion LoRA. At 50 steps, the results are similar to yours (different images that change quickly from frame to frame). At 2.5k steps, the results become stable, but there is no difference between frames. I think the dataset we use is too small, or there are some configurations I ignored. I also hope for a solution to this problem.
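For a rough sense of scale: with a per-GPU batch size of 4 (just an example figure), 2.5k steps only cover about 10k samples, i.e. roughly two passes over a 5k-video set, so the motion weights see very little data in either run.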

aulaywang · Jun 12 '24

Thank you for your reply. Increasing the number of training steps can indeed improve the results.

yy-victory · Jun 13 '24