
Question about the motion adapter in DreamVideo

Hugo-cell111 opened this issue · 4 comments

Hi! I noticed that each time only one frame of the guidance video is selected to train the motion adapter. Since selecting a single image breaks the temporal coherence of a video, how can the motion adapter capture the temporal motion pattern? Thanks!

Hugo-cell111 · May 15 '24

Hi, thanks for your interest. We train the motion adapter on all frames of the input videos; separately, a random frame is selected to serve as the appearance guidance.
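To make that concrete, the per-iteration sampling described above can be sketched roughly as follows (shapes and the `sample_training_inputs` helper are hypothetical, not the actual DreamVideo code):

```python
import random
import numpy as np

def sample_training_inputs(video_frames):
    """video_frames: array of shape (T, H, W, C).

    Hypothetical sketch: the motion adapter is trained on ALL frames,
    while one randomly chosen frame serves as the appearance guidance.
    """
    motion_input = video_frames                      # full clip, temporal coherence preserved
    guide_idx = random.randrange(len(video_frames))  # new random frame each iteration
    appearance_guidance = video_frames[guide_idx]    # single frame, appearance only
    return motion_input, appearance_guidance

# Toy example: a 16-frame clip of 32x32 RGB frames
clip = np.zeros((16, 32, 32, 3), dtype=np.float32)
motion, guide = sample_training_inputs(clip)
print(motion.shape, guide.shape)  # (16, 32, 32, 3) (32, 32, 3)
```

So the temporal signal is never lost: the single sampled frame only conditions appearance, while the motion adapter always sees the full clip.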

weilllllls · May 16 '24

Thanks for your response! I have a few more questions: (1) How long does each stage of DreamVideo take? On my own server, stage 1 of subject learning alone takes about 2 hours on 4 V100 PCIe GPUs. Is that normal? (2) Could you provide a link to the open_clip_pytorch_model.bin used by FrozenOpenCLIPCustomEmbedder?

Hugo-cell111 · May 17 '24


Hi. (1) We use one A100 80G GPU. Step 1 of subject learning takes about 50 min and step 2 takes 10-15 min, so your timing seems normal given the device differences. You can also reduce the number of training iterations to trade performance for time. (2) The 'open_clip_pytorch_model.bin' used in DreamVideo is the same one used by the other models (I2VGen-XL, HiGen, TF-T2V, etc.) in this repository. You can download the checkpoint from this link: https://modelscope.cn/api/v1/models/iic/tf-t2v/repo?Revision=master&FilePath=open_clip_pytorch_model.bin.
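For reference, one way to fetch that checkpoint from the command line (the `models/` output path is just an example; place the file wherever your config expects it):

```shell
# Download the shared open_clip checkpoint (same file used by I2VGen-XL, HiGen, TF-T2V)
# Quote the URL so the shell does not interpret the '&'
wget -O models/open_clip_pytorch_model.bin \
  "https://modelscope.cn/api/v1/models/iic/tf-t2v/repo?Revision=master&FilePath=open_clip_pytorch_model.bin"
```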

weilllllls · May 23 '24

Thank you very much! By the way, how long does evaluation on all the datasets mentioned in the DreamVideo paper take? Could you provide the evaluation code?

Hugo-cell111 · May 23 '24