CogVideo How many frames (seconds) are there in each video sample used in the training process?

Open BinZhu-ece opened this issue 2 years ago • 1 comment

How many frames (or seconds) does each video sample contain during training? Is it the same as the output sample, a 4-second clip of 32 frames? What is the length of the videos in your training dataset? Did you use the complete videos directly, or did you slice them?

Sep 06 '22 12:09 BinZhu-ece

The video samples used in training have multiple frame rates: 1, 2, 4, and 8 fps. Because of GPU memory limits and the large scale of the CogVideo model, each model processes 5 frames at a time. The videos in our dataset range in length from 1 second to over 30 seconds. We use the complete video wherever possible to preserve the alignment between video and text in the training set, but may slice a video when it is very long.

Sep 10 '22 16:09 wenyihong
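
To make the numbers in the reply concrete, here is a minimal sketch of how much video 5 frames cover at each of the stated frame rates. This is not code from the CogVideo repository. The function names `clip_duration_seconds` and `sample_frame_indices` are hypothetical, and the sketch assumes that each sampled frame stands for 1/fps seconds and that frames are picked at evenly spaced positions from the source video.

```python
from typing import List

# Values taken from the reply above.
FRAMES_PER_SAMPLE = 5
TRAINING_FPS = (1, 2, 4, 8)


def clip_duration_seconds(fps: int, num_frames: int = FRAMES_PER_SAMPLE) -> float:
    """Seconds of video covered by `num_frames` frames sampled at `fps`.

    Assumption: each frame represents 1/fps seconds of video.
    """
    return num_frames / fps


def sample_frame_indices(
    source_fps: float,
    target_fps: int,
    start_frame: int = 0,
    num_frames: int = FRAMES_PER_SAMPLE,
) -> List[int]:
    """Indices of source frames for one sample at `target_fps`.

    Hypothetical helper: evenly spaced sampling is an assumption,
    not a description of the actual CogVideo data pipeline.
    """
    step = source_fps / target_fps  # source frames between two sampled frames
    return [start_frame + round(i * step) for i in range(num_frames)]


if __name__ == "__main__":
    for fps in TRAINING_FPS:
        print(f"{fps} fps: {clip_duration_seconds(fps)} s per 5-frame sample")
    # Example: a 24 fps source video sampled at 8 fps.
    print(sample_frame_indices(source_fps=24.0, target_fps=8))
```

Under these assumptions, a 5-frame sample covers 5 s at 1 fps, 2.5 s at 2 fps, 1.25 s at 4 fps, and 0.625 s at 8 fps, which suggests why a long video may need to be sliced while a short one can be used whole.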
