Open-Sora data process for pre-training and fine-tuning

liuheng0111 opened this issue 1 year ago · 1 comment

Here you said you prepared a 10M dataset. What is it composed of: panda-10m and HD-VG-130M? How much of the HD-VG dataset was used? Pre-training used 9.7M videos. Does this mean the processing pipeline filtered out only 3% of the videos? Which processing steps were applied for pre-training, and which for fine-tuning? What filtering thresholds were used for each?
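To make the question concrete: keeping 9.7M of 10M videos means 1 − 9.7/10 = 3% were filtered out. The minimal sketch below shows what "filtering thresholds" usually means, assuming each clip carries precomputed quality scores. The `Clip` fields (`aesthetic`, `motion`), `filter_clips`, and any threshold values you pass in are hypothetical illustrations, not Open-Sora's actual pipeline or settings.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    path: str
    aesthetic: float  # hypothetical per-clip aesthetic score
    motion: float     # hypothetical per-clip motion (e.g. optical-flow) score


def filter_clips(clips, min_aesthetic, min_motion):
    """Keep only clips whose scores meet every threshold (hypothetical criteria)."""
    return [c for c in clips if c.aesthetic >= min_aesthetic and c.motion >= min_motion]


def filtered_fraction(total, kept):
    """Fraction of the original clips removed by filtering."""
    return 1.0 - kept / total


if __name__ == "__main__":
    # 10M candidate videos, 9.7M kept for pre-training
    print(f"{filtered_fraction(10_000_000, 9_700_000):.1%}")
```

Each training stage could use its own thresholds. Asking for those per-stage values is exactly what the question above is after.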

May 11 '24 08:05 liuheng0111

Is data processing done at the same time as training? Why not preprocess the data offline in advance?

May 16 '24 03:05 handsomeZhuang

This issue is stale because it has been open for 7 days with no activity.

May 24 '24 01:05 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

May 31 '24 01:05 github-actions[bot]
