Open-Sora

Data processing for pre-training and fine-tuning

Open liuheng0111 opened this issue 1 year ago • 1 comments

image

Here you say to prepare a 10M dataset. What is it composed of: panda-10m and HD-VG-130M? How much of the HD-VG dataset was used? Pre-training used 9.7M videos. Does this mean the processing pipeline filtered out only 3% of the videos? Which processing steps were applied for pre-training, and which for fine-tuning? What filtering thresholds were used for each?

liuheng0111, May 11 '24 08:05

image

Is data processing performed at the same time as training? Why not preprocess the data offline in advance?

handsomeZhuang, May 16 '24 03:05

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot], May 24 '24 01:05

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], May 31 '24 01:05