Open-Sora
Data processing for pre-training and fine-tuning
You mention preparing a 10M-video dataset. What is it composed of: panda-10m and HD-VG-130M? How much of the HD-VG dataset was used? Pre-training used 9.7M videos. Does this mean the processing pipeline filtered out only 3% of the videos? Which processing steps were applied for pre-training, and which for fine-tuning? What filtering thresholds were used in each case?
Does data processing run at the same time as training? Why not preprocess the data offline in advance?
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.