DeepSeek-VL dataset format of pretraining stage

Open annopackage opened this issue 1 year ago • 0 comments

How did you unify the format of the pretraining dataset? During the supervised fine-tuning (SFT) stage, the training data are curated as question-and-answer pairs. For caption or detection datasets, I want to know whether they follow the same format as the SFT data, and how you collected questions for them, since they originally contain only ground truth such as captions or boxes.
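To make the question concrete, here is a minimal sketch of one possible way to wrap caption or box annotations into the same question-and-answer layout. Everything in it is an assumption for illustration: the prompt templates, the `question`/`answer` field names, and the `[x1, y1, x2, y2]` box string are hypothetical and are not confirmed to be what DeepSeek-VL does.

```python
import random

# Hypothetical prompt templates; not taken from DeepSeek-VL.
CAPTION_PROMPTS = [
    "Describe the image.",
    "What is shown in this picture?",
]
DETECTION_PROMPT = "Locate every {label} in the image."


def caption_to_qa(caption, rng):
    """Pair a raw caption with a randomly chosen instruction."""
    return {"question": rng.choice(CAPTION_PROMPTS), "answer": caption.strip()}


def boxes_to_qa(label, boxes):
    """Serialize (x1, y1, x2, y2) boxes for one label into a text answer."""
    answer = "; ".join(f"[{x1}, {y1}, {x2}, {y2}]" for x1, y1, x2, y2 in boxes)
    return {"question": DETECTION_PROMPT.format(label=label), "answer": answer}


# Seeded generator so the template choice is reproducible.
rng = random.Random(0)
cap_sample = caption_to_qa(" A dog on a beach. ", rng)
det_sample = boxes_to_qa("dog", [(10, 20, 110, 220)])
```

If the pretraining data were built this way, each caption or detection record would end up with the same two fields as an SFT sample; whether the actual pipeline uses templated questions, or no question at all, is exactly what I would like to confirm.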

Jul 17 '24 05:07 annopackage
