MiniCPM-V icon indicating copy to clipboard operation
MiniCPM-V copied to clipboard

How to organize data, which can be fine-tuned with both image-text data, as well as purely textual data.

Open hill2hill opened this issue 1 year ago • 1 comments

When fine-tuning with LoRA, is it necessary to use data that includes images? If pure text data is used, would it affect the model's performance (it should not, as some open-source datasets for MLM models include SFT with pure text question-answer pairs)?

How should the JSON file be structured?

hill2hill avatar Jun 11 '24 02:06 hill2hill

22 how can i finetune this model with Text-only data and Image-Text data in same dataset?

univa-JASON avatar Jun 11 '24 07:06 univa-JASON

您可以自己调整一下数据集的写法,可以根据条件判断一下

qyc-98 avatar Jul 16 '24 05:07 qyc-98