12M finetune data for v1.2 plus

Open fyting opened this issue 1 year ago • 2 comments

The fine-tuning data for InternVL-Chat-v1.2 consisted of 1.2M open-source samples. Could you please specify what the 12M fine-tuning data for v1.2 plus consists of?

fyting, Apr 30 '24 10:04

Check here: https://github.com/OpenGVLab/InternVL/blob/main/BLOG.md#data-preparation

htian01, May 03 '24 15:05

@htian01 I think this is the fine-tuning data for InternVL-Chat-v1.2, not for v1.2 plus.

fyting, May 05 '24 14:05

We are no longer using this data because we found that having more data is not always better; the quality of the data is more important. We discovered that reducing the data from 12M to 5M significantly improved the model's performance. You can refer to the data we used in InternVL 1.5. Below is the list of data reported in our technical report:

[Image: list of datasets from the technical report]

czczup, May 30 '24 14:05