12M finetune data for v1.2 plus
InternVL-Chat-v1.2 was finetuned on 1.2M open-source samples. Could you please specify what the 12M finetune data for v1.2 plus consists of?
Check here: https://github.com/OpenGVLab/InternVL/blob/main/BLOG.md#data-preparation
@htian01 I think this is the finetuning data for InternVL-Chat-v1.2, not for v1.2 plus.
We are no longer using this data because we found that more data is not always better; the quality of the data matters more. Reducing the data from 12M to 5M significantly improved the model's performance. You can refer to the data we used in InternVL 1.5. Below is the list of data reported in our technical report: