12M finetune data for v1.2 plus
InternVL-Chat-v1.2 was finetuned on 1.2M open-source samples. Could you please specify what the 12M finetune data for v1.2 plus consists of?
Check here: https://github.com/OpenGVLab/InternVL/blob/main/BLOG.md#data-preparation
@htian01 I think this is the finetuning data for InternVL-Chat-v1.2, not for v1.2 plus.
We are no longer using this data because we found that more data is not always better; the quality of the data matters more. Reducing the data from 12M to 5M significantly improved the model's performance. You can refer to the data we used in InternVL 1.5. Below is the list of data reported in our technical report: