InternVL How to organize my customized dataset for finetuning InternVL2?

The document & guide for finetuning InternVL2 are generally clear. However, I overcome a problem with my dataset. As I read from the document & code, a dataset suitable for Internvl_chat finetuning might contain multiple "conversations" as shown below (Since its all single-turn in my case).

{
  "id": "XX",
  "image": "/path/to/image.jpg",
  "conversations": [{
      "from": "human",
      "value": "text1"
    },
    {
      "from": "gpt",
      "value": "text2"
    }]
}

My questions are:

Is 'system' role supported in this format?
In my case, I've multiple images that might be used at one time (all send by 'human'). How should I process my data? My data might look like:

{
  "id": "XX",
  "conversations": [{
      "from": "human",
      "value": "text <image1> some text <image2> some other text"
    },
    {
      "from": "gpt",
      "value": "text2"
    }]
}

Thank you!

Jul 30 '24 13:07 zzjchen

Will simply putting all images in "image" work for my case?

Jul 30 '24 13:07 zzjchen

Hi, see this document to prepare training data: https://internvl.readthedocs.io/en/latest/get_started/chat_data_format.html

Aug 09 '24 05:08 czczup

Thanks, I've already solved my issue. Sorry not closing the issue in time

Aug 09 '24 06:08 zzjchen