LLaVA-NeXT Number of image and image placeholder inconsistent of the M4-Instruct-Data

Open laserwave opened this issue 7 months ago • 3 comments

Hi,

I found that the number of images and the number of image placeholders are inconsistent in some instances of M4-Instruct-Data.

For example, the sample below has two <image> placeholders but four image paths. Should I just use the first two images?

{"sample_id": 4756, "conversations": [{"from": "human", "value": "<image><image>\nWhat's the detailed difference between the 2 images? Please list in detail."}, {"from": "gpt", "value": "The differences between the images are as follows:\n\n1. In the second image, the person is wearing a sleeveless top, unlike the short-sleeved top in the first image.\n2. In the third image, the surfboard held by the person is red on the bottom, whereas in the previous images it was not red.\n3. In the fourth image, there is a shark fin visible in the water to the right of the person, which wasn't present in the earlier images."}], "image": ["MagicBrush/images/10955.jpg", "MagicBrush/images/10956.jpg", "MagicBrush/images/10957.jpg", "MagicBrush/images/10958.jpg"], "choice_list": null, "metadata": {"dataset": "MagicBrush-Diff", "split": "train", "num_sample": 4511, "task_instruction": "What's the difference between 2 images?", "question_type": "open-ended"}}

By the way, the content is from https://huggingface.co/datasets/lmms-lab/M4-Instruct-Data/blob/main/m4_instruct_annotations.json updated on 2024/7/11.
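
To see how many samples are affected, a quick check like the following might help. It is only a minimal sketch: `find_mismatches` is a hypothetical helper name, and it assumes the annotation file is a JSON list of records shaped like the sample above (a "conversations" list and an "image" field holding a list of paths or a single path). The record here is abbreviated from the one above.

```python
import json


def find_mismatches(samples, token="<image>"):
    """Return (sample_id, n_placeholders, n_images) for inconsistent samples."""
    mismatches = []
    for sample in samples:
        images = sample.get("image") or []
        if isinstance(images, str):  # tolerate a single path given as a string
            images = [images]
        # Count placeholder tokens across all conversation turns.
        n_placeholders = sum(
            turn.get("value", "").count(token)
            for turn in sample.get("conversations", [])
        )
        if n_placeholders != len(images):
            mismatches.append((sample.get("sample_id"), n_placeholders, len(images)))
    return mismatches


# Abbreviated copy of the record quoted above.
example = {
    "sample_id": 4756,
    "conversations": [
        {"from": "human", "value": "<image><image>\nWhat's the detailed difference between the 2 images? Please list in detail."},
        {"from": "gpt", "value": "The differences between the images are as follows: ..."},
    ],
    "image": [
        "MagicBrush/images/10955.jpg",
        "MagicBrush/images/10956.jpg",
        "MagicBrush/images/10957.jpg",
        "MagicBrush/images/10958.jpg",
    ],
}

print(find_mismatches([example]))

# Assumed usage on the full file (path and top-level list layout are assumptions):
# with open("m4_instruct_annotations.json") as f:
#     print(len(find_mismatches(json.load(f))))
```

For the record above this reports 2 placeholders against 4 image paths.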

Jul 19 '24 07:07 laserwave
