Ask-Anything Discrepancy in Image ID Alignment Between M3IT and VideoChat2IT

Could you please provide a script or JSON file that maps IDs from M3IT to VideoChat2IT? Matching the files by hand is quite difficult because the dataset lists differ. For example, VideoChat2IT/caption contains coco, llava, minigpt4, paragraph_captioning, and textcaps, whereas M3IT/captioning contains coco, coco-cn, flickr8k-cn, image_paragraph_captioning, msrvtt, and textcap. In addition, the image IDs do not match exactly; for instance, COCO image paths in VideoChat2IT include an additional directory level compared to those in M3IT. I believe it would be beneficial to fully open-source this.

Jun 26 '24 07:06 patrick-tssn

Hi! You can convert these datasets from M3IT yourself: we use the original annotations and only change the file_name for our data.
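
Since only the file_name differs, one possible way to recover the mapping is to match paths by their final component and ignore any extra directory level. The following is a minimal sketch, not the maintainers' script: the function name `build_id_map` and the example paths are hypothetical, and it assumes you have already extracted the image paths from both annotation sets into plain lists of strings.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_id_map(m3it_paths, videochat_paths):
    """Match image paths from two sources by file name (last path component).

    Returns (matched, unmatched): `matched` maps an M3IT path to the single
    VideoChat2IT path with the same file name; `unmatched` lists M3IT paths
    with no candidate or with more than one candidate.
    """
    # Index the VideoChat2IT paths by file name, ignoring directories.
    by_name = defaultdict(list)
    for path in videochat_paths:
        by_name[PurePosixPath(path).name].append(path)

    matched, unmatched = {}, []
    for path in m3it_paths:
        candidates = by_name.get(PurePosixPath(path).name, [])
        if len(candidates) == 1:
            matched[path] = candidates[0]
        else:
            # Missing or ambiguous: leave for manual inspection.
            unmatched.append(path)
    return matched, unmatched


# Hypothetical example paths, for illustration only.
m3it = ["COCO_train2014_000000000009.jpg", "not_in_other_set.jpg"]
videochat = ["train2014/COCO_train2014_000000000009.jpg"]

matched, unmatched = build_id_map(m3it, videochat)
print(matched)
print(unmatched)
```

Matching on file name alone only works when names are unique within a dataset; where they are not, the ambiguous entries end up in `unmatched` and would need a second key such as the caption text.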

Jun 26 '24 15:06 Andy1621

You mean manually checking the files for each split? That's fine, but changing only the file names is confusing and adds unnecessary workload without any benefit.

Jun 27 '24 01:06 patrick-tssn

Hi, I couldn't find image/caption/minigpt4 in M3IT. How can I obtain these images?

Jul 16 '24 09:07 patrick-tssn

https://drive.google.com/file/d/1nJXhoEcy3KTExr17I7BXqY5Y9Lx_-n-9/view

Jul 16 '24 11:07 yinanhe