Discrepancy in Image ID Alignment Between M3IT and VideoChat2IT
Could you please provide a script or JSON file that maps IDs from M3IT to VideoChat2IT? Matching the files across the two datasets is quite challenging. For example, the captioning subsets differ: VideoChat2IT/caption contains coco, llava, minigpt4, paragraph_captioning, and textcaps, whereas M3IT/captioning contains coco, coco-cn, flickr8k-cn, image_paragraph_captioning, msrvtt, and textcap. In addition, the image IDs do not completely match; for instance, COCO image paths in VideoChat2IT have an additional directory level compared to those in M3IT. I believe it would be beneficial to fully open-source this.
Hi! You can convert these datasets from M3IT yourself, since we use the original annotations and only change the file_name for our data.
You mean manually check the file for each split? That's fine, but solely changing file names is confusing and adds unnecessary workload without any benefits.
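One way to avoid checking each split by hand is to match entries by image basename. The following is only a minimal sketch, not the authors' conversion script: it assumes the file names themselves are unchanged and only the directory prefix differs, which may not hold for every subset. The function name `build_id_map` and its inputs are hypothetical; you would need to extract the ID and path lists from the M3IT and VideoChat2IT annotation files yourself.

```python
import posixpath
from collections import defaultdict


def build_id_map(m3it_ids, videochat2_paths):
    """Match M3IT image IDs to VideoChat2IT paths by basename.

    Assumption (not confirmed by the maintainers): only the directory
    prefix differs between the two datasets, so basenames are comparable.
    Returns (mapping, unmatched, ambiguous).
    """
    # Index every VideoChat2IT path under its basename.
    by_name = defaultdict(list)
    for path in videochat2_paths:
        by_name[posixpath.basename(path)].append(path)

    mapping, unmatched, ambiguous = {}, [], {}
    for image_id in m3it_ids:
        candidates = by_name.get(posixpath.basename(image_id), [])
        if len(candidates) == 1:
            mapping[image_id] = candidates[0]
        elif not candidates:
            # No VideoChat2IT file shares this basename.
            unmatched.append(image_id)
        else:
            # Several files share the basename; resolve these manually.
            ambiguous[image_id] = candidates
    return mapping, unmatched, ambiguous
```

The `unmatched` and `ambiguous` results show which entries still need manual inspection, and `mapping` can be dumped with `json.dump` to get the kind of JSON file requested above.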
Hi, I couldn't find image/caption/minigpt4 in M3IT. How can I obtain these images?
https://drive.google.com/file/d/1nJXhoEcy3KTExr17I7BXqY5Y9Lx_-n-9/view