Train/test split JSON files for MSR-VTT caption task reproduction
Thank you for your wonderful project!
Could you provide the train/test split JSON files for the MSR-VTT caption dataset? I am unable to access the following files:
• datasets/annotations_all/msrvtt_caption/train.jsonl
• datasets/annotations_all/msrvtt_caption/test.jsonl
From my understanding, you used 1k samples for the test set. To accurately reproduce the results from the paper, could you please provide the sample IDs used for the test set?
Yes, me too. While trying to reproduce the results, I couldn't find the files mentioned by @naajeehxe, plus the following file: 'datasets/annotations_all/msvd_caption/train.jsonl'. It would be great if you could let us know how to generate them.
@idj3tboy I’m not sure if this will be helpful, but I’d like to share how I did it. I downloaded the data from https://cove.thecvf.com/datasets/839 and used the following two txt files for the train/test split:
• MSRVTT/videos/train_list_new.txt
• MSRVTT/videos/test_list_new.txt
This gave me 7,010 training samples and 2,990 test samples. I’m not exactly sure which 9k/1k train/test split the paper refers to, but I was able to reproduce results close to those in the paper with this 7k/3k split.
If you’re in a hurry, it might be a good idea to give it a try!
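For reference, here is a rough sketch of how such a conversion could look. The annotation path, the JSON structure ("videos"/"sentences" with "video_id" and "caption" keys, as in the official MSR-VTT release), and especially the output field names for train.jsonl/test.jsonl are assumptions on my part, so please check them against the repo's dataloader before training:

```python
import json

# Hypothetical paths -- adjust to your local layout.
ANNOTATION_JSON = "MSRVTT/annotation/MSR_VTT.json"  # assumed to follow the official "videos"/"sentences" schema
TRAIN_LIST = "MSRVTT/videos/train_list_new.txt"     # 7,010 video IDs, one per line
TEST_LIST = "MSRVTT/videos/test_list_new.txt"       # 2,990 video IDs, one per line


def read_id_list(path):
    """Read one video ID per line, e.g. 'video7010'."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def write_split(video_ids, sentences, out_path):
    """Write one JSON object per caption. The field names ("video", "caption")
    are a guess at what train.jsonl/test.jsonl expect -- verify against the dataloader."""
    with open(out_path, "w") as f:
        for sen in sentences:
            if sen["video_id"] in video_ids:
                record = {"video": sen["video_id"] + ".mp4", "caption": sen["caption"]}
                f.write(json.dumps(record) + "\n")


def main():
    with open(ANNOTATION_JSON) as f:
        anno = json.load(f)
    sentences = anno["sentences"]  # captions live here in the official annotation file

    write_split(read_id_list(TRAIN_LIST), sentences, "train.jsonl")
    write_split(read_id_list(TEST_LIST), sentences, "test.jsonl")


if __name__ == "__main__":
    main()
```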
I don't know if this helps, but I found the 9k/1k train/test split mentioned in the paper here:
https://github.com/albanie/collaborative-experts/blob/master/misc/datasets/msrvtt/README.md
Quoting that README: "The 1k-A split was produced by the authors of JSFusion [4]. The train/val splits are listed in the files: train_list_jsfusion.txt (9000 videos) and val_list_jsfusion.txt (1000 videos)."
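If you want the 9k/1k (1k-A) split instead of the 7k/3k one, the same conversion sketch from the earlier comment should work by swapping in those list files, assuming they also contain one video ID per line (please check the actual format before using them):

```python
# Reuses read_id_list / write_split / sentences from the sketch above.
write_split(read_id_list("train_list_jsfusion.txt"), sentences, "train.jsonl")  # 9,000 videos
write_split(read_id_list("val_list_jsfusion.txt"), sentences, "test.jsonl")     # 1,000 videos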