Train/test split JSON files for MSR-VTT caption task reproduction
Thank you for your wonderful project!
Could you provide the train/test split JSON files for the MSR-VTT caption dataset? I am unable to access the following files:
• datasets/annotations_all/msrvtt_caption/train.jsonl
• datasets/annotations_all/msrvtt_caption/test.jsonl
From my understanding, you used 1k samples for the test set. To accurately reproduce the results from the paper, could you please provide the sample IDs used for the test set?
Yes, me too. While trying to reproduce the results, I couldn't find the files mentioned by @naajeehxe, plus the following file: 'datasets/annotations_all/msvd_caption/train.jsonl'. It would be great if you could let us know how to generate them.
@idj3tboy I’m not sure if this will be helpful, but I’d like to share how I did it. I downloaded the data from https://cove.thecvf.com/datasets/839 and used the following two txt files for the train/test split:
• MSRVTT/videos/train_list_new.txt
• MSRVTT/videos/test_list_new.txt
This gave me 7,010 training samples and 2,990 test samples. I’m not exactly sure which 9k/1k train/test split the paper refers to, but I was able to reproduce results close to those in the paper with this 7k/3k split.
If you’re in a hurry, it might be a good idea to give it a try!
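For reference, here is a rough sketch of how such a conversion could look. The annotation path, the JSON structure ("videos"/"sentences" with "video_id" and "caption" keys, as in the official MSR-VTT release), and especially the output field names for train.jsonl/test.jsonl are assumptions on my part, so please check them against the repo's dataloader before training:

```python
import json

# Hypothetical paths -- adjust to your local layout.
ANNOTATION_JSON = "MSRVTT/annotation/MSR_VTT.json"  # assumed to follow the official "videos"/"sentences" schema
TRAIN_LIST = "MSRVTT/videos/train_list_new.txt"     # 7,010 video IDs, one per line
TEST_LIST = "MSRVTT/videos/test_list_new.txt"       # 2,990 video IDs, one per line


def read_id_list(path):
    """Read one video ID per line, e.g. 'video7010'."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def write_split(video_ids, sentences, out_path):
    """Write one JSON object per caption. The field names ("video", "caption")
    are a guess at what train.jsonl/test.jsonl expect -- verify against the dataloader."""
    with open(out_path, "w") as f:
        for sen in sentences:
            if sen["video_id"] in video_ids:
                record = {"video": sen["video_id"] + ".mp4", "caption": sen["caption"]}
                f.write(json.dumps(record) + "\n")


def main():
    with open(ANNOTATION_JSON) as f:
        anno = json.load(f)
    sentences = anno["sentences"]  # captions live here in the official annotation file

    write_split(read_id_list(TRAIN_LIST), sentences, "train.jsonl")
    write_split(read_id_list(TEST_LIST), sentences, "test.jsonl")


if __name__ == "__main__":
    main()
```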
I don't know if this helps, but I found the 9k/1k train/test split mentioned in the paper here:
https://github.com/albanie/collaborative-experts/blob/master/misc/datasets/msrvtt/README.md
Quoting that README: "The 1k-A split was produced by the authors of JSFusion [4]. The train/val splits are listed in the files: train_list_jsfusion.txt (9000 videos) and val_list_jsfusion.txt (1000 videos)."
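If you want the 9k/1k (1k-A) split instead of the 7k/3k one, the same conversion sketch from the earlier comment should work by swapping in those list files, assuming they also contain one video ID per line (please check the actual format before using them):

```python
# Reuses read_id_list / write_split / sentences from the sketch above.
write_split(read_id_list("train_list_jsfusion.txt"), sentences, "train.jsonl")  # 9,000 videos
write_split(read_id_list("val_list_jsfusion.txt"), sentences, "test.jsonl")     # 1,000 videos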