pytorch_RVAE
How to create MSCOCO dataset
Hi, I downloaded the MSCOCO captions_train2014.json and captions_val2014.json. As described in the paper, there are 82,783 train samples and 40,504 val samples, and every sample contains 5 captions. If I omit one caption and combine the other four into two paraphrase pairs, there will be about 2*(82,783 + 40,504) = 246,574 pairs. How can I get the 320k paraphrase pairs?
The author replied to me and explained how to create the dataset as follows: Each data point has multiple captions. Say a, b and c are paraphrases of each other; then to make them into pairs you can do the following pairing: a -> b, b -> a, a -> c, c -> a, b -> c, c -> b.
This yields many more data points than the total number of image-caption pairs. However, make sure that all the captions that belong to a single image remain either in train or in val.
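The pairing described above can be sketched in Python roughly as follows. This is only a minimal illustration, not the author's actual preprocessing script: the helper names (`group_captions`, `ordered_pairs`, `load_annotations`) are hypothetical, and it assumes the standard COCO captions layout, where the JSON file has an "annotations" list whose entries carry an "image_id" and a "caption".

```python
import json
from collections import defaultdict
from itertools import permutations


def group_captions(annotations):
    """Group caption strings by the image they describe."""
    by_image = defaultdict(list)
    for ann in annotations:
        by_image[ann["image_id"]].append(ann["caption"].strip())
    return by_image


def ordered_pairs(annotations):
    """Return every ordered (source, target) caption pair within each image.

    For captions a, b, c of one image this yields
    a->b, a->c, b->a, b->c, c->a, c->b. Captions of different
    images are never paired with each other.
    """
    pairs = []
    for captions in group_captions(annotations).values():
        pairs.extend(permutations(captions, 2))
    return pairs


def load_annotations(path):
    """Hypothetical loader: read the "annotations" list of a COCO captions file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["annotations"]


# Toy data standing in for real annotations (two images, 3 and 2 captions).
toy = [
    {"image_id": 1, "caption": "a"},
    {"image_id": 1, "caption": "b"},
    {"image_id": 1, "caption": "c"},
    {"image_id": 2, "caption": "x"},
    {"image_id": 2, "caption": "y"},
]
toy_pairs = ordered_pairs(toy)
print(len(toy_pairs))  # 3*2 + 2*1 = 8
```

To keep all captions of one image on the same side of the split, one option is to build the pairs separately per file, e.g. `ordered_pairs(load_annotations("captions_train2014.json"))` for train and the same call on captions_val2014.json for val. Note that an image with n captions produces n*(n-1) ordered pairs, so the final count depends on how many captions per image you keep.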