ImageReward

Dataset download issue

Open bhattg opened this issue 1 year ago • 2 comments

Hi,

I am trying to reproduce the 1k and 4k numbers for the ImageReward function accuracy, as mentioned in the paper. To do so, I downloaded the data and modified it slightly so that it could be loaded with the script make_dataset.py. However, some file IDs in the training set have empty images, that is, files with a size of 0K.

The affected IDs are listed below.

005050-0024
005389-0008
005795-0038
006272-0041
006756-0071
005165-0028
005332-0172
005356-0019
006011-0030
006167-0087
006758-0099
005179-0097
005444-0063
005434-0068
005459-0003
005344-0055
006174-0048
006190-0114
006214-0021
006787-0015
006857-0073
006830-0003

bhattg avatar Sep 07 '23 06:09 bhattg

Thanks for pointing this out. We do have a small number of image files in our dataset that don't exist, and we'll fix this in the next version. For now, you can skip these invalid images.
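
A minimal sketch of how such files could be skipped before building the dataset, assuming the image paths are already available as a list. The helper names `is_valid_image_file` and `split_valid_invalid` are hypothetical and are not part of the ImageReward repository or make_dataset.py; the check only looks at file existence and size, not at whether the image decodes.

```python
from pathlib import Path


def is_valid_image_file(path):
    """Return True if `path` is an existing regular file with non-zero size.

    Hypothetical helper: it does not try to decode the image.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


def split_valid_invalid(paths):
    """Split `paths` into (valid, invalid) lists, preserving input order.

    Missing files and 0-byte files both end up in `invalid`.
    """
    valid, invalid = [], []
    for path in paths:
        if is_valid_image_file(path):
            valid.append(path)
        else:
            invalid.append(path)
    return valid, invalid
```

Calling `split_valid_invalid` on the list of training image paths would return the usable paths plus the ones to skip, which can be compared against the IDs reported above.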

xujz18 avatar Sep 10 '23 15:09 xujz18

Hello, I am also working on reproducing the training results, but it seems the 'train.json' file on Hugging Face cannot be used directly with make_dataset.py. Could you share the processed train.json file? Many thanks!

muse1998 avatar Nov 01 '23 11:11 muse1998