HashNet

False sampling of data

Open hbellafkir opened this issue 5 years ago • 1 comment

hi,

I just found out that all images in the query list are also in the database list, which should not be the case for a fair validation.

thanks

hbellafkir avatar Sep 01 '20 10:09 hbellafkir

-- Second this.

# Local path to the data directory of the HashNet PyTorch code; adjust as needed.
prefix = 'D:/Downloads/HashNet-master/HashNet-master/pytorch/data/'
for dataset in ['imagenet', 'coco', 'nuswide_81']:
    # Read each list as a set of whole lines so overlaps can be counted.
    with open(prefix + f'{dataset}/train.txt', 'r') as f:
        train = set(f.read().splitlines())
    with open(prefix + f'{dataset}/test.txt', 'r') as f:
        test = set(f.read().splitlines())
    with open(prefix + f'{dataset}/database.txt', 'r') as f:
        database = set(f.read().splitlines())
    # Overlap sizes, in order: train/database, test/database, test/train.
    print(dataset, len(train.intersection(database)))
    print(dataset, len(test.intersection(database)))
    print(dataset, len(test.intersection(train)))

Output:

imagenet 13000
imagenet 0
imagenet 0
coco 0
coco 5000
coco 0
nuswide_81 10000
nuswide_81 0
nuswide_81 0

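At test time, test.txt is used as the query set and database.txt as the retrieval set. The two should not intersect, but for COCO they do: 5000 lines of test.txt also appear in database.txt. For ImageNet and NUS-WIDE the query and retrieval sets are disjoint; there it is train.txt that overlaps database.txt (13000 and 10000 lines respectively).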
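
One possible workaround, sketched below, is to write a filtered copy of the database list that leaves out every image also present in the query list. This is not something the repository provides. The function names and the output file name `database_disjoint.txt` are made up for illustration, and the code assumes each line starts with the image path followed by whitespace-separated labels.

```python
from pathlib import Path


def image_key(line: str) -> str:
    """Return the image path, ASSUMED to be the first whitespace-separated field."""
    parts = line.split()
    return parts[0] if parts else ""


def drop_query_images(database_lines, query_lines):
    """Keep only non-blank database lines whose image is not in the query list."""
    query_keys = {image_key(line) for line in query_lines if line.strip()}
    return [
        line
        for line in database_lines
        if line.strip() and image_key(line) not in query_keys
    ]


def write_disjoint_database(data_dir, out_name="database_disjoint.txt"):
    """Write a filtered database list next to the original files.

    The original database.txt is left untouched. Returns the number of
    lines that were dropped (blank lines included).
    """
    data_dir = Path(data_dir)
    query = (data_dir / "test.txt").read_text().splitlines()
    database = (data_dir / "database.txt").read_text().splitlines()
    kept = drop_query_images(database, query)
    (data_dir / out_name).write_text("\n".join(kept) + "\n")
    return len(database) - len(kept)
```

Calling `write_disjoint_database(prefix + 'coco')` with the `prefix` from the snippet above would produce the filtered list for COCO. Matching on the image path rather than the whole line also catches an image that is listed with different label strings in the two files. Retrieval scores computed against the filtered list are not directly comparable to scores computed against the original database.txt.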

vinnik-dmitry07 avatar Apr 13 '23 22:04 vinnik-dmitry07