cnnimageretrieval-pytorch

How to train on my own dataset?

Open shayxurui opened this issue 6 years ago • 5 comments

Thanks for sharing the code. I wonder if I can train on my own dataset, and how should I prepare the pkl file?

shayxurui avatar Dec 21 '18 02:12 shayxurui

The database pkl file is a dictionary with two parts, 'train' and 'val', used for training and validation respectively. Each of them is itself a dictionary that should contain the following:

'cids': list of images with names given as content_id (cid). See cid2filename to understand how a cid is used to generate the image filename.

'cluster': a cluster (3D model) id given for each image, used for hard negative mining.

'qidxs' and 'pidxs': query-positive image pairs given as indices pointing to a respective position in 'cids'.
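
As an illustration of the structure described above, here is a minimal sketch in Python. The image names, cluster ids, helper names (make_split, save_db), and the output filename my_dataset.pkl are made-up placeholders, not values from the repository. Real cids have to be whatever cid2filename expects, so check that function and the training data loader before relying on this layout.

```python
import pickle


def make_split(cids, cluster, qidxs, pidxs):
    """Bundle one split ('train' or 'val') using the four keys described above."""
    # One cluster id per image.
    assert len(cids) == len(cluster)
    # Queries and positives are paired element by element.
    assert len(qidxs) == len(pidxs)
    # Every index must point at a valid position in cids.
    assert all(0 <= i < len(cids) for i in list(qidxs) + list(pidxs))
    return {"cids": cids, "cluster": cluster, "qidxs": qidxs, "pidxs": pidxs}


def save_db(db, path):
    """Write the database dictionary to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(db, f)


# Hypothetical toy data: four training images in two clusters,
# two validation images in one cluster.
db = {
    "train": make_split(
        cids=["img_a", "img_b", "img_c", "img_d"],
        cluster=[0, 0, 1, 1],
        qidxs=[0, 2],  # img_a and img_c are queries
        pidxs=[1, 3],  # img_b and img_d are their positives
    ),
    "val": make_split(
        cids=["img_e", "img_f"],
        cluster=[2, 2],
        qidxs=[0],
        pidxs=[1],
    ),
}

if __name__ == "__main__":
    save_db(db, "my_dataset.pkl")
```

Each query-positive pair here shares a cluster id, which matches the idea that images with the same label should end up close in the embedding space.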

filipradenovic avatar Jan 14 '19 13:01 filipradenovic

@shayxurui have you solved it?

No, I gave up.

shayxurui avatar Mar 21 '19 10:03 shayxurui

So we can’t train this on our own dataset... Imagine I have plenty of images and want to train this model to recognize them: I can’t?

saiaman avatar Apr 23 '19 06:04 saiaman

What exactly is not clear in my response from earlier?

Each image needs to have a label: cluster, model, 3D model, type of object, etc. In fact, the label can be anything, as long as all images that share a label should be embedded close together in the image representation space. Once you have that, selecting query-positive pairs is trivial, and negative images are selected on the fly during training.
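
To make the "selecting query-positive pairs is trivial" step concrete, here is one possible sketch in Python. It assumes a flat list of per-image labels and pairs every image with every other image that has the same label. The function name pairs_from_labels and the example labels are hypothetical, and the repository's own datasets may have chosen their pairs differently.

```python
from collections import defaultdict
from itertools import permutations


def pairs_from_labels(labels):
    """Return (qidxs, pidxs) covering every ordered pair of distinct
    images that share a label. Indices refer to positions in `labels`."""
    # Group image indices by label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    qidxs, pidxs = [], []
    for members in by_label.values():
        # A label with a single image yields no pairs.
        for q, p in permutations(members, 2):
            qidxs.append(q)
            pidxs.append(p)
    return qidxs, pidxs


# Hypothetical labels, one per image, in the same order as 'cids'.
labels = ["church", "church", "bridge", "bridge", "bridge", "tower"]
qidxs, pidxs = pairs_from_labels(labels)
```

The same labels list can also serve as the 'cluster' entry. Note that the number of pairs grows quadratically with the size of each label group, so for large groups it may be worth sampling a subset instead of keeping all of them.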

filipradenovic avatar Apr 26 '19 14:04 filipradenovic

Hi, I have a question about this cluster. I want to know whether the clustering is generated from the GPS distance between the given images or from the feature similarity between the images.

ionLi avatar Mar 20 '24 15:03 ionLi