Contrastive-Clustering

New dataset format

Open Aditya-shahh opened this issue 3 years ago • 7 comments

Hello,

I'd like to train this model on new image datasets. What file structure should the input dataset have? Since this is unsupervised classification, is it still the same ImageFolder format that we generally use, i.e.:

/data
    train/
        class_1/
            img1.png
            img2.png
            ...
        class_2/
    val/
        class_1/
            img1.png
            img2.png
            ...
        class_2/

Can you please shed some light on this?

Also, if we are experimenting with new data, should we use only the training data, or both the training and test data?

Aditya-shahh, Apr 21 '21 04:04

Hi,

For the file structure of the dataset, I recommend using the DatasetFolder class provided by PyTorch (torchvision), following the instructions at https://pytorch.org/vision/stable/datasets.html#datasetfolder. And yes, ImageFolder would be a good choice for image data. It automatically assigns labels based on the subfolders. The assigned labels should be used only for evaluation, not for training.
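To illustrate how labels follow from the subfolders, here is a minimal standard-library sketch of the rule ImageFolder applies: class subfolder names are sorted and mapped to integer labels 0..N-1. It is an illustration, not code from this repository; the helper names `find_classes`, `list_samples` and `demo` are made up for the example, and unlike ImageFolder it does not filter files by image extension.

```python
import os
import tempfile


def find_classes(root):
    """Mirror ImageFolder's rule: sorted subfolder names become labels 0..N-1."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


def list_samples(root):
    """Return (path, label) pairs, one per file under each class folder."""
    samples = []
    for name, idx in find_classes(root).items():
        for dirpath, _, files in sorted(os.walk(os.path.join(root, name))):
            for fname in sorted(files):
                samples.append((os.path.join(dirpath, fname), idx))
    return samples


def demo():
    """Build a throwaway train/ tree and show the labels it would receive."""
    layout = {"class_1": ["img1.png", "img2.png"], "class_2": ["img1.png"]}
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "train")
        for cls, names in layout.items():
            os.makedirs(os.path.join(root, cls))
            for fname in names:
                # Empty placeholder files stand in for real images.
                open(os.path.join(root, cls, fname), "wb").close()
        return find_classes(root), [label for _, label in list_samples(root)]
```

With torchvision installed, `torchvision.datasets.ImageFolder(root)` exposes the same mapping as `class_to_idx`. For clustering, the label returned with each image would simply be ignored during training and kept for evaluation.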

The choice of data depends on your needs. If you simply want to cluster your own images, you may use all of the data. But if you want to see the generalization ability of our method, you may use the training data for training and test data for evaluation.
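When the folder labels are held back for evaluation, note that cluster indices are arbitrary, so predictions have to be matched to classes before they can be scored. The following is a minimal sketch of clustering accuracy under the best one-to-one matching. It uses brute force over permutations, which is only practical for a handful of clusters; the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) is the usual choice for more. The function name is hypothetical and this is not the repository's evaluation code.

```python
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one mappings of cluster ids to class ids."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0
    # Try every assignment of clusters to distinct classes (factorial cost).
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        correct = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, correct)
    return best / len(y_true)
```

Here `y_true` would be the ImageFolder labels of the evaluation split and `y_pred` the cluster index predicted for each image.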

Hope this answers your question.

Yunfan-Li, Apr 21 '21 11:04

hello,

I'd like to compare against this paper, but I have a question. The batch size in your paper is 256, but the config file uses 128, so which is right? I'm Chinese, haha. Thanks.

TryHard-LL, Apr 29 '21 08:04

Hi, thanks for pointing that out. All the experiments in the paper were conducted with a batch size of 256. I have corrected the mistake in the config file, haha.

Yunfan-Li, Apr 29 '21 09:04

hi,

I'd like to use the checkpoint with the best results. It works for testing, but when I try to use it for pretraining, I can't load it. Can you give me some suggestions? Thanks.

TryHard-LL, Apr 29 '21 13:04

You may use the same reload function "model.load_state_dict()". Please copy the error message here if it doesn’t work.

Yunfan-Li, Apr 29 '21 13:04

This is the error I get when I reload the checkpoint to pretrain.

Traceback (most recent call last):
  File "D:/Myselfs/Codes/Pycharm/CC/train.py", line 129, in <module>
    optimizer.load_state_dict(checkpoint['optimizer'])
KeyError: 'optimizer'

TryHard-LL, Apr 29 '21 13:04
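Judging from the traceback alone, the loaded checkpoint has no 'optimizer' entry, which would be the case if only the model weights were saved. A minimal sketch of a tolerant reload follows: it restores the model and only touches the optimizer when its state is present. The key names "net" and "epoch" and the function name `restore` are assumptions for illustration (inspect `checkpoint.keys()` for the real ones), and the small stand-in class exists only so the sketch runs without PyTorch.

```python
def restore(checkpoint, model, optimizer=None):
    """Load what the checkpoint contains; return the epoch to resume from."""
    # "net" is a hypothetical key name for the model weights.
    model.load_state_dict(checkpoint["net"])
    # Weights-only checkpoints have no optimizer state: keep the fresh optimizer.
    if optimizer is not None and "optimizer" in checkpoint:
        optimizer.load_state_dict(checkpoint["optimizer"])
    # "epoch" is also assumed; fall back to 0 when it is missing.
    return checkpoint.get("epoch", 0)


class StandIn:
    """Minimal object with the same load_state_dict() method name as PyTorch."""

    def __init__(self):
        self.state = None

    def load_state_dict(self, state):
        self.state = state


# Weights-only checkpoint: loads without raising KeyError: 'optimizer'.
model, optimizer = StandIn(), StandIn()
start_epoch = restore({"net": {"w": 1.0}}, model, optimizer)
```

With PyTorch, `checkpoint` would come from `torch.load(path, map_location="cpu")`, and `model` and `optimizer` would be the real network and optimizer objects.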

thank you, it worked

TryHard-LL, Apr 29 '21 14:04