
Self-Supervised Learning understanding

Open marcomameli1992 opened this issue 3 years ago • 4 comments

Dear all, I would like to understand how lightly works. In particular, I used the MoCo example and got very low classification accuracy. After that I tried to use a dataset with a folder of negative examples, but the results were not good. I would like to understand MoCo training: can I use the ResNet-50 from torchvision, pretrained on ImageNet, to reach better performance? My last question: is it possible to use the same network definition, with weights pretrained on ImageNet, with other self-supervised algorithms such as SimSiam or BYOL?

marcomameli1992 avatar Jun 02 '21 09:06 marcomameli1992

Is it possible to have a BYOL example?

marcomameli1992 avatar Jun 02 '21 09:06 marcomameli1992

Hi @marcomameli1992, thanks for trying out lightly on your dataset. Could you elaborate on the data you used? Maybe the learned features were not that helpful for the downstream classification task.

You can use models pretrained on ImageNet. However, I'm not sure how well they work; we plan to do some benchmarks on that. We also haven't benchmarked training a model with one approach (e.g. MoCo) and then training it with another (e.g. BYOL), and I'm not aware of any research in that direction.
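
For illustration, plugging a torchvision ResNet-50 pretrained on ImageNet into a MoCo-style setup could look roughly like this. This is a minimal, untested sketch that assumes a recent lightly version with the modular `MoCoProjectionHead`, `deactivate_requires_grad`, and `update_momentum` utilities:

```python
import copy

import torch
import torchvision
from lightly.loss import NTXentLoss
from lightly.models.modules import MoCoProjectionHead
from lightly.models.utils import deactivate_requires_grad, update_momentum

# Backbone: ResNet-50 pretrained on ImageNet, with the classification head removed.
resnet = torchvision.models.resnet50(pretrained=True)
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])

# Query branch: backbone + projection head (2048 -> 512 -> 128).
projection_head = MoCoProjectionHead(2048, 512, 128)

# Momentum (key) branch: EMA copies that receive no gradients.
backbone_momentum = copy.deepcopy(backbone)
projection_head_momentum = copy.deepcopy(projection_head)
deactivate_requires_grad(backbone_momentum)
deactivate_requires_grad(projection_head_momentum)

# Contrastive loss with a memory bank (queue); newer lightly versions
# may expect a (size, dim) tuple for memory_bank_size instead of an int.
criterion = NTXentLoss(memory_bank_size=4096)

def forward_query(x):
    features = backbone(x).flatten(start_dim=1)
    return projection_head(features)

def forward_key(x):
    # EMA update of the momentum branch before encoding the key batch.
    update_momentum(backbone, backbone_momentum, m=0.99)
    update_momentum(projection_head, projection_head_momentum, m=0.99)
    features = backbone_momentum(x).flatten(start_dim=1)
    return projection_head_momentum(features).detach()
```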

From my experiments, the different training methods (MoCo, SimCLR, BYOL, ...) result in slightly different weight distributions. I also think these distributions differ from those of supervised models trained on ImageNet.
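
As for the BYOL example asked for above: a minimal sketch with the same pretrained backbone could look like the following. Again untested, and it assumes lightly's `BYOLProjectionHead`/`BYOLPredictionHead` modules and `NegativeCosineSimilarity` loss are available in your version:

```python
import copy

import torch
import torchvision
from lightly.loss import NegativeCosineSimilarity
from lightly.models.modules import BYOLPredictionHead, BYOLProjectionHead
from lightly.models.utils import deactivate_requires_grad, update_momentum

resnet = torchvision.models.resnet50(pretrained=True)
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])

# Online branch: projection head plus an extra prediction head.
projection_head = BYOLProjectionHead(2048, 4096, 256)
prediction_head = BYOLPredictionHead(256, 4096, 256)

# Target branch: EMA copies of backbone and projection head, no gradients.
backbone_target = copy.deepcopy(backbone)
projection_head_target = copy.deepcopy(projection_head)
deactivate_requires_grad(backbone_target)
deactivate_requires_grad(projection_head_target)

criterion = NegativeCosineSimilarity()

def training_step(x0, x1):
    # x0, x1 are two augmented views of the same batch.
    update_momentum(backbone, backbone_target, m=0.99)
    update_momentum(projection_head, projection_head_target, m=0.99)
    # Online branch predicts the target branch's projection, symmetrized.
    p0 = prediction_head(projection_head(backbone(x0).flatten(start_dim=1)))
    z1 = projection_head_target(backbone_target(x1).flatten(start_dim=1)).detach()
    p1 = prediction_head(projection_head(backbone(x1).flatten(start_dim=1)))
    z0 = projection_head_target(backbone_target(x0).flatten(start_dim=1)).detach()
    return 0.5 * (criterion(p0, z1) + criterion(p1, z0))
```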

IgorSusmelj avatar Jun 02 '21 12:06 IgorSusmelj

Thank you for your answer. My dataset came from Instagram hashtag scraping based on bag images; each image contains at least one bag. We have 2000 images, so I think the problem is that the number of images is not enough. Do you think that could be the problem?

marcomameli1992 avatar Jun 02 '21 12:06 marcomameli1992

@marcomameli1992 There could be many problems apart from just the number of images, for example an imbalance in the distribution of images across the separate classes, among other such issues.

I think using pre-trained weights for self-supervised learning is a bit counter-intuitive here: the purpose of learning the visual features from scratch is lost if the model is already pre-trained. This also brings me back to your latest question: the small number of images could also prevent the model from learning all the visual features.

A colleague of mine and I trained the MoCo model on different datasets and realised the number of images really does make a difference. Have you tried augmenting the images? (I am not sure how exactly to determine how many images would actually be enough in your case.)
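
As a starting point, heavier augmentation can be expressed with plain torchvision transforms; the parameter values below are illustrative defaults, not values tuned for your bag dataset:

```python
import torchvision.transforms as T

# Heavier augmentation to stretch a small (~2000 image) dataset.
# Kernel sizes, jitter strengths, and probabilities are placeholders.
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.2, 1.0)),
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```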

aymuos15 avatar Aug 18 '21 07:08 aymuos15