pmf_cvpr22

What hyperparameters should be changed for effective training on Meta-Dataset subset?

Open ajkailash opened this issue 1 year ago • 2 comments

Hi, I have a pre-trained backbone network (self-supervised) that I would like to deploy for few-shot classification. However, I can't use your meta-training pipeline because the Meta-Dataset includes six research-only datasets such as ImageNet-1k, FGVC-Aircraft, Describable Textures, etc.

Now, if I reduce the Meta-Dataset to a subset containing only commercially usable datasets (QuickDraw, Omniglot, VGG Flower, and Traffic Signs) and add two more, Oxford Pets and Imagenette, what hyperparameters (learning rate, number of epochs, number of episodes, LR scheduler, weight decay, optimizer, etc.) should I change to ensure meta-training works effectively?

ajkailash avatar Jun 26 '23 07:06 ajkailash

Hi @ajkailash,

I would recommend deploying your SSL model directly on the target tasks without meta-training (fine-tuning on the support set will be necessary to achieve good performance; see our PEFT paper if your backbone is too large for few-shot full-model fine-tuning). If the performance is suboptimal, you can come back to the meta-training phase (the default hyper-parameters for the standard Meta-Dataset should be OK).
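For illustration, a minimal sketch of the "fine-tune on the support set, then classify the query set" idea could look like the snippet below. This is not the exact code in this repo; names such as `backbone`, `support_x`, `support_y`, and `query_x` are placeholders, the backbone is assumed to map images to feature vectors, and support labels are assumed to be remapped to 0..N_way-1.

```python
import torch
import torch.nn.functional as F

def finetune_and_predict(backbone, support_x, support_y, query_x,
                         steps=50, lr=1e-4):
    """Fine-tune a pre-trained backbone on one episode's support set,
    then classify the query set with a nearest-centroid (prototype) head.
    Assumes support_y contains labels 0..N_way-1."""
    backbone.train()
    opt = torch.optim.Adam(backbone.parameters(), lr=lr)

    for _ in range(steps):
        feats = F.normalize(backbone(support_x), dim=-1)            # [N_support, D]
        # class prototypes = mean (normalized) feature per class
        protos = torch.stack([feats[support_y == c].mean(0)
                              for c in torch.unique(support_y)])     # [N_way, D]
        logits = feats @ protos.t() * 10.0                           # cosine logits, temp = 10
        loss = F.cross_entropy(logits, support_y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    backbone.eval()
    with torch.no_grad():
        s_feats = F.normalize(backbone(support_x), dim=-1)
        protos = torch.stack([s_feats[support_y == c].mean(0)
                              for c in torch.unique(support_y)])
        q_feats = F.normalize(backbone(query_x), dim=-1)
        return (q_feats @ protos.t()).argmax(dim=-1)                 # predicted class indices
```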

hushell avatar Jun 26 '23 08:06 hushell

Hi, thanks for the quick reply. My SSL backbone is ViT-Tiny (a ~5.7-million-parameter model).

The following are the meta-training args that I used:

- num_epochs: 100
- image_size: 128
- num_ways: 5-50
- max_num_query: 10
- max_support_set_size: 500
- max_support_size_contrib_per_class: 100
- num_episodes: 2000
- deployment: "vanilla"
- transforms: ['random_resized_crop', 'jitter', 'random_flip', 'to_tensor', 'normalize']
- base_sources: ['vgg_flower', 'omniglot', 'quickdraw', 'oxford_pets', 'imagenette', 'traffic_sign'] (num_classes: 1283)

Apart from the Base_sources, I believe these are the default args.
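Roughly, the same settings as a plain Python dict (the field names here are just illustrative, not the repo's exact argument names):

```python
# Illustrative grouping of the episodic-sampling settings listed above.
meta_train_cfg = {
    "num_epochs": 100,
    "image_size": 128,
    "num_ways": (5, 50),                      # episodes sample between 5 and 50 classes
    "max_num_query": 10,                      # query images per class
    "max_support_set_size": 500,
    "max_support_size_contrib_per_class": 100,
    "num_episodes": 2000,
    "deployment": "vanilla",
    "transforms": ["random_resized_crop", "jitter", "random_flip",
                   "to_tensor", "normalize"],
    "base_sources": ["vgg_flower", "omniglot", "quickdraw",
                     "oxford_pets", "imagenette", "traffic_sign"],   # 1283 classes total
}
```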

After meta-training, I evaluated the model on the mini-ImageNet test set. Top-1 and top-5 accuracy were both lower than when running the SSL model directly. I also visualized the embeddings generated by the meta-trained model on the Aircraft dataset (manufacturer hierarchy) using t-SNE plots; there were no visible clusters.
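For reference, the t-SNE check was along these lines (a rough sketch with placeholder names such as `backbone` and `loader`, not my exact script):

```python
import torch
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

@torch.no_grad()
def plot_tsne(backbone, loader, device="cuda"):
    """Extract backbone embeddings for a labelled dataset and project them
    to 2-D with t-SNE to check for class clusters."""
    backbone.eval().to(device)
    feats, labels = [], []
    for x, y in loader:                       # loader yields (images, labels)
        feats.append(backbone(x.to(device)).cpu())
        labels.append(y)
    feats = torch.cat(feats).numpy()
    labels = torch.cat(labels).numpy()

    emb2d = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(feats)
    plt.scatter(emb2d[:, 0], emb2d[:, 1], c=labels, s=5, cmap="tab20")
    plt.title("t-SNE of backbone embeddings")
    plt.show()
```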

ajkailash avatar Jun 30 '23 09:06 ajkailash