Unable to reproduce the performance reported in the paper.

Open · caoql98 opened this issue 2 years ago · 17 comments

Thanks for your outstanding work! I have used the pre-trained weights and the configs you mention in the README (--epochs 100, --optim_steps_online 5) for meta fine-tuning and evaluating FewTURE. However, I only obtain Test Acc 66.8844 +- 0.8728 for 1-shot 5-way and Test Acc 81.9867 +- 0.5830 for 5-shot 5-way on miniImageNet. Did you use any other tricks? Thanks for your reply!

caoql98 · Feb 22 '23 05:02

Hi @caoql98, I'll take a look at whether there's been a mix-up with the uploaded checkpoint, since the results you're getting seem quite a bit off. -> I assume you've tried the ViT one?

mrkshllr · Feb 22 '23 08:02

Additional question: what validation accuracy are you getting during fine-tuning?

mrkshllr · Feb 22 '23 08:02

Thanks for your suggestions. I have tried ViT-Small on miniImageNet; the validation acc for 1-shot 5-way is 72.69 +- 0.9330 and for 5-shot 5-way it is 84.27 +- 0.5744.

caoql98 · Feb 22 '23 08:02

By the way, after changing --optim_steps_online from 5 to 15 there is an improvement: I obtain Test Acc 67.3822 +- 0.8750 for 1-shot 5-way on miniImageNet. Maybe you provided the wrong hyperparameters?

caoql98 · Feb 22 '23 08:02

Additionally, according to train_metatrain_FewTURE.py, the --epochs argument in the README needs to be renamed to --num_epochs, and --chkpt_epoch 1599 should be --chkpt_epoch 1600, since the provided weights are named xxxx1600.pth.
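For clarity, the corrected invocation then looks roughly like the sketch below; only the arguments discussed in this issue are shown, and everything else (data paths, model choice, etc.) is omitted, so this is not the full command:

```bash
# Rough sketch of the corrected meta fine-tuning call (not the complete command):
#   --num_epochs         : the README currently calls this --epochs
#   --optim_steps_online : 5 in the README; 15 worked noticeably better for me
#   --chkpt_epoch        : 1600 to match the provided xxxx1600.pth weights
python train_metatrain_FewTURE.py --num_epochs 100 --optim_steps_online 15 --chkpt_epoch 1600
```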

caoql98 · Feb 22 '23 08:02

> By the way, after changing --optim_steps_online from 5 to 15 there is an improvement: I obtain Test Acc 67.3822 +- 0.8750 for 1-shot 5-way on miniImageNet. Maybe you provided the wrong hyperparameters?

The results in the paper are obtained with 15 optim steps, and you might even be able to achieve higher accuracies if those are increased (total performance wasn't our primary goal, but rather the method ;) ) -> So the fact that it helps in your experiments is already a good sign.

mrkshllr · Feb 22 '23 08:02

> Additionally, according to train_metatrain_FewTURE.py, the --epochs argument in the README needs to be renamed to --num_epochs, and --chkpt_epoch 1599 should be --chkpt_epoch 1600, since the provided weights are named xxxx1600.pth.

Please excuse the somewhat unclear way we stated the instructions in the readme -> The given command wasn't meant to reproduce the exact results of the paper, but rather to show how our code is used (for the provided case with 5 steps) -> The checkpoint epoch name stems from the checkpoints you actually obtain when using our self-supervised pretraining script, where we start counting epochs from '0', i.e. 1599 corresponds to 1600 epochs. I simply renamed the uploaded checkpoints to '1600' to better reflect that the model has indeed been trained for 1600 epochs, and you are correct that the loading command has to be adapted accordingly.

mrkshllr · Feb 22 '23 08:02

Have you had a chance to repeat the finetuning for 5-way 5-shot with 15 or 20 steps?

mrkshllr · Feb 22 '23 08:02

Yes, I am working on that; once I have the results, I will let you know. Thanks for your patient reply!

caoql98 · Feb 22 '23 08:02

No worries! -> I've updated the Readme instructions regarding the meta-finetuning, thanks for pointing these out! (especially the 'epochs' vs 'num_epochs' argument)

mrkshllr · Feb 22 '23 08:02

By setting --optim_steps_online to 20 and 25, I obtain Test Acc 67.88 +- 0.9698 and Test Acc 67.8978 +- 0.8755, respectively, for 1-shot 5-way on miniImageNet. Meanwhile, for 5-way 5-shot with 15 and 20 steps, the model obtains Test Acc 82.4933 +- 0.5769 and Test Acc 82.6489 +- 0.5704, respectively. So I think 15 is still not an appropriate setting for the model. Moreover, the 5-way 5-shot performance still shows an obvious gap. Could you help me figure this out?

caoql98 · Feb 23 '23 07:02

I'll take a closer look tomorrow and see if I can find the logs. In the meantime, you could try lowering the similarity temperature slightly, e.g. similarity_temp_init = 0.0421, or activate the meta-learning of it.

mrkshllr · Feb 23 '23 11:02

Thanks for your advice. I will further try lowering the similarity temperature slightly to similarity_temp_init = 0.0421. Yesterday I also tried Swin-Tiny with --optim_steps_online 20 on miniImageNet. However, I only obtain Test Acc 70.6867 +- 0.8323 for the 5-way 1-shot setting and 85.0622 +- 0.5439 for the 5-way 5-shot setting, so there is still an obvious gap. By the way, what do you mean by activating the meta-learning of it? Do you mean lowering the similarity temperature slightly and activating meta-learning during training and testing?

caoql98 · Feb 25 '23 05:02

Hi @caoql98, I've run two more meta-finetuning runs with ViT and 15 optimisation steps on miniImageNet -- one using the default temperature, and one with the lowered one I mentioned earlier; I also varied the number of meta-ft epochs a bit. => I noticed one small bug in the code that must have crept in when finalising it for GitHub: the T_max of the lr_scheduler is epoch-independent, but it should depend on the number of epochs (which affects the learning rate of the cosine schedule): T_max=50 * args.num_episodes_per_epoch -> T_max=args.epochs * args.num_episodes_per_epoch
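For illustration, the change amounts to something like the following minimal sketch (assuming torch's CosineAnnealingLR as the cosine scheduler; the args values, model and optimizer here are placeholders, not our actual training code):

```python
import argparse
import torch

# Illustrative placeholder values -- in the repo these come from the argument parser
args = argparse.Namespace(epochs=100, num_episodes_per_epoch=100)

model = torch.nn.Linear(8, 8)  # stand-in for the actual model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# before (buggy): the cosine schedule always assumed 50 epochs
#   T_max=50 * args.num_episodes_per_epoch
# after (fixed): the schedule spans the actual number of meta-finetuning epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer,
    T_max=args.epochs * args.num_episodes_per_epoch,
)
```

With the hard-coded value, a run shorter than 50 epochs only traverses part of the cosine decay and a longer run goes past it, so the effective learning rate over meta-finetuning is not what the schedule intends.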

I ran the following settings to provide some insight, and got these results:

  • default temp: Val acc 85.17 -- Test acc 83.9289 +- 0.5475 (meta-ft with 100 epochs)
  • lowered temp: Val acc 85.31 -- Test acc 84.1800 +- 0.5344 (meta-ft with 20 epochs)

In the paper, we report 84.05 for the 15-step setting, so both are within the error-margin and the slightly lower temp even slightly improved upon that -- your results should thus end up somewhere around those values as well;

P.S.: Please re-download the ViT checkpoint; I've uploaded the local checkpoint I've been using for these runs: vit_checkpoint

I'll update the readme and code as soon as I can; Let me know if you have any further issues/queries, also feel free to drop me an email for a more in-depth discussion/analysis in case problems persist!

(Regarding your question of meta-learning the temp: You can learn the scaling temperature for the inner-loop steps within the outer loop, and our code supports that -> Some analysis is provided in the supplementary material of the paper)
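As a rough, generic sketch of what that means (illustrative only, not our actual implementation), a similarity temperature that can either stay fixed or be meta-learned could look like this:

```python
import torch

# Generic sketch of a similarity temperature (not the FewTURE code itself)
similarity_temp_init = 0.0421  # the slightly lowered initial value suggested above

# Wrapping it in nn.Parameter makes it learnable; whether it actually gets updated
# depends on whether it is handed to the outer-loop optimiser.
temperature = torch.nn.Parameter(torch.tensor(similarity_temp_init))

def scaled_similarity(query_tokens: torch.Tensor, support_tokens: torch.Tensor) -> torch.Tensor:
    """Patch-token similarities between query and support, scaled by the temperature."""
    sim = query_tokens @ support_tokens.transpose(-2, -1)
    return sim / temperature
```

Meta-learning the temperature then simply means including it in the outer-loop optimiser's parameter list instead of keeping it fixed at its initial value.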

mrkshllr · Feb 28 '23 09:02

Thanks for your reply! I obtain very similar results with your settings; in particular, default temp: Test acc 83.7378 +- 0.5591 (meta-ft with 100 epochs) and 83.9978 +- 0.5322 (20 optimization steps). However, in the paper you report 84.51 +- 0.53 for 5-shot on miniImageNet. Is that reproducible? By the way, you provided another checkpoint for ViT; do we need other new checkpoints (for instance, for Swin) to reproduce the reported results?

caoql98 · Mar 06 '23 03:03

Hi @mrkshllr, after fixing the bug, I used the provided pretrained weights to run two more experiments with Swin on miniImageNet, and I got val acc 73.88 +- 0.8627, test acc 70.1711 +- 0.8495 for the 1-shot setting, and val acc 85.82 +- 0.5170, test acc 84.9178 +- 0.5394 for the 5-shot setting. Clearly, there is still a large performance gap. Are there any other bugs, or do we need to produce the self-supervised pretrained models ourselves?

caoql98 · Mar 07 '23 03:03

I also can't achieve the same results for Swin-Tiny on miniImageNet. In my case, I get test acc 70.77 for the 5-way 1-shot setting. I also trained the self-supervised pretrained model myself but got no improvement. What should we do?

WuJi1 · Jul 16 '23 11:07