Optimization as a Model for Few-shot Learning
PyTorch implementation of *Optimization as a Model for Few-shot Learning* (ICLR 2017, Oral).

Prerequisites
Data
Preparation
- Make sure Mini-ImageNet is split properly (a quick layout check is sketched after this list). For example:
  - data/
    - miniImagenet/
      - train/
        - n01532829/
          - n0153282900000005.jpg
          - ...
        - n01558993/
          - ...
      - val/
        - n01855672/
        - ...
      - test/
        - ...
  - main.py
  - ...
- The layout is already in place if you download and extract Mini-ImageNet from the link above.
- Check out `scripts/train_5s_5c.sh` and make sure `--data-root` is set properly.
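A minimal sketch of the kind of layout check mentioned above (the `check_split` helper and the default path are illustrative, not part of this repo):

```python
import os

def check_split(data_root="data/miniImagenet"):
    """Hypothetical helper: count classes and images in each Mini-ImageNet split."""
    for split in ("train", "val", "test"):
        split_dir = os.path.join(data_root, split)
        classes = [d for d in os.listdir(split_dir)
                   if os.path.isdir(os.path.join(split_dir, d))]
        n_images = sum(len(os.listdir(os.path.join(split_dir, c))) for c in classes)
        print(f"{split}: {len(classes)} classes, {n_images} images")

if __name__ == "__main__":
    check_split()
```

With the standard Ravi & Larochelle split you should see 64 training, 16 validation, and 20 test classes.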
Run
For 5-shot, 5-class training, run
`bash scripts/train_5s_5c.sh`
Hyper-parameters follow the author's repo.
For 5-shot, 5-class evaluation, run (remember to change the `--resume` and `--seed` arguments):
`bash scripts/eval_5s_5c.sh`
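The `--seed` flag controls run-to-run reproducibility. Below is a minimal sketch of the usual PyTorch reproducibility recipe that such a flag typically drives (illustrative; the repo's actual seeding code may differ):

```python
import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    """Illustrative seeding recipe following the PyTorch reproducibility guidelines."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for determinism in cuDNN convolutions.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(719)  # e.g. the seed used in the first result row below
```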
Notes
- Results (this repo follows the PyTorch reproducibility guidelines):
| seed | train episodes | val episodes | val acc mean (%) | val acc std (%) | test episodes | test acc mean (%) | test acc std (%) |
|---|---|---|---|---|---|---|---|
| 719 | 41000 | 100 | 59.08 | 9.9 | 100 | 56.59 | 8.4 |
| - | - | - | - | - | 250 | 57.85 | 8.6 |
| - | - | - | - | - | 600 | 57.76 | 8.6 |
| 53 | 44000 | 100 | 58.04 | 9.1 | 100 | 57.85 | 7.7 |
| - | - | - | - | - | 250 | 57.83 | 8.3 |
| - | - | - | - | - | 600 | 58.14 | 8.5 |
- The results I get from directly running the author's repo can be found here; this implementation performs slightly better (~5%), but neither result matches the number reported in the paper (60%). Discussion and help are welcome!
- Training with the default settings takes ~2.5 hours on a single Titan Xp and occupies ~2 GB of GPU memory.
- Following the author's repo, the implementation keeps two copies of the learner (see the parameter-casting sketch after this list):
  - `learner_w_grad` functions as a regular model; its gradients and loss are fed to the meta-learner as inputs.
  - `learner_wo_grad` constructs the graph for the meta-learner:
    - All the parameters in `learner_wo_grad` are replaced by `cI`, the output of the meta-learner.
    - `nn.Parameter`s in this model are cast to `torch.Tensor` to connect the graph to the meta-learner.
- There are several ways to copy parameters from the meta-learner to the learner, depending on the scenario (see the `.data.copy_` vs. `clone()` sketch after this list):
  - `copy_flat_params`: we only need the parameter values and keep the original `grad_fn`.
  - `transfer_params`: we want the values as well as the `grad_fn` (from `cI` to `learner_wo_grad`).
    - `.data.copy_` vs. `clone()`: the latter retains all the properties of a tensor, including `grad_fn`.
    - To maintain the batch statistics, `load_state_dict` is used (from `learner_w_grad` to `learner_wo_grad`).
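As noted above for the two learners, `learner_wo_grad` stays connected to the meta-learner's graph by swapping its `nn.Parameter`s for plain tensors built from `cI`. The sketch below illustrates that casting trick in isolation; the function name and the flat-vector layout are assumptions, not this repo's exact code:

```python
import torch
import torch.nn as nn

def set_params_from_flat(module: nn.Module, cI: torch.Tensor) -> None:
    """Illustrative: replace every nn.Parameter in `module` with a slice of `cI`.

    Because the slices keep cI's grad_fn, the learner's loss backpropagates
    into the meta-learner that produced cI.
    """
    offset = 0
    for m in module.modules():
        for name, p in list(m.named_parameters(recurse=False)):
            n = p.numel()
            chunk = cI[offset:offset + n].view_as(p)
            delattr(m, name)          # drop the nn.Parameter ...
            setattr(m, name, chunk)   # ... and install a plain tensor in its place
            offset += n

# Toy usage: a tiny learner whose weights come from a "meta-learner output" cI.
learner = nn.Linear(4, 2)
cI = torch.randn(sum(p.numel() for p in learner.parameters()), requires_grad=True)
set_params_from_flat(learner, cI)
loss = learner(torch.randn(8, 4)).pow(2).mean()
loss.backward()
print(cI.grad.shape)  # gradients reach the meta-learner's output
```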
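And a small illustration of the `.data.copy_` vs. `clone()` distinction behind `copy_flat_params` and `transfer_params` (variable names are made up; only the tensor operations matter):

```python
import torch

# cI stands in for the meta-learner's output; it carries a grad_fn.
base = torch.randn(5, requires_grad=True)
cI = base * 2.0
print(cI.grad_fn)            # <MulBackward0> -- part of the meta-learner's graph

# copy_flat_params style: copy values only; the target keeps its own (lack of) history.
target = torch.zeros(5, requires_grad=True)
target.data.copy_(cI)
print(target.grad_fn)        # None -- still a leaf; the graph to cI is NOT kept

# transfer_params style: clone() retains the tensor's properties, including grad_fn,
# so anything computed from `transferred` backpropagates into cI (and the meta-learner).
transferred = cI.clone()
print(transferred.grad_fn)   # <CloneBackward0>
```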
References
- CloserLookFewShot (Data loader)
- pytorch-meta-optimizer (Casting
nn.Parameterstotorch.Tensorinspired from here) - meta-learning-lstm (Author's repo in Lua Torch)