
doubt regarding 1 shot n way

Open vainaixr opened this issue 6 years ago • 9 comments

in the case of image classification,

we have a training set with inputs x, and the loss is computed from whether our neural network predicted the label correctly, i.e. by taking the difference between the ground-truth label and the predicted label.

so, we get a training-set loss, and then test this trained model on some test images and compute the test-set loss.

when using MAML, we have episodes (tasks) within our meta-training set, and each of these episodes contains a training set (support set) and a test set (query sample); similarly, episodes within our meta-test set contain a training set and a test set.

I am a bit confused: what does 1-shot 5-way or 5-shot 5-way mean, and how is the loss computed?

does 1-shot 5-way mean that we have 1 image for each of the 5 classes in the training set (support set), that we compute a softmax probability over these 5 classes, and that we compute the training-set loss accordingly,

and that we then give our trained model a test input from the query sample and compute the test-set loss?

while 5-shot 5-way means

we have 5 images for each of the 5 classes in the training set (support set), so 25 images, and we compute a softmax probability over these 5 classes, and the training-set loss for one episode would be the summation of the losses obtained for these 25 images given to our model.

and our test set (query sample) also contains 5 images of the same classes, and these 5 images are given to our trained model, and we take the summation of the losses obtained on these 5 images?
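to make my understanding concrete, here is a sketch of the episode structure and support loss I have in mind (a hypothetical illustration with made-up feature sizes, not this repo's actual data loader):

```python
import torch
import torch.nn.functional as F

# Hypothetical 5-shot 5-way episode: 5 classes, 5 support images each,
# with a linear softmax head over pre-extracted features.
n_way, k_shot, feat_dim = 5, 5, 16

support_x = torch.randn(n_way * k_shot, feat_dim)          # 25 support examples
support_y = torch.arange(n_way).repeat_interleave(k_shot)  # labels 0..4, 5 each

classifier = torch.nn.Linear(feat_dim, n_way)  # softmax regression over 5 classes
logits = classifier(support_x)                 # shape (25, 5)

# "Training-set loss" = cross-entropy accumulated over all 25 support images.
support_loss = F.cross_entropy(logits, support_y, reduction="sum")
```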

I am confused; can you clarify what the loss means when training MAML?

thanks

vainaixr avatar Apr 13 '19 09:04 vainaixr

Hello, please keep in mind that the key point of few-shot meta-learning methods is to have a "support-set-conditioned model". For MAML, the model is conditioned on a gradient update computed on the support set.

So, for 5-shot 5-way, yes, the loss over the support set (25 images) is calculated first, but when that loss is backpropagated, its gradient only temporarily updates the weights. This means that given a different support set, the model is temporarily updated (or adapted) with different gradients, so the model is conditioned on the support set.

Then we calculate the loss on the query data (we use 16 images per class) with the temporarily updated weights and backpropagate to get the gradient. However, this gradient is taken with respect to the weights before the update, so the model can learn a better initial weight to be updated from.

1-shot 5-way is the same case, but with only 1 example per class (5 in total) in the support set.
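The two-level update described above can be sketched in a few lines of PyTorch (an illustrative toy on a linear model, not the repository's maml.py; sizes and learning rates are made up):

```python
import torch
import torch.nn.functional as F

# Toy second-order MAML step on a linear model.
torch.manual_seed(0)
w = torch.randn(2, 5, requires_grad=True)  # meta-weights: 2 features -> 5 ways
inner_lr, outer_lr = 0.4, 0.1

support_x, support_y = torch.randn(5, 2), torch.arange(5)  # 1-shot 5-way support
query_x, query_y = torch.randn(5, 2), torch.arange(5)      # query examples

# Inner loop: the support-set gradient gives temporary ("fast") weights.
support_loss = F.cross_entropy(support_x @ w, support_y)
grad = torch.autograd.grad(support_loss, w, create_graph=True)[0]
fast_w = w - inner_lr * grad  # adaptation conditioned on the support set

# Outer loop: the query loss is computed with fast_w, but its gradient
# is taken with respect to the original w, so w learns to be a good
# initialization for adaptation.
query_loss = F.cross_entropy(query_x @ fast_w, query_y)
meta_grad = torch.autograd.grad(query_loss, w)[0]
with torch.no_grad():
    w -= outer_lr * meta_grad  # meta-update of the initial weights
```

Note that `create_graph=True` keeps the graph of the inner update, which is what makes the meta-gradient second-order.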

wyharveychen avatar Apr 14 '19 05:04 wyharveychen

Hello, I found a paper on MAML++: https://arxiv.org/pdf/1810.09502.pdf. Is it possible to integrate it into this repository? They achieved better accuracy on the mini-ImageNet dataset.

Also, is it possible to deploy this model to mobile apps?

Also, is it possible to provide comments for maml.py?

Thanks for your reply.

vainaixr avatar Apr 14 '19 10:04 vainaixr

Hello, I currently have no plan for further integration or deployment to mobile apps, but feel free to develop your own branches from my code. I will add comments to maml.py soon, thanks for your interest!

wyharveychen avatar Apr 16 '19 01:04 wyharveychen

hello, ok, I will wait for comments.

vainaixr avatar Apr 19 '19 12:04 vainaixr

Hello, I have added some comments. However, as the MAML code is complicated, I would suggest also looking at this repo for better understanding: https://github.com/dragen1860/MAML-Pytorch. This is the original repo with PyTorch MAML code.

wyharveychen avatar Apr 22 '19 04:04 wyharveychen

hello, thanks for providing comments.

I had one question, I read a paper, Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

where they proposed a hybrid of prototypical networks and MAML.

from their paper - "We refer to Proto-MAML as the (fo-)MAML model where the task-specific linear layer of each episode is initialized from the Prototypical Network-equivalent weights and bias and subsequently optimized as usual on the given support set. When computing the meta update for θ, we allow gradients to flow through the Prototypical Network-equivalent linear layer initialization."

in your paper it is mentioned that, "Examples of distance metrics include cosine similarity Vinyals et al. (2016), Euclidean distance to class-mean representation Snell et al.(2017), CNN-based relation module Sung et al. (2018), ridge regression Bertinetto et al. (2019), and graph neural network Garcia & Bruna (2018). "

does this mean that combining other distance-metric learning methods with MAML, i.e. a hybrid of these methods and MAML, gives better performance?

also, is AutoAugment a better technique for data transformation? https://ai.googleblog.com/2018/06/improving-deep-learning-performance.html

vainaixr avatar May 20 '19 04:05 vainaixr

I don’t think so. As stated in my work, the distance-metric-based baseline (Baseline++) is already competitive with other methods, so I would say the distance metric itself is the key point. Thus, even if another distance-metric method is combined with MAML, I believe the result would be similar to the original distance-based method.


wyharveychen avatar May 22 '19 20:05 wyharveychen

where in the MAML code do I need to make changes in order to combine it with other techniques? since MAML is model-agnostic, we should be able to combine other techniques with it.

for example, in this paper, https://arxiv.org/abs/1905.08233v1, they combined GANs with MAML.

also, for using a TPU while training, we need a Google Colab notebook. I was facing difficulty converting the existing repo to Google Colab, and for TPU usage only an MNIST example is given in the xla repository.

on my local machine with a GPU, when I train using MAML, the system cannot handle it, as MAML requires more computation.

https://github.com/pytorch/xla/blob/master/contrib/colab/PyTorch_TPU_XRT_1_13.ipynb

also, using ngrok on a Google Colab notebook with PyTorch TensorBoard, we can dynamically visualize our images and training accuracy.

in this paper, https://arxiv.org/abs/1905.08233v1, they generated facial expressions from only one image, which was made possible by combining MAML with GANs.

see also https://github.com/hsukyle/cactus-maml: unsupervised learning via meta-learning.

vainaixr avatar May 30 '19 23:05 vainaixr

Please add any additional layers in the forward function in maml.py. As long as these layers use Conv2d_fw, BatchNorm2d_fw, and Linear_fw from backbone.py, they will be updated as in MAML.
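The "fast weight" idea behind those layers can be sketched roughly as follows (illustrative only, with a hypothetical `LinearFW` class; see backbone.py for the actual Linear_fw implementation): the layer uses a temporary `.fast` copy of its parameters when one has been attached during inner-loop adaptation, and its ordinary parameters otherwise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearFW(nn.Linear):
    """Linear layer that prefers temporary 'fast' weights when present."""
    def forward(self, x):
        weight = getattr(self.weight, "fast", None)
        bias = getattr(self.bias, "fast", None)
        weight = self.weight if weight is None else weight
        bias = self.bias if bias is None else bias
        return F.linear(x, weight, bias)

layer = LinearFW(4, 3)
x = torch.randn(2, 4)
out_slow = layer(x)                    # uses the ordinary parameters
layer.weight.fast = torch.zeros(3, 4)  # attach adapted (fast) weights
layer.bias.fast = torch.zeros(3)
out_fast = layer(x)                    # now uses the fast copies
```

Because the adapted weights live alongside the originals rather than replacing them, the meta-update can still be taken with respect to the original parameters.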

I have no experience with TPUs, so I cannot give any related advice. But for the GPU memory issue, you can consider using the first-order approximation version of MAML (--method maml_approx) to save memory.
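The memory saving comes from treating the inner-loop gradient as a constant, so no second-order graph has to be kept. A toy sketch of this first-order idea (illustrative only, not the repository's maml_approx code; sizes and learning rates are made up):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w = torch.randn(2, 5, requires_grad=True)  # meta-weights: 2 features -> 5 ways

support_x, support_y = torch.randn(5, 2), torch.arange(5)
query_x, query_y = torch.randn(5, 2), torch.arange(5)

# Inner loop without create_graph=True: the graph of the inner update
# is not retained, which is where the memory is saved.
support_loss = F.cross_entropy(support_x @ w, support_y)
inner_grad = torch.autograd.grad(support_loss, w)[0]
fast_w = w - 0.4 * inner_grad.detach()  # gradient treated as a constant

# Outer loop: first-order approximation of the meta-gradient.
query_loss = F.cross_entropy(query_x @ fast_w, query_y)
meta_grad = torch.autograd.grad(query_loss, w)[0]
```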

wyharveychen avatar Jun 01 '19 23:06 wyharveychen