Meta-SGD-pytorch
Reproduce normal MAML
Thanks for your great work! I manually set task_lr to 0.01 in an attempt to reproduce normal MAML results. However, I only achieved 35% accuracy on the 5-way 5-shot miniImagenet task. How can I change the code to reproduce the MAML result?
Hi, thanks for your interest in my work! I think it is due to an insufficient number of gradient updates in the inner loop. In my Meta-SGD implementation there is only one inner-loop gradient update, since the model learns how large the learning rate should be for each parameter. In MAML, however, for the 5-way 5-shot miniImagenet task you need to update the learner's parameters 5 times in the inner-loop gradient update stage. Also, to reproduce results as close as possible to the ones in the original MAML paper, you need to set hyper-parameters such as task_lr carefully, since MAML is known to be sensitive to its hyper-parameters.
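To make the difference concrete, here is a minimal sketch (not the repo's exact code) contrasting the two inner-loop updates. It assumes a task_lr dict of learnable per-parameter learning rates for Meta-SGD, and that the model's forward optionally accepts an adapted parameter dict, as in the snippet further down this thread:

import torch
from collections import OrderedDict

def meta_sgd_inner_update(model, loss_fn, X_sup, Y_sup, task_lr):
    # Meta-SGD: a single inner-loop step, with a learned learning rate per parameter.
    loss = loss_fn(model(X_sup), Y_sup)
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    return OrderedDict(
        (name, param - task_lr[name] * grad)
        for (name, param), grad in zip(model.named_parameters(), grads))

def maml_inner_update(model, loss_fn, X_sup, Y_sup, alpha=0.01, num_updates=5):
    # MAML: several inner-loop steps, all with one fixed scalar learning rate alpha.
    adapted = OrderedDict(model.named_parameters())
    for _ in range(num_updates):
        loss = loss_fn(model(X_sup, adapted), Y_sup)
        grads = torch.autograd.grad(loss, list(adapted.values()), create_graph=True)
        adapted = OrderedDict(
            (name, param - alpha * grad)
            for (name, param), grad in zip(adapted.items(), grads))
    return adapted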
I just made the following changes:
# NOTE: if we want approx-MAML (first-order), change create_graph=True to False
zero_grad(model.parameters())
grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)

# Perform updates using the calculated gradients.
# We manually compute the adapted parameters since optimizer.step() operates in-place.
adapted_state_dict = model.cloned_state_dict()
adapted_params = OrderedDict()
for (key, val), grad in zip(model.named_parameters(), grads):
    # NOTE: here Meta-SGD differs from naive MAML;
    # it only needs a single inner gradient update with a learned per-parameter lr.
    # task_lr = model.task_lr[key]
    adapted_params[key] = val - 0.1 * grad
    adapted_state_dict[key] = adapted_params[key]

if params.num_inner_updates > 1:
    for _ in range(1, params.num_inner_updates):
        Y_sup_hat = model(X_sup, adapted_state_dict)
        loss = loss_fn(Y_sup_hat, Y_sup)
        zero_grad(adapted_params.values())
        grads = torch.autograd.grad(loss, adapted_params.values(), create_graph=True)
        for (key, val), grad in zip(adapted_params.items(), grads):
            # task_lr = model.task_lr[key]
            adapted_params[key] = val - 0.1 * grad
            adapted_state_dict[key] = adapted_params[key]

return adapted_state_dict
Because there is not enough CUDA memory, I tested the following settings: num_inner_updates = 2 with create_graph=True, num_inner_updates = 5 with create_graph=False, and learning rates 0.1, 0.01, 0.001. But the accuracy stays at 20%-35%. Is my code wrong?
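For reference on the memory issue mentioned above: create_graph=True keeps the graph of the inner-loop gradients so the outer (meta) update can backpropagate through them, and that cost grows with num_inner_updates; create_graph=False gives the first-order approximation instead. A minimal sketch of just that choice (the helper name is hypothetical, not part of this repo):

import torch
from collections import OrderedDict

def inner_step(loss, adapted_params, lr=0.01, first_order=False):
    # first_order=True (i.e. create_graph=False) drops the second-order terms:
    # the meta-update then treats the inner gradients as constants, which
    # saves memory when several inner updates are chained together.
    grads = torch.autograd.grad(loss, list(adapted_params.values()),
                                create_graph=not first_order)
    return OrderedDict(
        (name, param - lr * grad)
        for (name, param), grad in zip(adapted_params.items(), grads))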
The update of the inner-loop parameters is done with a Python loop over the parameters; does this mean that Meta-SGD is not practical for larger models?
Hi, have you encountered the situation where, after feeding X_sup and adapted_state_dict into the model, adapted_state_dict changes automatically?
When I implemented my meta-learning code, the state_dict differed before and after being fed into the forward function, and I'm not sure why.
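One way to narrow this down (using the names from the snippet above; this is only a diagnostic sketch, not a fix) is to snapshot the dict before the forward pass and compare afterwards. Buffers such as BatchNorm running statistics are a common source of in-place changes during training-mode forward passes:

import torch

# Snapshot the adapted parameters/buffers before the forward pass ...
before = {k: v.detach().clone() for k, v in adapted_state_dict.items()}
_ = model(X_sup, adapted_state_dict)

# ... then report which entries were modified in place by forward().
changed = [k for k in before
           if not torch.equal(before[k], adapted_state_dict[k].detach())]
print("entries modified by forward:", changed)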