Meta-SGD-pytorch
Reproduce normal MAML
Thanks for your great work! I manually set task_lr to 0.01 in an attempt to reproduce normal MAML results. However, I only achieved 35% accuracy on the 5-way 5-shot miniImagenet task. How can I change the code to reproduce the MAML result?
Hi, thanks for your interest in my work! I think it is due to an insufficient number of gradient updates in the inner loop. In my Meta-SGD implementation there is only one inner-loop gradient update, since the model learns how large the learning rate should be for each parameter. In MAML, however, for the 5-way 5-shot miniImagenet task you need to update the learner's parameters 5 times in the inner-loop gradient update stage. Also, to reproduce results as close as possible to the ones in the original MAML paper, you need to set hyper-parameters such as task_lr carefully, since MAML is known to be sensitive to its hyper-parameters.
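To make the difference concrete, here is a minimal sketch (not the repo's exact code) contrasting the two inner-loop updates. It assumes a task_lr dict of learnable per-parameter learning rates for Meta-SGD, and that the model's forward optionally accepts an adapted parameter dict, as in the snippet further down this thread:

import torch
from collections import OrderedDict

def meta_sgd_inner_update(model, loss_fn, X_sup, Y_sup, task_lr):
    # Meta-SGD: a single inner-loop step, with a learned learning rate per parameter.
    loss = loss_fn(model(X_sup), Y_sup)
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    return OrderedDict(
        (name, param - task_lr[name] * grad)
        for (name, param), grad in zip(model.named_parameters(), grads))

def maml_inner_update(model, loss_fn, X_sup, Y_sup, alpha=0.01, num_updates=5):
    # MAML: several inner-loop steps, all with one fixed scalar learning rate alpha.
    adapted = OrderedDict(model.named_parameters())
    for _ in range(num_updates):
        loss = loss_fn(model(X_sup, adapted), Y_sup)
        grads = torch.autograd.grad(loss, list(adapted.values()), create_graph=True)
        adapted = OrderedDict(
            (name, param - alpha * grad)
            for (name, param), grad in zip(adapted.items(), grads))
    return adapted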
I just made the following changes:
# NOTE: if we want approx-MAML (first-order), change create_graph=True to False
zero_grad(model.parameters())
grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)

# Perform updates using the calculated gradients.
# We manually compute the adapted parameters since optimizer.step() operates in-place.
adapted_state_dict = model.cloned_state_dict()
adapted_params = OrderedDict()
for (key, val), grad in zip(model.named_parameters(), grads):
    # NOTE: here Meta-SGD differs from naive MAML;
    # it only needs a single inner gradient update with a learned per-parameter lr.
    # task_lr = model.task_lr[key]
    adapted_params[key] = val - 0.1 * grad
    adapted_state_dict[key] = adapted_params[key]

if params.num_inner_updates > 1:
    for _ in range(1, params.num_inner_updates):
        Y_sup_hat = model(X_sup, adapted_state_dict)
        loss = loss_fn(Y_sup_hat, Y_sup)
        zero_grad(adapted_params.values())
        grads = torch.autograd.grad(loss, adapted_params.values(), create_graph=True)
        for (key, val), grad in zip(adapted_params.items(), grads):
            # task_lr = model.task_lr[key]
            adapted_params[key] = val - 0.1 * grad
            adapted_state_dict[key] = adapted_params[key]

return adapted_state_dict
Because there is not enough CUDA memory, I tested the following settings: num_inner_updates = 2 with create_graph=True, num_inner_updates = 5 with create_graph=False, and learning rates 0.1, 0.01, 0.001. But the accuracy stays at 20%-35%. Is my code wrong?
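For reference on the memory issue mentioned above: create_graph=True keeps the graph of the inner-loop gradients so the outer (meta) update can backpropagate through them, and that cost grows with num_inner_updates; create_graph=False gives the first-order approximation instead. A minimal sketch of just that choice (the helper name is hypothetical, not part of this repo):

import torch
from collections import OrderedDict

def inner_step(loss, adapted_params, lr=0.01, first_order=False):
    # first_order=True (i.e. create_graph=False) drops the second-order terms:
    # the meta-update then treats the inner gradients as constants, which
    # saves memory when several inner updates are chained together.
    grads = torch.autograd.grad(loss, list(adapted_params.values()),
                                create_graph=not first_order)
    return OrderedDict(
        (name, param - lr * grad)
        for (name, param), grad in zip(adapted_params.items(), grads))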
The update of the inner-loop parameters is done with a Python loop over the parameters; does this mean that Meta-SGD is not practical for larger models?
Hi, have you encountered the situation where, after feeding X_sup and adapted_state_dict into the model, adapted_state_dict changes automatically?
When I implemented my meta-learning code, the state_dict differed before and after being fed into the forward function, and I'm not sure why.
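One way to narrow this down (using the names from the snippet above; this is only a diagnostic sketch, not a fix) is to snapshot the dict before the forward pass and compare afterwards. Buffers such as BatchNorm running statistics are a common source of in-place changes during training-mode forward passes:

import torch

# Snapshot the adapted parameters/buffers before the forward pass ...
before = {k: v.detach().clone() for k, v in adapted_state_dict.items()}
_ = model(X_sup, adapted_state_dict)

# ... then report which entries were modified in place by forward().
changed = [k for k in before
           if not torch.equal(before[k], adapted_state_dict[k].detach())]
print("entries modified by forward:", changed)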