sam
Is saving the state by calling .state_dict() sufficient?
import torch
from sam import SAM  # the SAM wrapper from this repository

base_optimizer = torch.optim.Adam
optimizer = SAM(model.parameters(), base_optimizer, lr=0.1)

# save the optimizer state
torch.save({"optz_state_dict": optimizer.state_dict()}, "state.pth")

# ... later, restore it
checkpoint = torch.load("state.pth")
optimizer.load_state_dict(checkpoint["optz_state_dict"])
With the code above, the saved state is less than half the size of the checkpoint produced by PyTorch's own torch.optim.Adam optimizer alone. Why such a discrepancy, any idea? Resuming from it also comes with a spike in the loss.
The 44 MB checkpoint is the one saved with SAM, and the other is from Adam.
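If I'm reading the wrapper right, the SAM instance keeps its own state separate from the inner optimizer it wraps, so optimizer.state_dict() would not capture the base optimizer's per-parameter buffers (Adam's exp_avg and exp_avg_sq), which would explain both the smaller file and the loss spike on resume. A minimal workaround sketch, assuming the wrapper exposes the inner optimizer as optimizer.base_optimizer (it does in this repo) and reusing the model/optimizer from the snippet above:

import torch

# Checkpoint both the SAM wrapper's own state and the inner optimizer's state,
# since the wrapper's state_dict() does not include the latter.
torch.save({
    "sam_state_dict": optimizer.state_dict(),                   # SAM's own buffers
    "base_state_dict": optimizer.base_optimizer.state_dict(),   # Adam's moments
}, "state.pth")

# ... on resume, restore both
checkpoint = torch.load("state.pth")
optimizer.load_state_dict(checkpoint["sam_state_dict"])
optimizer.base_optimizer.load_state_dict(checkpoint["base_state_dict"])

If that reading is right, it would also account for the size gap: Adam keeps two moment tensors per parameter, while the wrapper's own state holds roughly one (the saved perturbation), so a wrapper-only checkpoint comes out at about half the size.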
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Wondering the same thing.