implicit-hyper-opt
neumann_hyperstep_preconditioner
Hi, I'm confused by the function neumann_hyperstep_preconditioner since I found two versions of it.
The one in rnn/train.py computes the hessian_term as:
hessian_term = (counter.view(1, -1) @ d_train_loss_d_w.view(-1, 1) @ d_train_loss_d_w.view(1, -1)).view(-1)
And the other one in train_augment_net2.py is:
hessian_term = gather_flat_grad( grad(d_train_loss_d_w, model.parameters(), grad_outputs=counter.view(-1), retain_graph=True))
What's the difference between them and which one should I choose?
Thanks
@killandy did you find an answer to this?
The first term resembles the Gauss-Newton Hessian with a squared loss for regression problems. It can be seen as the "empirical Fisher", but it may not be recommended in practice (see this paper).
As far as I know, the second version seems to be the correct one, as described in the paper.
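To make the difference concrete, here is a minimal, self-contained sketch (not the repository's code) that computes both terms on a toy model. The toy torch.nn.Linear model, the random data, and the use of torch.cat in place of the repo's gather_flat_grad helper are illustrative assumptions; only the two hessian_term formulas mirror the snippets above. The first version reduces to (counter . g) g, a rank-1 outer product built purely from the gradient, while the second is a true Hessian-vector product obtained by differentiating the gradient a second time.

import torch
from torch.autograd import grad

torch.manual_seed(0)

# Toy model and training loss (illustrative only).
model = torch.nn.Linear(5, 1)
x, y = torch.randn(8, 5), torch.randn(8, 1)
train_loss = torch.nn.functional.mse_loss(model(x), y)

# Flat gradient of the training loss w.r.t. the weights, kept on the graph
# (create_graph=True) so it can be differentiated again.
params = list(model.parameters())
grads = grad(train_loss, params, create_graph=True)
d_train_loss_d_w = torch.cat([g.reshape(-1) for g in grads])

# Arbitrary vector standing in for `counter` in the Neumann iteration.
counter = torch.randn_like(d_train_loss_d_w)

# Version 1 (rnn/train.py style): (g g^T) counter = (counter . g) g,
# a rank-1 outer-product ("empirical Fisher"-style) approximation.
hessian_term_v1 = (counter.view(1, -1) @ d_train_loss_d_w.view(-1, 1)
                   @ d_train_loss_d_w.view(1, -1)).view(-1)

# Version 2 (train_augment_net2.py style): a true Hessian-vector product,
# differentiating the gradient again with `counter` as grad_outputs.
hvp = grad(d_train_loss_d_w, params, grad_outputs=counter.view(-1), retain_graph=True)
hessian_term_v2 = torch.cat([g.reshape(-1) for g in hvp])

# The two generally differ: v1 uses only first-order (gradient) information,
# while v2 uses the actual second-order curvature.
print((hessian_term_v1 - hessian_term_v2).norm())

Running this shows the two vectors generally differ: the outer-product version only sees the gradient, whereas the grad-of-grad version captures the full curvature, which is why it matches the Hessian-vector product used in the Neumann-series derivation.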