
neumann_hyperstep_preconditioner

Open killandy opened this issue 4 years ago • 2 comments

Hi, I'm confused by the function neumann_hyperstep_preconditioner since I found two versions of it.

The one in rnn/train.py computes the hessian_term as:

hessian_term = (counter.view(1, -1) @ d_train_loss_d_w.view(-1, 1) @ d_train_loss_d_w.view(1, -1)).view(-1)

And the other one in train_augment_net2.py is:

hessian_term = gather_flat_grad(grad(d_train_loss_d_w, model.parameters(), grad_outputs=counter.view(-1), retain_graph=True))

What's the difference between them and which one should I choose?

Thanks

killandy avatar Jan 06 '21 07:01 killandy

@killandy did you find an answer to this?

Cinofix avatar Jan 21 '21 10:01 Cinofix

The first version resembles the Gauss-Newton Hessian with squared loss for regression problems. It can be seen as the "empirical Fisher", which may not be recommended in practice (see this paper).
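Concretely, writing g = d_train_loss_d_w and v = counter, the first snippet computes (v^T g) g = (g g^T) v, i.e. it applies the rank-one outer product g g^T to v, while the second differentiates g^T v with respect to the weights and therefore computes the exact Hessian-vector product H v.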

As far as I know, the second version is the more correct one, since it matches the Hessian-vector product described in the paper.
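For anyone comparing the two, here is a minimal standalone sketch (not the repo's code; the toy loss and the names g, v, w are made up for illustration) showing that the two snippets compute different quantities:

import torch

torch.manual_seed(0)

# Toy setup: a small weight vector and a scalar training loss.
w = torch.randn(5, requires_grad=True)
x = torch.randn(5)
loss = torch.sin(w @ x) + 0.5 * (w ** 2).sum()

# g = dL/dw, built with create_graph=True so we can backprop through it again.
g = torch.autograd.grad(loss, w, create_graph=True)[0]
v = torch.randn(5)  # plays the role of `counter` in the issue

# Version 1 (rnn/train.py style): rank-one "empirical Fisher" approximation,
# (v^T g) g == (g g^T) v.
fisher_term = (v @ g) * g

# Version 2 (train_augment_net2.py style): exact Hessian-vector product H v,
# obtained by differentiating g^T v without ever forming H.
hvp_term = torch.autograd.grad(g, w, grad_outputs=v)[0]

print(fisher_term)  # generally different from hvp_term
print(hvp_term)

The outer-product version is cheaper (it reuses g and needs no second backward pass), but it only agrees with H v to the extent that the empirical Fisher approximates the Hessian.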

anh-tong avatar Jun 08 '21 12:06 anh-tong