Bayesian-Neural-Networks

May I ask about the code that calculates the Hessian in the Kronecker-Factorised Laplace method?

Open zehuanzhang opened this issue 3 years ago • 1 comment

def softmax_CE_preact_hessian(last_layer_acts):
    side = last_layer_acts.shape[1]
    I = torch.eye(side).type(torch.ByteTensor)
    # for i != j   H = -a_i * a_j -- Note that these are activations, not pre-activations
    Hl = - last_layer_acts.unsqueeze(1) * last_layer_acts.unsqueeze(2)
    # for i == j   H = a_i * (1 - a_i)
    Hl[:, I] = last_layer_acts * (1 - last_layer_acts)
    return Hl

This function calculates the Hessian with respect to the last layer's pre-activations, using the last layer's activations. Why can the Hessian be obtained by this process?

for i != j:  H_ij = -a_i * a_j  (note that these are activations, not pre-activations)

for i == j:  H_ii = a_i * (1 - a_i)

Is this process shown in the paper (https://openreview.net/pdf?id=Skdvd2xAZ)?

Looking forward to your reply! Thank you very much!

zehuanzhang · May 23 '22 14:05

The function you mention computes the Hessian of the softmax cross-entropy loss with respect to the last layer's pre-activations. You can find the expression in Appendix A.2 of this paper: https://arxiv.org/pdf/1905.12558.pdf.
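For intuition: with a = softmax(z) and cross-entropy loss L = -log a_y, the gradient with respect to the pre-activations z is a - e_y, so the Hessian is diag(a) - a aᵀ, which gives H_ii = a_i * (1 - a_i) and H_ij = -a_i * a_j for i != j. Below is a minimal sanity-check sketch, not part of this repository, that assumes a recent PyTorch; the names (loss_fn, n_classes, the seed and label) are purely illustrative. It compares the closed-form expression against an autograd Hessian of the cross-entropy loss:

import torch
import torch.nn.functional as F

# Hypothetical sanity check: compare the closed-form softmax-CE Hessian,
# diag(a) - a a^T, with the Hessian computed by autograd.
torch.manual_seed(0)
n_classes = 5
logits = torch.randn(n_classes)   # pre-activations z for a single sample
target = torch.tensor(2)          # arbitrary class label

# Cross-entropy loss as a function of the logits (pre-activations).
def loss_fn(z):
    return F.cross_entropy(z.unsqueeze(0), target.unsqueeze(0))

# Hessian of the loss w.r.t. the logits via autograd.
H_autograd = torch.autograd.functional.hessian(loss_fn, logits)

# Closed-form expression in terms of the activations a = softmax(z):
# H_ij = -a_i * a_j for i != j, and H_ii = a_i * (1 - a_i).
a = torch.softmax(logits, dim=0)
H_closed = torch.diag(a) - torch.outer(a, a)

print(torch.allclose(H_autograd, H_closed, atol=1e-6))  # expected: True

This is the same expression that softmax_CE_preact_hessian applies, there computed batch-wise for all samples at once.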

JavierAntoran · May 23 '22 14:05