Bayesian-Neural-Networks
May I ask about the code for calculating the Hessian in the Kronecker-Factorised Laplace method?
```python
def softmax_CE_preact_hessian(last_layer_acts):
    side = last_layer_acts.shape[1]
    I = torch.eye(side).type(torch.ByteTensor)
    # for i != j: H = -ai * aj -- Note that these are activations, not pre-activations
    Hl = - last_layer_acts.unsqueeze(1) * last_layer_acts.unsqueeze(2)
    # for i == j: H = ai * (1 - ai)
    Hl[:, I] = last_layer_acts * (1 - last_layer_acts)
    return Hl
```
This function calculates the Hessian of the loss with respect to the last layer's pre-activations. Why can the Hessian be obtained by this process?
- for i != j: H_ij = -a_i * a_j (note that these are activations, not pre-activations)
- for i == j: H_ii = a_i * (1 - a_i)
Is this process shown in the paper (https://openreview.net/pdf?id=Skdvd2xAZ)?
Looking forward to your reply! Thank you very much!
The function you mention computes the Hessian of the softmax cross-entropy loss with respect to the last layer's pre-activations. You can find the expression in appendix A.2 of this paper: https://arxiv.org/pdf/1905.12558.pdf.
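For intuition: with a = softmax(z) and loss -log a_t, the Hessian with respect to the logits z is diag(a) - a a^T, whose entries are exactly a_i(1 - a_i) on the diagonal and -a_i a_j off it. A minimal sketch verifying this closed form against PyTorch autograd (the function name `softmax_ce_preact_hessian` here is mine, not from the repo):

```python
import torch
import torch.nn.functional as F

def softmax_ce_preact_hessian(logits):
    # Closed form: H = diag(a) - a a^T, where a = softmax(logits).
    # Diagonal: a_i - a_i^2 = a_i * (1 - a_i); off-diagonal: -a_i * a_j.
    a = F.softmax(logits, dim=0)
    return torch.diag(a) - torch.outer(a, a)

# Compare against an autograd Hessian of cross-entropy w.r.t. the logits.
# Note the closed form does not depend on the target class.
logits = torch.randn(5, dtype=torch.float64)
target = torch.tensor([2])
loss_fn = lambda z: F.cross_entropy(z.unsqueeze(0), target)
H_auto = torch.autograd.functional.hessian(loss_fn, logits)
H_closed = softmax_ce_preact_hessian(logits)
assert torch.allclose(H_auto, H_closed, atol=1e-8)
```

The two Hessians agree, which is why the repo's function only needs the softmax outputs (activations), not the pre-activations themselves.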