How is KL scaling taken into account -- DenseFlipout
Hello all,
for a paper I'm writing, I'm making use of the DenseFlipout layer. I'm reweighting the KL loss as follows:

```python
kl_divergence_function = (lambda q, p, _:
    tensorflow_probability.distributions.kl_divergence(q, p) / scale)
```

This introduces a scaling factor between the KL loss and the maximum-likelihood loss. However, in the original paper (https://arxiv.org/pdf/1505.05424.pdf), Equation 2 comprises three terms. For the 2nd and 3rd terms I am sure whether they are scaled by the KL scaling factor or not; however, I was wondering about the first one and hoped to find my answer here.
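To make the question concrete, here is a minimal NumPy sketch of the reweighted objective I have in mind (all values are hypothetical toy numbers, not from my actual model): a single Monte Carlo weight sample, the three terms of Equation 2 written out as log q(w|θ), log P(w), and log P(D|w), and the division by `scale` applied to the KL part only, mirroring the `kl_divergence(q, p) / scale` lambda above.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_logpdf(x, mu, sigma):
    # Log-density of a univariate Gaussian, elementwise.
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

# Hypothetical toy setup: variational posterior q(w) = N(mu_q, sigma_q),
# prior P(w) = N(0, 1), and a 1-D Gaussian likelihood over a minibatch.
mu_q, sigma_q = 0.5, 0.8
scale = 1000.0  # e.g. the number of training examples

# One Monte Carlo weight sample from q via the reparameterisation trick.
w = mu_q + sigma_q * rng.standard_normal()

x_batch = rng.standard_normal(32)  # hypothetical minibatch of data
log_lik = normal_logpdf(x_batch, w, 1.0).sum()  # log P(D|w)

# The three terms of Equation 2: log q(w|theta), log P(w), log P(D|w).
log_q = normal_logpdf(w, mu_q, sigma_q)
log_prior = normal_logpdf(w, 0.0, 1.0)

# Reweighted objective: the KL part (log_q - log_prior) is divided by `scale`,
# exactly as kl_divergence(q, p) / scale does; the likelihood term is unscaled.
loss = (log_q - log_prior) / scale - log_lik

# Unscaled objective, for comparison.
loss_unscaled = (log_q - log_prior) - log_lik
```

My question, in terms of this sketch, is whether the `log_q` term belongs inside the scaled bracket (as written here) or outside it alongside the likelihood term.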
Thanks in advance! Cedric