How is KL scaling taken into account -- DenseFlipout
Hello all,
for a paper I'm writing, I'm making use of the DenseFlipout layer. I'm reweighting the KL loss as follows:

```python
kl_divergence_function = (lambda q, p, _:
    tensorflow_probability.distributions.kl_divergence(q, p) / scale)
```

This introduces a scaling factor between the KL loss and the maximum-likelihood loss. However, in the original paper (https://arxiv.org/pdf/1505.05424.pdf), Equation 2 comprises three terms. For the 2nd and 3rd terms I am sure whether they are scaled by the KL scaling factor or not; however, I was wondering about the first one and hoped to find my answer here.
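To make the question concrete, here is a minimal NumPy sketch of the reweighted objective I have in mind (all values are hypothetical toy numbers, not from my actual model): a single Monte Carlo weight sample, the three terms of Equation 2 written out as log q(w|θ), log P(w), and log P(D|w), and the division by `scale` applied to the KL part only, mirroring the `kl_divergence(q, p) / scale` lambda above.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_logpdf(x, mu, sigma):
    # Log-density of a univariate Gaussian, elementwise.
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

# Hypothetical toy setup: variational posterior q(w) = N(mu_q, sigma_q),
# prior P(w) = N(0, 1), and a 1-D Gaussian likelihood over a minibatch.
mu_q, sigma_q = 0.5, 0.8
scale = 1000.0  # e.g. the number of training examples

# One Monte Carlo weight sample from q via the reparameterisation trick.
w = mu_q + sigma_q * rng.standard_normal()

x_batch = rng.standard_normal(32)  # hypothetical minibatch of data
log_lik = normal_logpdf(x_batch, w, 1.0).sum()  # log P(D|w)

# The three terms of Equation 2: log q(w|theta), log P(w), log P(D|w).
log_q = normal_logpdf(w, mu_q, sigma_q)
log_prior = normal_logpdf(w, 0.0, 1.0)

# Reweighted objective: the KL part (log_q - log_prior) is divided by `scale`,
# exactly as kl_divergence(q, p) / scale does; the likelihood term is unscaled.
loss = (log_q - log_prior) / scale - log_lik

# Unscaled objective, for comparison.
loss_unscaled = (log_q - log_prior) - log_lik
```

My question, in terms of this sketch, is whether the `log_q` term belongs inside the scaled bracket (as written here) or outside it alongside the likelihood term.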
Thanks in advance! Cedric