handson-ml2
[QUESTION] KL divergence formula for the regularizer layer needs explication
Hi @ageron ,
In cell 44 of https://github.com/ageron/handson-ml2/blob/master/17_autoencoders_and_gans.ipynb, you build a KLDivergence layer, but the formula you use is a little difficult to understand, at least for me.
Why
kl_divergence(self.target, mean_activities) +
kl_divergence(1. - self.target, 1. - mean_activities)
?
and not simply
kl_divergence(self.target, mean_activities)
?
Hi @hansglick ,
That's a great question, thanks!
The KL divergence equation computes the divergence between two probability distributions (see my video on this topic). For example, if the probability of activation is 0.4 but we actually want it to be 0.1 (for sparsity), then the correct equation is:
>>> import numpy as np
>>> 0.1 * np.log(0.1 / 0.4) + (1 - 0.1) * np.log((1 - 0.1) / (1 - 0.4))
0.22628916118535888
This includes the probability of activation (0.4) and the probability of no-activation (1-0.4), since we need a full probability distribution.
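Spelled out in the general discrete case, the KL divergence is:

    D_KL(P ‖ Q) = Σ_i P(i) · log(P(i) / Q(i))

Here the distribution has only two outcomes (the unit fires, or it does not), so with target sparsity p = 0.1 and measured mean activation q = 0.4, the sum reduces to exactly the two terms computed above: p·log(p/q) + (1 − p)·log((1 − p)/(1 − q)).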
Or we can use the kullback_leibler_divergence() function from the tensorflow.keras.losses package to get the same result as a tensor:
>>> from tensorflow.keras.losses import kullback_leibler_divergence
>>> kullback_leibler_divergence([0.1, 1-0.1], [0.4, 1-0.4])
<tf.Tensor: shape=(), dtype=float32, numpy=0.2262891>
Another way to get the same result is to call kullback_leibler_divergence() twice: once with just the probability of activation, and once with just the probability of no-activation:
>>> kullback_leibler_divergence([0.1], [0.4]) + kullback_leibler_divergence([1-0.1], [1-0.4])
<tf.Tensor: shape=(), dtype=float32, numpy=0.2262891>
This last option is less verbose, since it does not require concatenating the probabilities into a single tensor, and it is the form used by the regularizer in the notebook.
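For reference, here is a minimal sketch of how the two calls quoted in the question fit into a regularizer. It follows the pattern discussed above; the class name, weight value, and helper aliases are just for illustration and may differ slightly from the notebook:

from tensorflow import keras

K = keras.backend
kl_divergence = keras.losses.kullback_leibler_divergence

class KLDivergenceRegularizer(keras.regularizers.Regularizer):
    def __init__(self, weight, target=0.1):
        self.weight = weight    # strength of the sparsity penalty
        self.target = target    # desired mean activation probability (e.g., 10%)

    def __call__(self, inputs):
        # mean activation of each coding unit over the batch
        mean_activities = K.mean(inputs, axis=0)
        # KL divergence over the full (activation, no-activation) distribution:
        # one call for the "active" outcome, one for the "inactive" outcome
        return self.weight * (
            kl_divergence(self.target, mean_activities) +
            kl_divergence(1. - self.target, 1. - mean_activities))

Such a regularizer can then be attached to the codings layer of the sparse autoencoder, for example via activity_regularizer=KLDivergenceRegularizer(weight=0.05, target=0.1) (illustrative values).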
I think I'll add a note in the notebook about this; I agree it's not intuitive. Thanks again!
@ageron Thank you, Sir, for your great explanation and your time.