handson-ml2
[QUESTION] KL divergence formula for the regularizer layer needs explication
Hi @ageron ,
In cell 44 of https://github.com/ageron/handson-ml2/blob/master/17_autoencoders_and_gans.ipynb, you build a KLDivergence layer, but the formula you use is a little difficult to understand, at least for me.
Why
kl_divergence(self.target, mean_activities) +
kl_divergence(1. - self.target, 1. - mean_activities)
?
and not simply
kl_divergence(self.target, mean_activities)
?
Hi @hansglick ,
That's a great question, thanks!
The KL divergence equation computes the divergence between two probability distributions (see my video on this topic). For example, if the probability of activation is 0.4 but we actually want it to be 0.1 (for sparsity), then the correct equation is:
>>> import numpy as np
>>> 0.1 * np.log(0.1 / 0.4) + (1 - 0.1) * np.log((1 - 0.1) / (1 - 0.4))
0.22628916118535888
This includes the probability of activation (0.4) and the probability of no-activation (1-0.4), since we need a full probability distribution.
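Spelled out in the general discrete case, the KL divergence is:

    D_KL(P ‖ Q) = Σ_i P(i) · log(P(i) / Q(i))

Here the distribution has only two outcomes (the unit fires, or it does not), so with target sparsity p = 0.1 and measured mean activation q = 0.4, the sum reduces to exactly the two terms computed above: p·log(p/q) + (1 − p)·log((1 − p)/(1 − q)).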
Or we can use the kullback_leibler_divergence() function from the tensorflow.keras.losses package to get the same result as a tensor:
>>> from tensorflow.keras.losses import kullback_leibler_divergence
>>> kullback_leibler_divergence([0.1, 1-0.1], [0.4, 1-0.4])
<tf.Tensor: shape=(), dtype=float32, numpy=0.2262891>
Another way to get the same result is to call kullback_leibler_divergence() twice: once with just the probability of activation, and once with just the probability of no-activation:
>>> kullback_leibler_divergence([0.1], [0.4]) + kullback_leibler_divergence([1-0.1], [1-0.4])
<tf.Tensor: shape=(), dtype=float32, numpy=0.2262891>
This last option is less verbose, since it does not require concatenating the probabilities into a single tensor, and it is the form used by the regularizer in the notebook.
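For reference, here is a minimal sketch of how the two calls quoted in the question fit into a regularizer. It follows the pattern discussed above; the class name, weight value, and helper aliases are just for illustration and may differ slightly from the notebook:

from tensorflow import keras

K = keras.backend
kl_divergence = keras.losses.kullback_leibler_divergence

class KLDivergenceRegularizer(keras.regularizers.Regularizer):
    def __init__(self, weight, target=0.1):
        self.weight = weight    # strength of the sparsity penalty
        self.target = target    # desired mean activation probability (e.g., 10%)

    def __call__(self, inputs):
        # mean activation of each coding unit over the batch
        mean_activities = K.mean(inputs, axis=0)
        # KL divergence over the full (activation, no-activation) distribution:
        # one call for the "active" outcome, one for the "inactive" outcome
        return self.weight * (
            kl_divergence(self.target, mean_activities) +
            kl_divergence(1. - self.target, 1. - mean_activities))

Such a regularizer can then be attached to the codings layer of the sparse autoencoder, for example via activity_regularizer=KLDivergenceRegularizer(weight=0.05, target=0.1) (illustrative values).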
I think I'll add a note in the notebook about this; I agree it's not intuitive. Thanks again!
@ageron Thank you, Sir, for your great explanation and your time.