Understanding the Jacobian determinant of SoftmaxCentered
Hi,
I have been tinkering with the SoftmaxCentered bijector, and I would like some help understanding how the determinant of its Jacobian is calculated. I have already found the function SoftmaxCentered._inverse_log_det_jacobian, which is very nicely documented. From what I understand from the comments in that function, the SoftmaxCentered transformation $\pmb{x} = B(\pmb{z})$ is composed of two transformations, $B(\pmb{z}) = G[F(\pmb{z})]$.
I think I understand how the first transformation, $\pmb{z} = F^{-1}(\pmb{y}) = [ \ln y_1 - \ln y_0, \dots, \ln y_n - \ln y_0]$ with $y_0 = 1 - \sum_{i=1}^n y_i$, gives rise to
$$ \left|\det \frac{\partial F(\pmb{z})}{\partial \pmb{z}} \right| = \prod_{i=0}^n y_i $$
(using the Sherman-Morrison formula and the matrix determinant lemma).
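As a side note, a minimal numerical check of that identity (a sketch of my own, assuming $F$ maps $\pmb{z} \in \mathbb{R}^n$ to $y_i = e^{z_i} / (1 + \sum_j e^{z_j})$, i.e. the softmax over $[\pmb{z}, 0]$ with the redundant coordinate dropped; the function name F and the test point are only for illustration) agrees with the product formula:

import jax
import jax.numpy as jnp

def F(z):
    # Softmax over [z, 0] with the redundant last coordinate dropped.
    return jnp.exp(z) / (1.0 + jnp.sum(jnp.exp(z)))

z = jnp.array([0.3, -1.2, 0.7])       # arbitrary point in R^n
y = F(z)                              # y_1, ..., y_n
y0 = 1.0 - jnp.sum(y)                 # the dropped coordinate y_0
jac = jax.jacobian(F)(z)              # n x n Jacobian dF/dz
print(jnp.abs(jnp.linalg.det(jac)))   # |det dF/dz|
print(y0 * jnp.prod(y))               # prod_{i=0}^n y_i, matches the line above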
I have trouble understanding what the second transformation $\pmb{x} = G(\pmb{y})$ is for, and why its generalised Jacobian factor is $\sqrt{\det\!\left[(DG)^\top (DG)\right]}$,
which gives rise to the additional constant factor $\sqrt{n+1}$ in the Jacobian determinant of $B(\pmb{z})$.
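For concreteness, here is a small sketch of where that constant could come from, assuming $G$ is the embedding $G(\pmb{y}) = [y_0, y_1, \dots, y_n]$ with $y_0 = 1 - \sum_{i=1}^n y_i$ (again, the function name G and the test point are only illustrative):

import jax
import jax.numpy as jnp

def G(y):
    # Embed y in R^n onto the n-simplex in R^{n+1} by prepending y_0 = 1 - sum(y).
    return jnp.concatenate([1.0 - jnp.sum(y, keepdims=True), y])

n = 3
y = jnp.array([0.2, 0.3, 0.1])          # any point with sum(y) < 1
DG = jax.jacobian(G)(y)                 # (n+1) x n Jacobian of the embedding
gram = DG.T @ DG                        # n x n Gram matrix, equal to I + ones * ones^T
print(jnp.sqrt(jnp.linalg.det(gram)))   # sqrt(det) = 2.0 here, independent of y
print(jnp.sqrt(n + 1.0))                # sqrt(n + 1) = 2.0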
As an example, let's assume I use the SoftmaxCentered bijector to transform uniform samples.
$$z_1 \sim \mathrm{Uniform}(0, 1), \quad [x_0, x_1] = B([z_1]). $$
SoftmaxCentered.forward_log_det_jacobian
should then keep my distribution $p(\pmb{x}) = p(z_1) \left|\det \frac{\partial \pmb{z}}{\partial \pmb{x}} \right|$ normalised.
However, when I numerically compute
$$1 = \int \mathrm{d}\pmb{x} p(\pmb{x}) = \int_0^1 \mathrm{d}x_0 \int_0^1 \mathrm{d}x_1 p(\pmb{x}) \delta(x_1 + x_0 -1) = \int_0^1 \mathrm{d}x_1 p([1 - x_1, x_1]) $$
I get exactly a factor $1/\sqrt{2}$ off, which I think is related to the function $G(\pmb{y})$.
import numpy as np
from tensorflow_probability.substrates import jax as tfp
tfb = tfp.bijectors
softmax = tfb.SoftmaxCentered()
ln_p_z = 0.0  # Log density of Uniform(0, 1) on its support.
# Grid of z values covering the support of the uniform base distribution.
z = np.linspace(0, 1, 1000).reshape(-1, 1)
x = softmax(z)  # Shape (1000, 2); each row lies on the 1-simplex.
# Change of variables: ln p(x) = ln p(z) - forward_log_det_jacobian(z).
ln_p_x = ln_p_z - softmax.forward_log_det_jacobian(z, event_ndims=1)
p_x = np.exp(ln_p_x)
# Approximate the normalisation integral with a Riemann sum.
dx = np.diff(x[:, 0], axis=0)
normalisation = np.sum(p_x[1:] * dx)
# normalisation comes out as 1/sqrt(2) instead of one.
I am probably overlooking something trivial. Can you help me fill in the gaps?
Thanks in advance,
Hylke