
Understanding the determinant Jacobian of SoftmaxCentered

Open hylkedonker opened this issue 1 year ago • 0 comments

Hi,

I have been tinkering with the SoftmaxCentered bijector, but I would like help understanding how the determinant of its Jacobian is calculated. I've already found the function SoftmaxCentered._inverse_log_det_jacobian, which is very nicely documented. From what I understand from the comments in that function, the SoftmaxCentered transformation $\pmb{x} = B(\pmb{z})$ is a composition of two transformations, $B(\pmb{z}) = G[F(\pmb{z})]$. I think I understand how the first transformation, with inverse $\pmb{z} = F^{-1}(\pmb{y}) = [ \ln y_1 - \ln y_0, \dots, \ln y_n - \ln y_0]$ and $y_0 = 1 - \sum_{i=1}^n y_i$, gives rise to

$$ |\det \frac{\partial F(\pmb{z})}{\partial \pmb{z}} | = \prod_{i=0}^n y_i $$

(using the Sherman-Morrison formula and the matrix determinant lemma). What I have trouble understanding is what the second transformation $\pmb{x} = G(\pmb{y})$ is good for, and why its Jacobian factor is $\sqrt{\det[(\mathrm{D}G)^\top (\mathrm{D}G)]}$, so as to give rise to an additional constant factor $\sqrt{n+1}$ in the determinant of $B(\pmb{z})$.
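To convince myself of the first part, I checked the $\prod_{i=0}^n y_i$ determinant numerically with jax.jacobian. This is just my own sanity check; I prepend the zero logit here to match the $y_0$ convention above (the bijector itself pads at the other end, but the determinant comes out the same either way):

import jax
import jax.numpy as jnp

def F(z):
    # Map z in R^n to the free simplex coordinates y_1, ..., y_n;
    # y_0 = 1 - sum_i y_i is fixed by prepending a zero logit.
    logits = jnp.concatenate([jnp.zeros(1), z])
    return jax.nn.softmax(logits)[1:]

z = jnp.array([0.3, -1.2, 0.7])
jacobian = jax.jacobian(F)(z)  # n x n Jacobian of the free coordinates.
y = jax.nn.softmax(jnp.concatenate([jnp.zeros(1), z]))  # All n + 1 coordinates.
print(jnp.linalg.det(jacobian), jnp.prod(y))  # The two agree.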

As an example, let's say I use SoftmaxCentered to transform uniform samples: $$z_1 \sim \mathrm{Uniform}(0, 1), \quad [x_0, x_1] = B([z_1]). $$ SoftmaxCentered.forward_log_det_jacobian should keep my distribution $p(\pmb{x}) = p(z_1) \, |\det \frac{\partial \pmb{z}}{\partial B(\pmb{z})} |$ normalised. However, when I compute the normalisation numerically, $$1 = \int \mathrm{d}\pmb{x}\, p(\pmb{x}) = \int_0^1 \mathrm{d}x_0 \int_0^1 \mathrm{d}x_1\, p(\pmb{x})\, \delta(x_0 + x_1 - 1) = \int_0^1 \mathrm{d}x_1\, p([1 - x_1, x_1]), $$ I come out exactly a factor $1/\sqrt{2}$ off, which I think is related to the function $G(\pmb{y})$.

import numpy as np
from tensorflow_probability.substrates import jax as tfp

tfb = tfp.bijectors

softmax = tfb.SoftmaxCentered()
ln_p_z = 0.0  # Log-density of Uniform(0, 1).
z = np.linspace(0, 1, 1000).reshape(-1, 1)  # Batch of 1000 one-dimensional events.
x = np.asarray(softmax(z))  # Shape (1000, 2): points on the 1-simplex.
ln_p_x = ln_p_z - softmax.forward_log_det_jacobian(z, event_ndims=1)
p_x = np.exp(ln_p_x)

# Approximate the integral over x_0 with a Riemann sum.
dx = np.diff(x[:, 0])
normalisation = np.sum(p_x[1:] * dx)
# normalisation comes out as 1/sqrt(2) instead of one.
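In case it is useful, the offset does seem to be exactly the constant $\tfrac{1}{2}\ln(n+1)$: subtracting it from the forward log-det-Jacobian leaves just the $\ln \prod_i y_i$ term. Again, this is only my own follow-up check, so I may be misreading it:

fldj = softmax.forward_log_det_jacobian(z, event_ndims=1)
# For n = 1 the product of simplex coordinates is x_0 * x_1.
manual = np.log(x[:, 0] * x[:, 1])
print(np.allclose(fldj, manual + 0.5 * np.log(2)))  # True, as far as I can tell.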

I am probably overlooking something trivial. Can you help me fill in the gaps?

Thanks in advance,

Hylke

hylkedonker · Dec 15 '23