matrix normal distribution
Implement one or more forms of the matrix normal distribution. The general distribution is of the form
matrix_normal(matrix[N, P] y | matrix[N, P] mu, cov_matrix[N, N] SigmaRow, cov_matrix[P, P] SigmaCol),
where SigmaRow and SigmaCol are positive definite covariance matrices. The covariance matrix SigmaRow controls the covariance of the rows of y, whereas SigmaCol controls the covariance of the columns.
There are three design decisions about how SigmaRow and SigmaCol are parameterized, which are whether to use
- diagonal or dense matrices,
- covariance or precision (inverse covariance), and
- Cholesky factors or full matrices.
There is also the issue of how to resolve the non-identifiability of scale between the two matrices, because
matrix_normal(y | mu, SigmaRow, SigmaCol) == matrix_normal(y | mu, c * SigmaRow, (1 / c) * SigmaCol).
The log density can be expressed as follows, but this is not how we'd compute it.
log matrix_normal(y | mu, SigmaRow, SigmaCol)
= - 1 / 2 * tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
- P / 2 * log det(SigmaRow)
- N / 2 * log det(SigmaCol)
- N * P / 2 * log(2 * pi())
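As a sanity check, the naive density and the scale non-identifiability can be sketched in a few lines of numpy (illustrative only, not the C++ implementation; the function name matrix_normal_lpdf and all test values are made up):

```python
import numpy as np

def matrix_normal_lpdf(y, mu, sigma_row, sigma_col):
    # Naive evaluation of the log density above; a real implementation
    # would use Cholesky factors and triangular solves instead.
    n, p = y.shape
    r = y - mu
    quad = np.trace(np.linalg.inv(sigma_col) @ r.T @ np.linalg.inv(sigma_row) @ r)
    _, logdet_row = np.linalg.slogdet(sigma_row)
    _, logdet_col = np.linalg.slogdet(sigma_col)
    return (-0.5 * quad
            - 0.5 * p * logdet_row
            - 0.5 * n * logdet_col
            - 0.5 * n * p * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
n, p = 3, 2
a = rng.normal(size=(n, n)); sigma_row = a @ a.T + n * np.eye(n)
b = rng.normal(size=(p, p)); sigma_col = b @ b.T + p * np.eye(p)
y = rng.normal(size=(n, p)); mu = np.zeros((n, p))

lp = matrix_normal_lpdf(y, mu, sigma_row, sigma_col)
lp_scaled = matrix_normal_lpdf(y, mu, 2.5 * sigma_row, sigma_col / 2.5)
assert np.isclose(lp, lp_scaled)  # scale non-identifiability
```

With the exponents P/2 on det(SigmaRow) and N/2 on det(SigmaCol), the c and 1/c factors cancel exactly for any N and P, which is what makes the overall scale unidentified.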
If LRow and LCol are Cholesky factors of SigmaRow and SigmaCol so that, e.g., SigmaRow = LRow * LRow', and z is an N x P matrix whose entries are iid standard normal (z[i, j] ~ normal(0, 1)), then
y = mu + LRow * z * LCol ~ matrix_normal(mu, SigmaRow, SigmaCol)
As shown on the Wikipedia page for matrix normal, we can evaluate that trace term as
tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
= (vec(y) - vec(mu))' * inv(SigmaCol kronecker SigmaRow) * (vec(y) - vec(mu))
where vec(y) is the flattening of y. We then want to use the property of Kronecker products that
inv(SigmaCol kronecker SigmaRow) = inv(SigmaCol) kronecker inv(SigmaRow).
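Both identities can be checked numerically; this numpy sketch (illustrative values only) confirms that the trace term matches the vectorized quadratic form, with the Kronecker inverse computed either way:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 3, 2
a = rng.normal(size=(n, n)); sigma_row = a @ a.T + n * np.eye(n)
b = rng.normal(size=(p, p)); sigma_col = b @ b.T + p * np.eye(p)
r = rng.normal(size=(n, p))      # stands in for y - mu
vec = lambda m: m.flatten(order="F")

# Trace form of the quadratic term.
trace_form = np.trace(np.linalg.inv(sigma_col) @ r.T @ np.linalg.inv(sigma_row) @ r)

# Vectorized form, inverting the Kronecker product directly ...
quad_form = vec(r) @ np.linalg.inv(np.kron(sigma_col, sigma_row)) @ vec(r)

# ... and using inv(A kron B) = inv(A) kron inv(B).
kron_of_inv = vec(r) @ np.kron(np.linalg.inv(sigma_col),
                               np.linalg.inv(sigma_row)) @ vec(r)

assert np.isclose(trace_form, quad_form)
assert np.isclose(quad_form, kron_of_inv)
```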
@bgoodri suggested going further with Cholesky factorizations LRow and LCol, using the rule for the inverse of a product of square matrices, inv(A * B) = inv(B) * inv(A), and the trace-of-a-product rule, tr(A' * B) = vec(A)' * vec(B), to derive
tr(inv(SigmaCol) * (y - mu)' * inv(SigmaRow) * (y - mu))
= tr(inv(LCol * LCol') * (y - mu)' * inv(LRow * LRow') * (y - mu))
= tr(inv(LCol') * inv(LCol) * (y - mu)' * inv(LRow') * inv(LRow) * (y - mu))
= tr((LCol' \ LCol \ (y - mu)') * (LRow' \ LRow \ (y - mu)))
= dot_product(vec((LCol' \ LCol \ (y - mu)')'), vec(LRow' \ LRow \ (y - mu))),
where the outer transpose in the first argument is needed because the trace rule applies as tr(C * D) = vec(C')' * vec(D).
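The solve-based evaluation can be checked against the naive trace in numpy; here np.linalg.solve stands in for the dedicated triangular solves (e.g., Stan's mdivide_left_tri_low) that an actual implementation would use:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 3, 2
a = rng.normal(size=(n, n)); sigma_row = a @ a.T + n * np.eye(n)
b = rng.normal(size=(p, p)); sigma_col = b @ b.T + p * np.eye(p)
l_row = np.linalg.cholesky(sigma_row)
l_col = np.linalg.cholesky(sigma_col)
r = rng.normal(size=(n, p))      # stands in for y - mu

# Naive trace term with explicit inverses.
naive = np.trace(np.linalg.inv(sigma_col) @ r.T @ np.linalg.inv(sigma_row) @ r)

# Solve-based evaluation: LCol' \ LCol \ (y - mu)' and LRow' \ LRow \ (y - mu).
c = np.linalg.solve(l_col.T, np.linalg.solve(l_col, r.T))  # inv(SigmaCol) (y - mu)'
d = np.linalg.solve(l_row.T, np.linalg.solve(l_row, r))    # inv(SigmaRow) (y - mu)
vec = lambda m: m.flatten(order="F")

# tr(C * D) = vec(C')' * vec(D), hence the transpose on c.
assert np.isclose(naive, vec(c.T) @ vec(d))
```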
Identifiability will be left up to the user. One way to achieve it is to define one of the matrices to be a correlation matrix with a simplex of scales. That's most efficient when done as a simple product of the scales with a Cholesky factor of the correlation matrix, rather than as a quadratic form with the full correlation matrix. A second, more symmetric way to achieve a proper posterior is to decompose both covariance matrices into scales times Cholesky factors and put priors on both sets of scales.
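As a sketch of the first option (illustrative only; in Stan the scales could be a simplex parameter and the correlation matrix could get an LKJ prior), scaling the rows of a correlation matrix's Cholesky factor by the scales gives the covariance Cholesky factor directly, with no quadratic form:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3

# Build an arbitrary valid correlation matrix for the demonstration.
z = rng.normal(size=(n, n)); m = z @ z.T + n * np.eye(n)
d = np.sqrt(np.diag(m))
corr = m / np.outer(d, d)            # unit diagonal, positive definite
scales = np.array([0.5, 1.2, 2.0])   # hypothetical scales (a simplex in Stan)

l_corr = np.linalg.cholesky(corr)
l_row = np.diag(scales) @ l_corr     # Cholesky factor of the scaled covariance
sigma_row = np.diag(scales) @ corr @ np.diag(scales)

# (D L)(D L)' = D (L L') D = D C D, so the cheap product suffices.
assert np.allclose(l_row @ l_row.T, sigma_row)
```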
There is also a matrix Student-t distribution, which should just follow whatever matrix normal does.
From @rtrangucci on March 24, 2015 14:42
We still need first-order reverse-mode tests for matrix_normal_prec_log, unless I haven't found all the tests.
Right now there are three .cpp files that implement matrix_normal_prec_log tests:
- test/unit/math/prim/mat/prob/matrix_normal_test.cpp
- test/unit/math/mix/mat/prob/matrix_normal_test.cpp
- test/unit/math/fwd/mat/prob/matrix_normal_test.cpp
As far as I can tell, none of these files implements plain reverse-mode (var) tests. The fwd and mix files exercise the fvar<var> and fvar<fvar<var>> instantiations. We need a test/unit/math/rev/mat/prob/matrix_normal_test.cpp that does the reverse-mode tests.