math
Allow multivariate distributions to take in matrices
Description
Looking at the Stan docs, we only accept arrays of vectors and row vectors for the multivariate distributions. What do folks think about allowing matrices as well?
We could restrict the current signatures to accept std::vector<Template> and then add a version that accepts matrices of size ([N, M], [N, M], [M, M]).
Example
This would allow for things in Stan like
data {
int N;
int M;
matrix[N, M] Y;
}
parameters {
matrix[N, M] Mu;
matrix[M, M] Sigma;
}
model {
Y ~ multi_normal(Mu, Sigma);
}
I would personally find this useful for time series models where Y is a matrix of time series and Mu is made in transformed parameters and does the VAR component.
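In the meantime, a user-defined conversion gets part of the way there. This is a hypothetical sketch in current Stan (the helper name to_rows is invented, and it assumes the observation index is on the rows):

```stan
functions {
  // Hypothetical helper: split an N x M matrix into N row-variates.
  array[] row_vector to_rows(matrix m) {
    array[rows(m)] row_vector[cols(m)] out;
    for (n in 1:rows(m)) {
      out[n] = m[n];
    }
    return out;
  }
}
model {
  to_rows(Y) ~ multi_normal(to_rows(Mu), Sigma);
}
```

This works because multi_normal already accepts arrays of row vectors, but it copies the data on every evaluation, which is part of the motivation for native matrix signatures.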
Expected Output
All multivariate distributions support matrix inputs
Current Version:
v4.1.0
Also having a matrix specialization would allow for some performance speedups
I'll try! Will submit a draft PR once I have something working.
We just need to be super clear on what index the observations are on. Do we choose rows or do we allow either rows or columns? If we allow both then we need a flag input and that seems to go against the design of other functions.
It's not a big issue, just that we'll want informative errors in stanc to let the user know that the observation index is assumed to be on the rows of the matrix.
I think since it takes in an array[N] vector[M] scheme right now, the equivalent is matrix[N, M].
Counterpoint: I think it also accepts an array[N] row_vector[M] scheme right now too?
To eliminate ambiguity, let's talk of a V-dimensional multivariate with S samples. So I think the current scheme supports both "column-wise variates": array[S] vector[V] and "row-wise variates": array[S] row_vector[V].
Would the internals of the function perform faster with one variant vs the other? On the one hand I'd expect the core columnar memory representations would suggest that column-wise variates might perform faster, but maybe that's my naïveté of the internals showing.
On the other hand, I've recently been reformulating the standard hierarchical model to use columns_dot_product rather than rows_dot_product, again for validated performance reasons, and in that context I'd usually have a matrix[V, S] result from the earlier compute that I'd have to transpose if the new matrix inputs expect variates to be column-wise.
Edit: looking at that code again, the matrix gZc is created by re-packing a vector, so it can be packed to have column-wise variates and doesn't stand as an example calling for row-wise variate support.
Edit 2: oh wait, the compute and subsequent use of iZc_mat would need a transpose if only column-wise variates were supported. Would it make sense (usability-, performance-, implementation-, and upkeep-wise) to have a Boolean argument denoting whether the input has column-wise or row-wise variates?
@adamhaber maybe your work on this so far might shed light on what makes most sense?
Sorry I missed this the first time around. The reason we've never done this before is because it's not clear with @SteveBronder's example whether it's the columns that have a multi-normal distribution or the rows.
@mike-lawrence: I think the current scheme supports both "column-wise variates" ... and row-wise variates:
That's right.
Would the internals of the function perform faster with one variant vs the other?
Probably not, and if so only a little bit, because Eigen lets you convert without copying.
On the one hand I'd expect the core columnar memory representations would suggest that column-wise variates might perform faster
Arrays of (row) vectors are neither row-major nor column-major in the traditional sense because the memory's not contiguous. In C++ terms, it's a std::vector of Eigen vectors or row vectors. That means accessing the vectors costs the same either way. Our matrices are indeed stored column-major, but the best way to think of our arrays is as row-major.
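As a plain-C++ illustration of the layout point (a sketch using std::vector stand-ins, not actual Stan Math or Eigen code): an array of vectors is a collection of separately allocated buffers, while a matrix is one contiguous column-major buffer.

```cpp
#include <cstddef>
#include <vector>

// Flat column-major storage (the layout Eigen matrices use): element
// (i, j) of an R x C matrix sits at flat index j * R + i, so each
// column is one contiguous run of memory.
std::size_t col_major_index(std::size_t i, std::size_t j, std::size_t rows) {
  return j * rows + i;
}

// An "array of vectors" has no such layout: each inner vector owns its
// own heap block. Packing the vectors (treated as columns) into one
// flat buffer recovers the contiguous column-major form.
std::vector<double> pack_col_major(
    const std::vector<std::vector<double>>& cols) {
  std::vector<double> flat;
  for (const auto& c : cols)
    flat.insert(flat.end(), c.begin(), c.end());
  return flat;
}
```

Because the inner vectors are separate allocations, there is no single stride between consecutive "rows", which is why such an array is neither row- nor column-major.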
But with the varmat magic bullet it should make a performance difference if one allows matrices for y. So we would have to pick a convention.
How about mu being a row_vector => it's row-wise, and mu being a column vector => it's column-wise?
@wds15 I'm not sure I follow; as per the OP, Mu is a matrix.
In case mu is a matrix, then one needs a convention, sure... but I thought we also have the case of mu being a vector, no?
(I am probably too much out of context... so sorry for the confusion)
but I thought we also have the case of mu being a vector, no?
Ah, good point. And highlights that I'd been thinking solely about Mu and we probably want some good conventions for Y as a matrix too.
To summarise: multi_normal() and its related functions all have three arguments, y, mu, and a third argument that's always a matrix and can be ignored for our discussion here. At present, all of the following are supported:
multi_normal_lpdf(vectors y | vectors mu ...)
multi_normal_lpdf(row_vectors y | row_vectors mu ... )
multi_normal_lpdf(vectors y | row_vectors mu ...)
multi_normal_lpdf(row_vectors y | vectors mu ... )
In which case, presumably the internals only care that the number of elements in the y and mu vectors/row_vectors match; and if y and mu are arrays of vectors/row_vectors, presumably it wants the number of array elements to match too (possibly mu can also be a length-1 array?).
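A sketch of the kind of consistency check this implies (a hypothetical helper, not the actual Stan Math check):

```cpp
#include <cstddef>

// Hypothetical consistency check for arrays-of-vectors inputs: the
// vector lengths must agree, and the mu array must either match the
// y array in length or broadcast from a single element.
bool sizes_consistent(std::size_t y_arr_len, std::size_t mu_arr_len,
                      std::size_t y_vec_len, std::size_t mu_vec_len) {
  return y_vec_len == mu_vec_len
         && (mu_arr_len == y_arr_len || mu_arr_len == 1);
}
```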
As @wds15 suggests, in the case of y being a matrix and mu being either a vector, row_vector, or an array thereof, I think it would make sense to take the cue from the sub-type of mu to determine which dimension of y is considered the variate dimension and which is considered the sample dimension:
multi_normal_lpdf(matrix y | vectors mu ...) // treat y as having column-wise variates
multi_normal_lpdf(matrix y | row_vectors mu ...) // treat y as having row-wise variates
In the case that only mu is a matrix, we can similarly take the cue from the sub-type of y:
multi_normal_lpdf(vectors y | matrix mu ...) // treat mu as having column-wise variates
multi_normal_lpdf(row_vectors y | matrix mu ...) // treat mu as having row-wise variates
In the case that both y and mu are matrices, I suggest an explicit argument specifying the structure, akin to the col_major argument of to_matrix():
multi_normal_lpdf(matrix y | matrix mu, ..., int col_variates)
presumably the internals only care that the number of elements in y and mu vectors/row_vectors match,
That's right.
I still don't like any of the new proposed signatures because they're too confusing. How about we have conversion functions which map matrices to arrays of vectors? Then the confusion goes away.
Good point. Presumably there are no performance benefits to putting the re-formatting inside multi_normal_*() versus your proposed user reformatting via multi_normal(to_array(...), ...)? Not saying performance would weigh high in a design decision, but certainly if there'd be no expectation of a performance difference, I'd agree that your proposal (leaving the reformatting to the user rather than having a guessing scheme as I proposed) makes sense.
Any to_array(matrix[R, C]) function would need a boolean argument specifying whether to convert to an array of row vectors or an array of vectors, right?
Presumably there's no performance benefits to putting the re-formatting inside multi_normal_*()
There's a slight performance benefit to coding directly in multi_normal because we could use views (expression templates) rather than copies. The copy cost is probably going to be slight compared to everything else going on in a multivariate normal because there are no autodiff implications. The main issue is increased memory pressure, which we'd like to keep down. The main downside to pushing this inside multi_normal is that it makes the multi_normal function more complex.
Any to_array( matrix[R,C] ) function would need an boolean argument
Our current to_vector(matrix) function just assumes column-major order. I'm suggesting something like:
array[] row_vector to_array_of_row_vectors(matrix);
array[] vector to_array_of_vectors(matrix);
The to-row-vector version will be inefficient in high dimensions because of memory locality, but that's true whether we do it inside multi_normal or outside. If we really need the row-vector version, it's probably best to implement a transpose first internally (which Eigen does cleverly by blocks).
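A rough C++ sketch of what these two conversions do to a column-major buffer (hypothetical helpers, with a flat std::vector standing in for an Eigen matrix): column extraction is a contiguous block copy, while row extraction is a strided gather, which is where the memory-locality cost comes from.

```cpp
#include <cstddef>
#include <vector>

// R x C matrix stored column-major in one flat buffer: (i, j) at j * R + i.
using Matrix = std::vector<double>;

// Column extraction: each column is a contiguous slice, so this is a
// cache-friendly block copy.
std::vector<std::vector<double>> to_cols(const Matrix& m, std::size_t R,
                                         std::size_t C) {
  std::vector<std::vector<double>> out(C);
  for (std::size_t j = 0; j < C; ++j)
    out[j].assign(m.begin() + j * R, m.begin() + (j + 1) * R);
  return out;
}

// Row extraction: consecutive elements of a row are R apart in memory,
// so this is a strided gather -- the locality problem mentioned above.
std::vector<std::vector<double>> to_rows(const Matrix& m, std::size_t R,
                                         std::size_t C) {
  std::vector<std::vector<double>> out(R, std::vector<double>(C));
  for (std::size_t i = 0; i < R; ++i)
    for (std::size_t j = 0; j < C; ++j)
      out[i][j] = m[j * R + i];
  return out;
}
```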
@bob-carpenter now that I'm thinking about it more, I agree my signature is confusing. Though for my example above we do know that Sigma is always size [M, M], and then Mu/Y must also be size [*, M] or [M, *]:
data {
int N;
int M;
matrix[N, M] Y;
}
parameters {
matrix[N, M] Mu;
matrix[M, M] Sigma;
}
model {
// (1)
Y ~ multi_normal(Mu, Sigma);
// (2) Could also be fine
Y' ~ multi_normal(Mu', Sigma);
}
So what we could do at runtime is check whether the rows or columns of Y and Mu match Sigma. In case (1), the number of columns of Y and Mu matches the dimension of Sigma, so we would treat each row as having a multi-normal distribution. In case (2), the number of rows of Y' and Mu' matches the dimension of Sigma, so we would treat each column as having a multi-normal distribution. Does that make sense? Apologies if I messed up any wording.
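For concreteness, the runtime check being proposed could look something like this (a hypothetical C++ sketch with invented names; note that a square y matches both ways and cannot be disambiguated):

```cpp
#include <stdexcept>

enum class VariateDim { rows, cols };

// Hypothetical deduction rule: match a dimension of the observation
// matrix y against the M x M covariance Sigma. Each row is an M-vector
// when y has M columns; each column is an M-vector when y has M rows.
VariateDim deduce_variate_dim(int y_rows, int y_cols, int sigma_dim) {
  bool row_variates = (y_cols == sigma_dim);
  bool col_variates = (y_rows == sigma_dim);
  if (row_variates && col_variates)
    throw std::invalid_argument("square y: rows vs. columns is ambiguous");
  if (row_variates) return VariateDim::rows;
  if (col_variates) return VariateDim::cols;
  throw std::invalid_argument("neither dimension of y matches Sigma");
}
```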
@SteveBronder: Yes, that makes sense in a type-theoretic sense. We (meaning everyone on the project at the time coming to the general Stan meetings) discussed all these signatures at length like six or eight years ago and decided not to do it because it'd be too confusing.
I'm generally opposed to adding these matrix-matrix signatures. I'm very strongly opposed to adding signatures that involve run-time decisions about shape to resolve.
I'm totally OK with adding reshaping functions from matrices to arrays of vectors or row vectors but I still think it's going to be hard to follow because of the row/column distinctions.
to_array_of_vectors(Y) ~ multi_normal(to_array_of_vectors(mu), Sigma);
to_array_of_row_vectors(Y) ~ multi_normal(to_array_of_row_vectors(mu), Sigma);
How about adding a matrix normal distribution? Row-wise and column-wise multi-normal distributions are special cases of it.
Y ~ matrix_normal(Mu, Sigma, identity_matrix(Ncols));
Y ~ matrix_normal(Mu, identity_matrix(Nrows), Sigma);
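For reference, the equivalence behind this suggestion, assuming the usual parameterization of the matrix normal with among-row covariance U and among-column covariance V:

```latex
% Matrix normal for an n x p matrix Y with n x p mean M,
% n x n among-row covariance U, and p x p among-column covariance V:
Y \sim \mathrm{MN}_{n \times p}(M, U, V)
\iff
\operatorname{vec}(Y) \sim \mathcal{N}\bigl(\operatorname{vec}(M),\; V \otimes U\bigr)
```

Setting U = I_n makes the rows independent N(M_i., V) draws (row-wise multi-normal), and setting V = I_p makes the columns independent N(M_.j, U) draws (column-wise).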
@nhuure: There's already an issue to create a matrix normal (and matrix Student-t) distribution. The idea is to do it with a Cholesky factor parameterization of the covariance matrices (or even better, potentially sparse precision matrices).
For this to be fast for this application we'd need to detect the identity matrix and ignore it. That's easy for both dense matrices and Cholesky factors.
I think in the C++ we could have code that checks if the input type is the same type as how Eigen defines identity matrices (and what the identity_matrix() function returns) and does the appropriate thing after that:
https://godbolt.org/z/G684jsErE
Is anyone working on this?
I'm still opposed. See https://github.com/stan-dev/math/issues/2532#issuecomment-1042322064
Sorry, to be clear: I was talking about the normal matrix distribution, not accepting matrices into multi normal. Bob, are you cool with that?
For my understanding: Is the desire to take in matrices to get better performance? If so, then a prototype demonstrating the speed advantages would be a good motivation to sort out the syntax.
Actually more user-convenience I'd say. See my description here.
@wds15 it's a mix of both. For my example, I'm working with a bunch of time series and it would be nice to keep them in a big matrix. I could write a lil' function that would do matrix -> array of vectors, but it would be nicer and more efficient to just have a matrix_normal() that works on the matrix directly.
@mike-lawrence Actually more user-convenience I'd say.
That was my understanding. Performance gains would be minimal given that there's a quadratic form in each multivariate normal eval.
@steve-bronder I was talking about the normal matrix distribution not accepting matrices into multi normal
There's also a matrix-normal distribution that takes two covariance matrices. Is that what you're talking about?
The big mistake I made in the original data structure design was distinguishing the memory representations of 2D arrays, 1D arrays of vector/row_vector, and matrices. I should've packed everything locally in memory and then used Map for all the Eigen stuff so the conversions would be free.
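The "pack once, view freely" idea can be sketched without Eigen (pointer-plus-length views standing in for Eigen::Map; hypothetical code):

```cpp
#include <cstddef>
#include <vector>

// A non-owning view of a contiguous column, standing in for Eigen::Map.
struct VecView {
  const double* data;
  std::size_t n;
};

// If everything were packed into one contiguous column-major buffer,
// "converting" a matrix to an array of (column) vectors would just
// build views over the same memory -- no copies at all.
std::vector<VecView> as_columns(const std::vector<double>& flat,
                                std::size_t rows, std::size_t cols) {
  std::vector<VecView> out;
  out.reserve(cols);
  for (std::size_t j = 0; j < cols; ++j)
    out.push_back(VecView{flat.data() + j * rows, rows});
  return out;
}
```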