math
Allow multivariate distributions to take in matrices
Description
Looking at the Stan docs, we only accept arrays of vectors and row vectors for the multivariate distributions. What do folks think about allowing matrices as well?
We could restrict the current signatures to accept std::vector<Template> and then add a version that accepts matrices of size ([N, M], [N, M], [M, M]).
Example
This would allow for things in Stan like
data {
int N;
int M;
matrix[N, M] Y;
}
parameters {
matrix[N, M] Mu;
matrix[M, M] Sigma;
}
model {
Y ~ multi_normal(Mu, Sigma);
}
I would personally find this useful for time series models where Y is a matrix of time series and Mu is made in transformed parameters and does the VAR component.
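In the meantime, a user-defined conversion gets part of the way there. This is a hypothetical sketch in current Stan (the helper name to_rows is invented, and it assumes the observation index is on the rows):

```stan
functions {
  // Hypothetical helper: split an N x M matrix into N row-variates.
  array[] row_vector to_rows(matrix m) {
    array[rows(m)] row_vector[cols(m)] out;
    for (n in 1:rows(m)) {
      out[n] = m[n];
    }
    return out;
  }
}
model {
  to_rows(Y) ~ multi_normal(to_rows(Mu), Sigma);
}
```

This works because multi_normal already accepts arrays of row vectors, but it copies the data on every evaluation, which is part of the motivation for native matrix signatures.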
Expected Output
All multivariate distributions support matrix inputs
Current Version:
v4.1.0
Also having a matrix specialization would allow for some performance speedups
I'll try! Will submit a draft PR once I have something working.
We just need to be super clear on what index the observations are on. Do we choose rows or do we allow either rows or columns? If we allow both then we need a flag input and that seems to go against the design of other functions.
It's not a big issue, just that we'll want informative errors in stanc to let the user know that the observation index is assumed to be on the rows of the matrix.
I think since it takes in an array[N] vector[M] scheme right now, the equivalent is matrix[N, M].
Counterpoint: I think it also accepts an array[N] row_vector[M] scheme right now too?
To eliminate ambiguity, let's talk of a V-dimensional multivariate with S samples. So I think the current scheme supports both "column-wise variates": array[S] vector[V] and "row-wise variates": array[S] row_vector[V].
Would the internals of the function perform faster with one variant vs the other? On the one hand I'd expect the core columnar memory representations would suggest that column-wise variates might perform faster, but maybe that's my naïveté of the internals showing.
On the other hand, I've recently been reformulating the standard hierarchical model to use columns_dot_product rather than rows_dot_product, again for validated performance reasons, and in that context I'd usually have a matrix[V, S] result from the earlier compute that I'd have to transpose if the new matrix inputs expect variates to be column-wise.
Edit: looking at that code again, the matrix gZc is created by re-packing a vector, so it can be packed to have column-wise variates and doesn't stand as an example calling for row-wise variate support.
Edit 2: oh wait, the compute and subsequent use of iZc_mat would need a transpose if only column-wise variates were supported. Would it make sense (usability-, performance-, implementation-, and upkeep-wise) to have a Boolean argument denoting whether the input has column-wise or row-wise variates?
@adamhaber maybe your work on this so far might shed light on what makes most sense?
Sorry I missed this the first time around. The reason we've never done this before is because it's not clear with @SteveBronder's example whether it's the columns that have a multi-normal distribution or the rows.
@mike-lawrence: I think the current scheme supports both "column-wise variates" ... and row-wise variates:
That's right.
Would the internals of the function perform faster with one variant vs the other?
Probably not, and if so only a little bit, because Eigen lets you convert without copying.
On the one hand I'd expect the core columnar memory representations would suggest that column-wise variates might perform faster
Arrays of (row) vectors are neither row-major nor column-major in the traditional sense because the memory's not contiguous. In C++ terms, it's a std::vector of Eigen vectors or row vectors. That means accessing the vectors costs the same either way. Our matrices are indeed stored column-major, but the best way to think of our arrays is as row-major.
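As a plain-C++ illustration of the layout point (a sketch using std::vector stand-ins, not actual Stan Math or Eigen code): an array of vectors is a collection of separately allocated buffers, while a matrix is one contiguous column-major buffer.

```cpp
#include <cstddef>
#include <vector>

// Flat column-major storage (the layout Eigen matrices use): element
// (i, j) of an R x C matrix sits at flat index j * R + i, so each
// column is one contiguous run of memory.
std::size_t col_major_index(std::size_t i, std::size_t j, std::size_t rows) {
  return j * rows + i;
}

// An "array of vectors" has no such layout: each inner vector owns its
// own heap block. Packing the vectors (treated as columns) into one
// flat buffer recovers the contiguous column-major form.
std::vector<double> pack_col_major(
    const std::vector<std::vector<double>>& cols) {
  std::vector<double> flat;
  for (const auto& c : cols)
    flat.insert(flat.end(), c.begin(), c.end());
  return flat;
}
```

Because the inner vectors are separate allocations, there is no single stride between consecutive "rows", which is why such an array is neither row- nor column-major.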
But with the varmat magic bullet it should make a performance difference if one allows matrices for y. So we would have to pick a convention.
How about mu being a row_vector => it's row-wise, and mu being a column vector => it's column-wise?
@wds15 I'm not sure I follow; as per the OP, Mu is a matrix.
In case mu is a matrix, then one needs a convention, sure... but I thought we also have the case of mu being a vector, no?
(I am probably too much out of context... so sorry for the confusion)
but I thought we also have the case of mu being a vector, no?
Ah, good point. And highlights that I'd been thinking solely about Mu and we probably want some good conventions for Y as a matrix too.
To summarise: multi_normal() and its related functions all have three arguments, y, mu, and a third argument that's always a matrix and can be ignored for our discussion here. At present, all of the following are supported:
multi_normal_lpdf(vectors y | vectors mu ...)
multi_normal_lpdf(row_vectors y | row_vectors mu ... )
multi_normal_lpdf(vectors y | row_vectors mu ...)
multi_normal_lpdf(row_vectors y | vectors mu ... )
In which case, presumably the internals only care that the number of elements in the y and mu vectors/row_vectors match; and if y and mu are arrays of vectors/row_vectors, presumably it wants the number of array elements to match too (possibly mu can also be a length-1 array?).
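A sketch of the kind of consistency check this implies (a hypothetical helper, not the actual Stan Math check):

```cpp
#include <cstddef>

// Hypothetical consistency check for arrays-of-vectors inputs: the
// vector lengths must agree, and the mu array must either match the
// y array in length or broadcast from a single element.
bool sizes_consistent(std::size_t y_arr_len, std::size_t mu_arr_len,
                      std::size_t y_vec_len, std::size_t mu_vec_len) {
  return y_vec_len == mu_vec_len
         && (mu_arr_len == y_arr_len || mu_arr_len == 1);
}
```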
As @wds15 suggests, in the case of y being a matrix and mu being either a vector, row_vector, or an array thereof, I think it would make sense to take the cue from the sub-type of mu to determine which dimension of y is considered the variate dimension and which is considered the sample dimension:
multi_normal_lpdf(matrix y | vectors mu ...) // treat y as having column-wise variates
multi_normal_lpdf(matrix y | row_vectors mu ...) // treat y as having row-wise variates
In the case that only mu is a matrix, we can similarly take the cue from the sub-type of y:
multi_normal_lpdf(vectors y | matrix mu ...) // treat mu as having column-wise variates
multi_normal_lpdf(row_vectors y | matrix mu ...) // treat mu as having row-wise variates
In the case that both y and mu are matrices, I suggest an explicit argument specifying the structure, akin to the col_major argument of to_matrix():
multi_normal_lpdf(matrix y | matrix mu, ..., int col_variates)
presumably the internals only care that the number of elements in y and mu vectors/row_vectors match,
That's right.
I still don't like any of the new proposed signatures because they're too confusing. How about we have conversion functions which map matrices to arrays of vectors? Then the confusion goes away.
Good point. Presumably there are no performance benefits to putting the re-formatting inside multi_normal_*() versus your proposed user reformatting via multi_normal(to_array(...), ...)? Not saying performance would weigh high in a design decision, but certainly if there'd be no expectation of a performance difference, I'd agree that your proposal (leaving the reformatting to the user rather than having a guessing scheme as I proposed) makes sense.
Any to_array(matrix[R, C]) function would need a boolean argument specifying whether to convert to an array of row vectors or an array of vectors, right?
Presumably there's no performance benefits to putting the re-formatting inside multi_normal_*()
There's a slight performance benefit to coding directly in multi_normal because we could use views (expression templates) rather than copies. The copy cost is probably going to be slight compared to everything else going on in a multivariate normal because there are no autodiff implications. The main issue is increased memory pressure, which we'd like to keep down. The main downside to pushing this inside multi_normal is that it makes the multi_normal function more complex.
Any to_array( matrix[R,C] ) function would need an boolean argument
Our current to_vector(matrix) function just assumes column-major order. I'm suggesting something like:
array[] row_vector to_array_of_row_vectors(matrix);
array[] vector to_array_of_vectors(matrix);
The to-row-vector version will be inefficient in high dimensions because of memory locality, but that's true whether we do it inside multi_normal or outside. If we really need the row-vector version, it's probably best to implement a transpose first internally (which Eigen does cleverly by blocks).
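A rough C++ sketch of what these two conversions do to a column-major buffer (hypothetical helpers, with a flat std::vector standing in for an Eigen matrix): column extraction is a contiguous block copy, while row extraction is a strided gather, which is where the memory-locality cost comes from.

```cpp
#include <cstddef>
#include <vector>

// R x C matrix stored column-major in one flat buffer: (i, j) at j * R + i.
using Matrix = std::vector<double>;

// Column extraction: each column is a contiguous slice, so this is a
// cache-friendly block copy.
std::vector<std::vector<double>> to_cols(const Matrix& m, std::size_t R,
                                         std::size_t C) {
  std::vector<std::vector<double>> out(C);
  for (std::size_t j = 0; j < C; ++j)
    out[j].assign(m.begin() + j * R, m.begin() + (j + 1) * R);
  return out;
}

// Row extraction: consecutive elements of a row are R apart in memory,
// so this is a strided gather -- the locality problem mentioned above.
std::vector<std::vector<double>> to_rows(const Matrix& m, std::size_t R,
                                         std::size_t C) {
  std::vector<std::vector<double>> out(R, std::vector<double>(C));
  for (std::size_t i = 0; i < R; ++i)
    for (std::size_t j = 0; j < C; ++j)
      out[i][j] = m[j * R + i];
  return out;
}
```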
@bob-carpenter now that I'm thinking about it more, I agree my signature is confusing. Though for my example above we do know that Sigma is always size [M, M], and then Mu/Y must also be size [*, M] or [M, *]:
data {
int N;
int M;
matrix[N, M] Y;
}
parameters {
matrix[N, M] Mu;
matrix[M, M] Sigma;
}
model {
// (1)
Y ~ multi_normal(Mu, Sigma);
// (2) Could also be fine
Y' ~ multi_normal(Mu', Sigma);
}
So what we could do at runtime is check whether the rows or columns of Y and Mu match Sigma. In case (1), the number of columns of Y and Mu matches the dimension of Sigma, so we would treat each row as having a multi-normal distribution. In case (2), the number of rows of Y' and Mu' matches the dimension of Sigma, so we would treat each column as having a multi-normal distribution. Does that make sense? Apologies if I messed up any wording.
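For concreteness, the runtime check being proposed could look something like this (a hypothetical C++ sketch with invented names; note that a square y matches both ways and cannot be disambiguated):

```cpp
#include <stdexcept>

enum class VariateDim { rows, cols };

// Hypothetical deduction rule: match a dimension of the observation
// matrix y against the M x M covariance Sigma. Each row is an M-vector
// when y has M columns; each column is an M-vector when y has M rows.
VariateDim deduce_variate_dim(int y_rows, int y_cols, int sigma_dim) {
  bool row_variates = (y_cols == sigma_dim);
  bool col_variates = (y_rows == sigma_dim);
  if (row_variates && col_variates)
    throw std::invalid_argument("square y: rows vs. columns is ambiguous");
  if (row_variates) return VariateDim::rows;
  if (col_variates) return VariateDim::cols;
  throw std::invalid_argument("neither dimension of y matches Sigma");
}
```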
@SteveBronder: Yes, that makes sense in a type-theoretic sense. We (meaning everyone on the project at the time coming to the general Stan meetings) discussed all these signatures at length like six or eight years ago and decided not to do it because it'd be too confusing.
I'm generally opposed to adding these matrix-matrix signatures. I'm very strongly opposed to adding signatures that involve run-time decisions about shape to resolve.
I'm totally OK with adding reshaping functions from matrices to arrays of vectors or row vectors but I still think it's going to be hard to follow because of the row/column distinctions.
to_array_of_vectors(Y) ~ multi_normal(to_array_of_vectors(mu), Sigma);
to_array_of_row_vectors(Y) ~ multi_normal(to_array_of_row_vectors(mu), Sigma);
How about adding a matrix normal distribution? Row-wise and column-wise multi-normal distributions are special cases of it.
Y ~ matrix_normal(Mu, Sigma, identity_matrix(Ncols));
Y ~ matrix_normal(Mu, identity_matrix(Nrows), Sigma);
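For reference, the equivalence behind this suggestion, assuming the usual parameterization of the matrix normal with among-row covariance U and among-column covariance V:

```latex
% Matrix normal for an n x p matrix Y with n x p mean M,
% n x n among-row covariance U, and p x p among-column covariance V:
Y \sim \mathrm{MN}_{n \times p}(M, U, V)
\iff
\operatorname{vec}(Y) \sim \mathcal{N}\bigl(\operatorname{vec}(M),\; V \otimes U\bigr)
```

Setting U = I_n makes the rows independent N(M_i., V) draws (row-wise multi-normal), and setting V = I_p makes the columns independent N(M_.j, U) draws (column-wise).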
@nhuure: There's already an issue to create a matrix normal (and matrix Student-t) distribution. The idea is to do it with a Cholesky factor parameterization of the covariance matrices (or even better, potentially sparse precision matrices).
For this to be fast for this application we'd need to detect the identity matrix and ignore it. That's easy for both dense matrices and Cholesky factors.
I think in the C++ we could have code that checks if the input type is the same type as how Eigen defines identity matrices (and what the identity_matrix() function returns) and does the appropriate thing after that:
https://godbolt.org/z/G684jsErE
Is anyone working on this?
I'm still opposed. See https://github.com/stan-dev/math/issues/2532#issuecomment-1042322064
Sorry, to be clear: I was talking about the normal matrix distribution, not accepting matrices into multi normal. Bob, are you cool with that?
For my understanding: Is the desire to take in matrices to get better performance? If so, then a prototype demonstrating the speed advantages would be a good motivation to sort out the syntax.
Actually more user-convenience I'd say. See my description here.
@wds15 it's a mix of both. For my example, I'm working with a bunch of time series and it would be nice to keep them in a big matrix. I could write a lil' function that would do matrix -> array of vectors, but it would be nicer and more efficient to just have a matrix_normal() that works on the matrix directly.
@mike-lawrence Actually more user-convenience I'd say.
That was my understanding. Performance gains would be minimal given that there's a quadratic form in each multivariate normal eval.
@steve-bronder I was talking about the normal matrix distribution not accepting matrices into multi normal
There's also a matrix-normal distribution that takes two covariance matrices. Is that what you're talking about?
The big mistake I made in the original data structure design was distinguishing the memory representations of 2D arrays, 1D arrays of vector/row_vector, and matrices. I should've packed everything locally in memory and then used Map for all the Eigen stuff so the conversions would be free.
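The "pack once, view freely" idea can be sketched without Eigen (pointer-plus-length views standing in for Eigen::Map; hypothetical code):

```cpp
#include <cstddef>
#include <vector>

// A non-owning view of a contiguous column, standing in for Eigen::Map.
struct VecView {
  const double* data;
  std::size_t n;
};

// If everything were packed into one contiguous column-major buffer,
// "converting" a matrix to an array of (column) vectors would just
// build views over the same memory -- no copies at all.
std::vector<VecView> as_columns(const std::vector<double>& flat,
                                std::size_t rows, std::size_t cols) {
  std::vector<VecView> out;
  out.reserve(cols);
  for (std::size_t j = 0; j < cols; ++j)
    out.push_back(VecView{flat.data() + j * rows, rows});
  return out;
}
```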