Sparse Matrices
Sparse matrix support in the stan language and backends.
Summary
There has been much discussion about sparse matrices in Stan. This design doc brings together the discussions on how to implement them at the language, I/O, and math levels. The below gives a TL;DR for each section.
Language
There will be a new sparse_matrix type with the non-zero (NZ) sparsity structure defined as bounds.*
sparse_matrix<nz_rows=nz_row_ind, nz_cols=nz_col_ind>[N, M] A;
Bounds make specifying the sparsity optional, so that the sparsity pattern can be deduced under the hood for algebra etc. at the Stan math level.
sparse_matrix[N, N] B = A * A';
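To make this concrete, here is a hedged sketch of how these pieces might fit together in a program; the index-array declarations and the deduced sparsity for B are assumptions drawn from the summary above, not settled syntax.

```stan
data {
  int N;
  int K;              // number of structural non-zeros
  int nz_row_ind[K];  // row index of each non-zero
  int nz_col_ind[K];  // column index of each non-zero
  sparse_matrix<nz_rows=nz_row_ind, nz_cols=nz_col_ind>[N, N] A;
}
transformed data {
  // Bounds omitted: the sparsity pattern of B is deduced under the hood.
  sparse_matrix[N, N] B = A * A';
}
```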
I/O
Sparse matrices come in as lists of lists from the JSON or Rdump input.
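The exact layout isn't pinned down here, so as an assumption only, a coordinate-list reading of "lists of lists" might look like this, with one [row, col, value] triple per non-zero:

```stan
// Assumed JSON shape for a 3 x 4 sparse matrix with two non-zeros:
//
//   "A": [[1, 2, 0.5],
//         [3, 4, -1.2]]
//
// which would fill in a declaration like:
data {
  sparse_matrix[3, 4] A;
}
```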
Stan math
We can either do a big refactoring to simplify the codebase or include specializations for the functions that take sparse matrices.
* I personally prefer the attribute style mentioned in the alternatives section, but Dan and Aki have both expressed interest in the <> style and did not like the attribute style. While it's only an N of 2, I like to think the user is right about what is aesthetically pleasing to them.
I had this open on a separate draft PR and am going to bring over some of the discussion.
I think the doc could use a little discussion on whether sparse matrices deserve their own type in Stan. Some starter points off the top of my head: pros:
- can add functions dealing with sparsity in an ad hoc fashion to the math library slowly over time.
- can prevent users from doing inefficient things with sparse matrices, like using it in a function that would just convert it to dense first.
cons:
- Users don't necessarily care that their matrices are stored in sparse or dense form; they want efficient inference. I think there's some wisdom in trying to keep a 1-to-1 correspondence between a Stan model and the mathematical representation of it, if we can keep computational issues behind the silk screen.
- I think everyone universally regrets adding both real[] and vector to Stan, which is a pretty similar use-case in that both are representing the same mathematical object just with slightly different computational properties. Keeping users from doing linear algebra operations on real[] doesn't seem to have served any real purpose.
The "what do users want" question difficult to test because anyone who is already aware of sparse matrices will probably consider them necessary as first-class objects, but there's a whole class of Stan user who would just like their model to run faster and wouldn't care if we switched to sparse under the hood. I wonder if @lauren can offer advice here?
On Aug 28, 2019, at 3:05 PM, seantalts wrote:
I think the doc could use a little discussion on whether sparse matrices deserve their own type in Stan.
If they don't have their own type, how does the decision to represent a matrix get handled? At the very least, we'll need a new I/O format so that we don't need quadratic memory in and out.
What happens when we print or save parameters to a CSV file during sampling? Does it do one thing for sparse and one thing for dense? How will the user be able to figure out what's going to happen? What's going to ensure we have the same parameters every iteration if there's no way to type the sparsity pattern?
What about row vectors? That's a third way to represent a sequence of real values.
Then at higher orders, we get the same issue with array of vectors (type of vectorized outcomes for multinormal), array of row vectors, a matrix, and a 2D array? Now we have four Stan types for the same abstract 2D structure.
Is this at all like integer vs. double? We can assign integer to double (perhaps with loss of precision, and vice-versa with rounding), but mathematically we tend to think of them differently. That difference doesn't make much difference in stats other than that there are naturally integer distributions like Poisson or Bernoulli.
How do you think about integer vs. double? That's a common distinction in programming languages that languages like R decided to sweep under the rug. It's there in the guts of R implicitly, as well as a distinction from bool; I imagine something similar is being suggested for Stan, where the decision is implicit on the framework's part (in R, you can modify and inspect, but I think the proposal for Stan is to literally collapse the types from the user's perspective so that there's no way to tell if a matrix is sparse or dense).
Some starter points off the top of my head: pros:
• can add functions dealing with sparsity in an ad hoc fashion to the math library slowly over time.
• can prevent users from doing inefficient things with sparse matrices, like using it in a function that would just convert it to dense first.
It's easy enough to add converters all over the place that convert sparse to dense when necessary. But that won't give us the right result. We can't just remove zero values, as we may need them for autodiff (this is a problem with current Eigen implementations). In particular, we need to be sure that we get the same sparsity in output for parameters at each iteration. If we just willy-nilly promote to dense, that won't be the case or might not be predictable.
cons:
• Users don't necessarily care that their matrices are stored in sparse or dense form; they want efficient inference. I think there's some wisdom in trying to keep a 1-to-1 correspondence between a Stan model and the mathematical representation of it, if we can keep computational issues behind the silk screen.
They will very much care during I/O.
What mathematical representation is supposed to be one-one with a Stan program? The mapping from programs to posterior densities is many to one.
• I think everyone universally regrets adding both real[] and vector to Stan, which is a pretty similar use-case in that both are representing the same mathematical object just with slightly different computational properties. Keeping users from doing linear algebra operations on real[] doesn't seem to have served any real purpose.
I don't regret it! It's a natural consequence of having primitive types int, real, vector, row_vector and matrix and closing under taking arrays. If you don't allow it, it complicates every definition and lots of code where it has to be excluded from the natural definition. I'm not saying it's impossible, just that it's going to add its own set of complications.
I don't think of arrays and vectors as the same mathematical object any more than I think of a complex number as a pair of real numbers, despite their being the same abstract data type in terms of data.
I made the same initial distinction as is made in Eigen, namely that linear algebra operations apply to matrix types. It's also clunky in Eigen. But it is worth keeping in mind that the major C++ matrix lib did find it useful to make that distinction. They're even stricter, I think, in limiting elementwise operations to arrays.
What about row vectors? That's a third way of representing a sequence of real numbers. The reason it's nice to distinguish these types is that multiplication signatures are real(row_vector, vector) and matrix(vector, row_vector).
I think the argument is that real arrays are redundant because there's nothing you can do with an array you can't do with a vector or row_vector instead. That may be true now so that we wouldn't lose any functionality getting rid of them. I don't think it would fail anywhere.
The "what do users want" question difficult to test because anyone who is already aware of sparse matrices will probably consider them necessary as first-class objects, but there's a whole class of Stan user who would just like their model to run faster and wouldn't care if we switched to sparse under the hood. I wonder if @lauren can offer advice here?
Lauren is @lauken13, but I'm answering via email, so my @ won't work.
Dan Simpson probably has the most experience with this on the project, though I'd suggest asking Andrew, too.
If it’s not a type it might as well not be done. It would be extremely difficult to use. Akin to not having a dense matrix type and making users deal with flattened representations.
Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a “not defined for type sparse” warning. Struggle to think of a use case for a sparse-to-dense cast. What’s your use case?
No earthly idea how to “pretend” sparse matrices are just matrices that are different in hidden under-the-hood ways. Neither MATLAB nor R does this.
Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a “not defined for type sparse” warning.
The motivation would be to make sure user programs with the right math don't break because they got the data types wrong. We see that all the time now because our users aren't programmers.
Of course, if we have a function that takes a sparse matrix in and then outputs a dense matrix, that's going to be considered broken by most people who use sparse matrices.
No earthly idea how to “pretend” sparse matrices are just matrices that are different in hidden under-the-hood ways. Neither MATLAB nor R does this.
No pretending necessary. Mathematically, a sparse matrix is just a dense matrix with a lot of zeros (no surprise to anyone there, I hope). So the user just treats everything as a matrix (except, perhaps, during I/O) and leaves the system to figure out where things should be dense or sparse under the hood. There's a question of feasibility, but we're not violating any laws of computation here.
R doesn't have a built-in sparse matrix type (though there is a contributed Matrix package with sparsity).
MATLAB does have built-in sparsity and uses a separate type system.
https://www.mathworks.com/help/matlab/sparse-matrices.html
Under the hood, it just represents dense matrix types as an array and there are no specific vector or row-vector types, just matrix types.
https://www.mathworks.com/help/matlab/matrices-and-arrays.html
Maybe we should've done that. I really wanted to have row vector times vector to be a scalar and really wanted matrix times vector to be a vector. In MATLAB, everything's a matrix and they just use vector constructors. In MATLAB, [1 2 3] is a 1 x 3 matrix (i.e., a row vector) whereas [1; 2; 3] is a 3 x 1 matrix and [1 2; 3 4] is a 2 x 2 matrix.
Consider what R does for integer vs. real vs. boolean distinctions:
> is.integer(1)
[1] FALSE
> is.integer(as.integer(1))
[1] TRUE
I find this kind of behavior in R very confusing. Same with transposing a vector, which turns it into a matrix.
> c(1, 2, 3)
[1] 1 2 3
> t(c(1, 2, 3))
[,1] [,2] [,3]
[1,] 1 2 3
It gets really fun trying to predict the result of mixed operations
> t(t(c(1, 2, 3))) == c(1, 2, 3)
[,1]
[1,] TRUE
[2,] TRUE
[3,] TRUE
I really don't like that t(t(x)) != x.
I don't think of arrays and vectors as the same mathematical object any more than I think of a complex number as a pair of real numbers, despite their being the same abstract data type in terms of data.
I agree — arrays are a general container for any type and vectors and row vectors are linear algebraic objects with well-defined operations one can apply on them. If one defines a type by its admissible operations then the similarity between real[] and vector/row_vector, not to mention matrix[1, N] and matrix[N, 1], is only a superficial one.
I think lots of the user confusion comes from people accustomed to R and its dynamic/weak type system. They expect the structural similarity to carry over to functional similarity and get frustrated when it doesn’t. At the very least this is something that should be emphasized very strongly in introductory material so that users are at least aware of the issue.
Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a “not defined for type sparse” warning. Struggle to think of a use case for a sparse-to-dense cast. What’s your use case.
One that comes to mind is an incomplete implementation of sparse functionality. For example, if I want to compute a matrix exponential but no one has yet implemented a sparse matrix exponential, can I no longer proceed, or can I cast my sparse matrix to a dense one and proceed, albeit with the more expensive calculation?
No earthly idea how to “pretend” sparse matrices are just matrices that are different in hidden under the hood ways. Neither matlab or R does this.
There is potential for some expression template methods that would lazily evaluate expressions containing dense and sparse matrices and figure out how to propagate the sparsity only at that final evaluation, but that would be a pain to implement.
I think that the only useful “hidden” abstraction would be hiding sparse matrices entirely and just have functions that take in sparse inputs and return dense or low-dimensional outputs, like quadratic forms, determinants, or densities.
On Wed, Aug 28, 2019 at 16:09, Bob Carpenter wrote:
Not sure why any function would cast a sparse matrix to a different type automatically rather than just throw a “not defined for type sparse” warning.
The motivation would be to make sure user programs with the right math don't break because they got the data types wrong. We see that all the time now because our users aren't programmers.
Of course, if we have a function that takes a sparse matrix in and then outputs a dense matrix, that's going to be considered broken by most people who use sparse matrices.
This would be extremely bad design. It should throw an informative error. Otherwise we’ll spend our whole time explaining to people why they added one line that had an implicit cast and their code no longer worked because it ran out of memory.
Just to be concrete, the model Mitzi used in the ICAR case study uses an extremely small sparse matrix. A cast to dense would go from a vector of 2055 elements to a vector (a flattened column-major matrix) of 709 * 709 = 502681 elements. So this is very different from casting between real[] and vector, which are roughly the same size in memory (modulo possible locality stuff and maybe some small class overhead).
One that comes to mind is an incomplete implementation of sparse functionality. For example, if I want to compute a matrix exponential but no one has yet implemented a sparse matrix exponential, can I no longer proceed, or can I cast my sparse matrix to a dense one and proceed, albeit with the more expensive calculation?
There's no way to compute a sparse matrix exponential and get a sparse matrix out (because maths). So that would be easy to specialize.
Sure, the question is what the user experience would be if it hadn’t been specialized.
I feel like at some point users will have some corner applications where they need to finish a calculation using dense methods. Given the potential memory blowout from naively going from a sparse matrix to a giant dense matrix, it seems prudent not to do this implicitly, but an explicit casting function that maps sparse to dense (and potentially even throws an informative error if the dimensions are too big) might be a good compromise initially.
I can't disagree more.
These aren't incomplete or unimplemented features, these are features that should not exist. For scale, it's like when people say that we should have a Gibbs sampler "just in case".
For any example where you should use sparse matrices, a to_dense will break everything. A model with sparse matrices that needs a matrix exponential (or an eigendecomposition or anything that is naturally dense) is just not a realistic case for using a sparse matrix and should throw an error. Either it's a different type or it will basically not work for 90% of people who use it (because then you need to know all of the internals to work out how to use it well rather than just having to accept "I can't do xxxx with a sparse matrix").
Very cool, thank you all for the discussion! My one request for this PR would be to capture a summary of this in the "Rationale and Alternatives" heading.
Another possible point for discussion - I think adding GPU matrix support to the Stan language is almost exactly the same, modulo some degree of difference on the break-even matrix size where converting to a dense [cpu] matrix is so inefficient as to be impossible and should be outlawed. Meaning that in GPU calculation land I think it will actually be somewhat common to want to go back and forth as we flesh out the GPU-aware implementations. So I might propose that we try to treat those the same way, meaning if we go with the full new-types approach we'd have all new cl(_cholesky_factor_corr | _cov | cholesky_factor_cov | ... )_matrix types as well (luckily someone said you'd never want to put a sparse matrix on a GPU or we'd have that combinatoric explosion as well).
The challenge then is designing a sparse matrix type, and the accompanying documentation, that clearly communicates those limitations.
At best I think that most users will interpret sparse matrices as “special kinds of matrices that sometimes admit much faster calculations” and not “special kinds of matrices that do not play with the standard matrices”. I understand the technical utility of keeping a sparse type completely encapsulated in its own world, but those not as familiar with sparse linear algebra as yourself will need something to guide them beyond the interpretational baggage they bring with them, for example very careful compiler error messages, extensive documentation, etc.
Also curious about the ICAR model - is that a good representative example? It seems to just want to store an adjacency matrix efficiently, but otherwise to not use any sparse computation at all...? I thought someone at StanCon had said we would want to completely disallow element-wise indexing into a sparse matrix, which would appear to disallow this use-case... I am assuming I'm confused here and misheard probably multiple parts of this so please correct me with exuberance.
I understand the technical utility of keeping a sparse type completely encapsulated in its own world, but those not as familiar with sparse linear algebra as yourself will need something to guide them beyond the interpretational baggage they bring with them, for example very careful compiler error messages, extensive documentation, etc.
100% agreed - I still don't really get why you would disallow to_dense in the type system instead of just throwing a runtime warning like we do for deepcopies and other inefficient stuff. But I'm happy to defer to folks who actually write models that need sparse computation. I still dream of a world in which folks don't even need to learn about sparse matrix computations in order to experience their benefit, but perhaps that's a Stan 3 (or 4) thing.
You mean something like a sparse_cholesky_factor_cov type? We'll need that, too. A problem that'll arise is that it may be possible to write down impossible constraints, like a structural row of zeroes (we can already do that with <lower = 1, upper = -1> on real values).
We might want to think about getting rid of or deprecating the dense correlation and covariance matrices. It's almost always a bad idea to use those. The only place I think they're really necessary is with Wishart priors, which we don't have defined for Cholesky factors. Maybe in multivariate student T, but that should be fixed if we only have full matrix implementations.
On Aug 28, 2019, at 5:30 PM, seantalts wrote:
Also curious about the ICAR model - is that a good representative example?
The adjacency matrix is sparse, but it's boolean, not continuous, and not something we want to do linear algebra on.
It seems to just want to store an adjacency matrix efficiently, but otherwise to not use any sparse computation at all...?
Right.
I thought someone at StanCon had said we would want to completely disallow element-wise indexing into a sparse matrix, which would appear to disallow this use-case...
The tricky part is allowing indexing into positions that are structural zeros. Those shouldn't be settable. Structural non-zeros can be settable, but I don't know if that'd ever be useful. Everything should be gettable.
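In sketch form (the bracket syntax on a sparse type is an assumption here, not settled):

```stan
sparse_matrix<nz_rows=ii, nz_cols=jj>[N, N] A;
real x = A[i, j];  // everything gettable: a structural zero reads as 0
A[i, j] = 1.5;     // settable only if (i, j) is a structural non-zero;
                   // writing into a structural zero would be rejected
```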
I am assuming I'm confused here and misheard probably multiple parts of this so please correct me with exuberance.
I understand the technical utility of keeping a sparse type completely encapsulated in its own world, but those not as familiar with sparse linear algebra as yourself will need something to guide them beyond the interpretational baggage they bring with them, for example very careful compiler error messages, extensive documentation, etc.
100% agreed - I still don't really get why you would disallow to_dense in the type system instead of just throwing a runtime warning like we do for deepcopies and other inefficient stuff.
I think it's because Dan can't imagine a case where that'd be useful, and it's better not to have something than to have something that never works. I think Dan's got the most experience with these things on the project, so I'd listen to him!
You mean something like a sparse_cholesky_factor_cov type? We'll need that, too. A problem that'll arise is that it may be possible to write down impossible constraints, like a structural row of zeroes (we can already do that with <lower = 1, upper = -1> on real values).
Yeah, I was writing it out for opencl types assuming a cl_ prefix, but yeah it seems like we'd need a bunch of those (all of them?) for sparse types as well. And I was just saying it's good we don't ever want to combine these computational types together, because then we'd really have a ton of types - in that world, a matrix could either be sparse or dense; it could live on the CPU, GPU, or some other hardware; and it can be some more structured form of matrix that has nice computational or statistical properties (cholesky, covariance, both cholesky AND covariance, etc).
I think it's because Dan can't imagine a case where that'd be useful, and it's better not to have something than to have something that never works. I think Dan's got the most experience with these things on the project, so I'd listen to him!
That's what I proposed doing in the next line: "But I'm happy to defer to folks who actually write models that need sparse computation. " :)
Setting aside sparse matrices for a moment, I just wanted to say that, yes, I understand the conceptual difference between vectors and arrays. But when working in practice with Stan, I'm having to clutter my code with to_vector and to_array (or whatever it's called), because some parts of Stan only accept vectors and other parts of Stan only accept real arrays. So I do think that it would make sense for any operation that accepts 1-d real arrays to also accept vectors, and for any operation that accepts 2-d real arrays to also accept matrices, and vice-versa.
I understood the type multiplication thing, just missed that cl_ was related to OpenCL.
It's going to get worse with specialized structured sparsity like banded or Toeplitz matrices. Or ones where we definitely need some kind of expression template magic like Kronecker products. We could also conceivably have general triangular matrix types, but I don't know what the use would be for that.
We always get a Cholesky factor with a covariance matrix parameter as it's the unconstrained representation. Are you thinking about deeper connections like caching decompositions?
Is there something you can do to a real array you can't do to a vector? The usual problem is that only vector and matrix types support linear algebra.
This is probably not the right location for this as it's the sparse matrix design docs, but while we're here...
The problems for me are more at the level of converting matrix to vector so you can use it on the lhs of a univariate sampling statement, e.g.,
matrix[10, 15] a;
a ~ normal(0, 1);             // BAD
to_vector(a) ~ normal(0, 1);  // OK
We didn't want to vectorize over matrices because it gets confusing how to match that to arguments in normal. If the locations are matrices, you wind up needing two to_vector calls.
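That is, with matrix-valued locations mu (an illustrative name), you wind up writing something like:

```stan
to_vector(a) ~ normal(to_vector(mu), sigma);
```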
Or in cases where you want to have an array of vectors for multivariate normal,
matrix[5, 10] b;
vector[10] a[5];

a ~ multi_normal(...);  // OK
b ~ multi_normal(...);  // BAD
We decided we didn't want to support that b form of multinormal because of row vs. column confusion.
I also run into problems when I want multivariate priors. The right way to do it for the prior is the following for varying slopes and varying intercepts:
vector[2] ab[50];
ab ~ multi_normal_cholesky(mu, L_Sigma);
But then it's really clunky to use if you need to pull out a vector of slopes or intercepts, which requires to_vector(ab[, 1]) and to_vector(ab[, 2]).
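Putting that together as a minimal sketch (the mu and L_Sigma declarations are filled in here as assumptions):

```stan
parameters {
  vector[2] ab[50];                 // ab[j] = (intercept, slope) for group j
  vector[2] mu;
  cholesky_factor_corr[2] L_Sigma;
}
model {
  ab ~ multi_normal_cholesky(mu, L_Sigma);  // the clean multivariate prior
  {
    // ... and the clunky extraction step:
    vector[50] intercepts = to_vector(ab[, 1]);
    vector[50] slopes = to_vector(ab[, 2]);
  }
}
```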
Things I can do with a vector or matrix but not a 1-d or 2-d real array:
- vector and matrix operations (multiplication etc)
- various functions that are vectorized. For example, I think you can do theta ~ normal(0,1); if theta is a vector but not if it's a 1-d array. I might be wrong about that particular example, but there are other examples like that.
- various functions for subsetting, concatenation, etc., work for vectors and matrices but not arrays.
The main things I can do with a real array but not a vector or matrix:
- pass it into an ode function
- easily change code from linear to logistic regression. (It's natural for me to write my linear regression data as vector[N] y; but then if I want to switch to logistic regression it becomes int y[N]; and then various other aspects of the code have to be changed because I'm switching from vector to array.)
I don’t think we should do this. Huge memory waste limits it to very small matrices. R/Python have easy-to-access sparse matrix support.
On Thu, Aug 29, 2019 at 03:54, Rok Češnovar wrote:
@rok-cesnovar commented on this pull request.
In designs/0004-sparse-matrices.md https://github.com/stan-dev/design-docs/pull/8#discussion_r318931714:
+## Helper Functions
+
+We can also include helper functions to extract the sparsity pattern's row and column information.
+
+```stan
+int K = num_nz_elements(x);
+// Extract effectively a tuple representation of the sparse matrix.
+matrix[K, 3] = get_nz_elements(x);
+```
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+## I/O
+
+Data can be read in from json via a list of lists and the Rdump from the equivalent.

Should we also support that the input is in a dense matrix form in the input file but then the parser & data handler automatically constructs the sparse matrix, ignoring zeroes? We would need to somehow inform the parser to ignore zeroes during parsing, otherwise we would first create the dense matrix and then transform, which defeats the purpose of introducing sparsity.
On Aug 28, 2019, at 9:56 PM, Andrew Gelman wrote:
Things I can do with a vector or matrix but not a 1-d or 2-d real array:
The question was whether the problem exists the other way around. Is there something you can do with a real[] that you can't do with a vector or row_vector?
- vector and matrix operations (multiplication etc)
That's intentional. It doesn't make sense to do matrix operations on arrays. You have to guess about orientation like R and it's a mess.
- various functions that are vectorized. For example, I think you can do theta ~ normal(0,1); if theta is a vector but not if it's a 1-d array.
Yes, that works if it's a 1D array. But theta ~ multi_normal(mu, Sigma) doesn't work if mu or theta are real[] types---they have to be vectors. I did it that way because they're considered as a whole in multi_normal. It wouldn't kill me to loosen this.
I might be wrong about that particular example, but there are other examples like that.
If you can ping this thread (or just me) when one comes up, that'd be great.
- various functions for subsetting, concatenation, etc., work for vectors and matrices but not arrays.
I'm not sure what you mean here. All the slicing works the same way.
If what you mean is that you can't concatenate a real[] and vector, that's by design. What would the result be?
The main things I can do with a real array but not a vector or matrix:
- pass it into an ode function
This is a real limitation. But this is the simplest possible to_array or to_vector conversion, right? We could generalize this, but I think we'll be going with an even better solution based on closures.
These higher-order functions are special in that they're implemented very differently. We could generalize the signatures without much effort if you really find to_vector to be that onerous. But it'd have to wait until stanc3 gets rolled out as work on the old parser is frozen pending that release.
- easily change code from linear to logistic regression.
That doesn't seem like it should be so easy. The likelihood changes, you need to add a link function, and there's no more scale parameter. The constraints on y also change to be lower = 0 and upper = 1 from unconstrained.
(It's natural for me to write my linear regression data as vector[N] y; but then if I want to switch to logistic regression it becomes int y[N]; and then various other aspects of the code have to be changed because I'm switching from vector to array.)
This is because Stan distinguishes integer and real types (again, so it doesn't have to guess like R). I'm strongly opposed to having integer vector types because we're not doing number theory.
As a programmer, I like that the types reflect the underlying data type rather than being implicit.
Okay, to try to corral some of the conversation back, it seems like folks basically want to go with what our potential sparse matrix users would like to see in Stan's type system here, and so far the only person like that we have commenting is @dpsimpson. Dan's proposal is that we should have a separate type for every kind of matrix that could also be sparse and that we would only allow efficient operations on them in the type system. I think we want to add some clarification for which types and operations those actually are concretely. Is it enough to add a single sparse_matrix type to Stan with the operations listed here? That's what's in this proposal now, just double-checking that's still kosher.
If so, I think we need to change the design doc to add a different model as an exemplar model; the ICAR model would not benefit from sparse matrices as described in this proposal as we wouldn't be allowed to do element-wise access. Does anyone have another model that would benefit from the design in this proposal they can contribute?
Or I may be wrong about ICAR being a bad fit - if there's a more efficient version possible once we have sparse matrices, relevant lines from that would be good to include in this design doc to show the before-and-after.
No, the ICAR isn't a good fit. Sparse GPs would be where I think the plan's to add sparse log determinant and sparse Cholesky factorization (on precision matrices, I think---you'd need to ask Dan!).
You can also think smaller in terms of a big regression problem where the data matrix is sparse and the coefficient vector is dense. So we definitely need some mixed operations.
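For example, a minimal sketch under the proposed type, assuming a sparse-times-dense product that returns a dense vector:

```stan
data {
  int N;
  int K;
  sparse_matrix[N, K] X;  // sparse design matrix with a fixed pattern
  vector[N] y;
}
parameters {
  vector[K] beta;         // dense coefficient vector
  real<lower=0> sigma;
}
model {
  y ~ normal(X * beta, sigma);  // mixed sparse-dense operation
}
```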
Sparse GPs are a different thing that don’t involve sparse matrices. (I know!) So they aren’t a use case.
Multivariate Gaussians with sparse precision matrices is the main use case.
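As a hedged sketch of what that use case needs (sparse_log_determinant and a quad_form over a sparse matrix are hypothetical names, not an agreed API):

```stan
functions {
  // log density of a zero-mean multivariate Gaussian with sparse
  // precision Q: 0.5 * log|Q| - 0.5 * x' * Q * x + constant
  real sparse_prec_normal_lpdf(vector x, sparse_matrix Q) {
    return 0.5 * sparse_log_determinant(Q)
           - 0.5 * quad_form(Q, x)
           - 0.5 * num_elements(x) * log(2 * pi());
  }
}
```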
A good use case along the lines of the ICAR that needs full linear algebra is this model https://mc-stan.org/users/documentation/case-studies/mbjoseph-CARStan.html
(Note the case study doesn’t use the sparsity to do anything, so it’s just the model that’s relevant.) In this case you need the log determinant of a matrix with var values. You probably need to use a non-centred parameterization.
What's the application of multivariate Gaussians with sparse precision matrices? I had wrongly supposed it was GPs.
Setting aside generic sparsity, aren't there some GPs where the covariance matrix is structured like banded or something like that?
I thought the full CAR model was a GP. That's the kind of model where I thought we'd need the sparse log determinant.