Comments on Matrix Autodiff Type
A few comments for consideration in the matrix autodiff design document.
- As much as the matrix of vars enables better performance, it also greatly facilitates the implementation of matrix-valued reverse mode updates, as demonstrated in the matrix product example. The resulting code is not only easier to develop but also easier to read -- all in addition to likely being much faster. Perhaps worthy of mentioning in the summary.
- Given the amount of compiler logic proposed it would be great to have a compiler option that returns an annotated Stan program indicating which variables were assigned `matrix<var>` and which were assigned `var<matrix>`. This would help users track down undesired casts down to `matrix<var>` and optimize their code without having to guess at the underlying compiler logic.
- It would also be great to have some way to force a conversion from a `matrix<var>` to a `var<matrix>`, even if only with a function meant for advanced users. I'm thinking in particular of Gaussian process applications where custom Gram matrices are built element by element in a user-defined function, but afterwards are used only as monolithic matrices. My understanding of the proposed logic is that this would propagate a `matrix<var>` throughout the Stan program and miss out on potential performance unless there is some way to force a user-defined function to return a `var<matrix>` or to cast the return, for example
```stan
functions {
  matrix custom_covar(...);
}
...
model {
  matrix[N, N] custom_gram = to_varmat(custom_covar(...));
}
```
> Given the amount of compiler logic proposed it would be great to have a compiler option that returns an annotated Stan program indicating which variables were assigned `matrix<var>` and which were assigned `var<matrix>`. This would help users track down undesired casts down to `matrix<var>` and optimize their code without having to guess at the underlying compiler logic.
A user could do this by printing the transformed MIR from stanc3: `bin/stanc --debug-transformed-mir model.stan` or `--debug-transformed-mir-pretty`.
`var<matrix>` parameters will have a `_varmat__` suffix, which will annotate internally that the parameter C++ code should be generated differently. Is that enough or should it be more "user-friendly"?
> My understanding of the proposed logic is that this would propagate a `matrix<var>` throughout the Stan program and miss out on potential performance unless there is some way to force a user-defined function to return a `var<matrix>` or to cast the return, for example
Your understanding is correct, it would propagate as `matrix<var>`. Such opportunities are missed because we did not want to get into the compiler analyzing potential opportunities where doing a `to_varmat` might actually be worth it, because it's much harder to validate that there will be no performance regressions for current models. I like the `to_varmat` call honestly, though it should come with a warning that the C++ might not compile.
> I like the `to_varmat` call honestly, though it should come with a warning that the C++ might not compile.
Yeah I'd be fine with having that exposed
> `var<matrix>` parameters will have a `varmat_` suffix, which will annotate internally that the parameter C++ code should be generated differently. Is that enough or should it be more "user-friendly"?
I think that would be friendly enough for the more advanced users who would be digging into these optimizations, in which case it's more of a documentation concern. A good opportunity for a nice section in the manual, users' guide, or even Discourse about effective use of Stan's linear algebra library.
> Such opportunities are missed because we did not want to get into the compiler analyzing potential opportunities where doing a `to_varmat` might actually be worth it, because it's much harder to validate that there will be no performance regressions for current models.
I totally understand why this would be hard for the compiler logic (although a great example of why Stan3 sets the stage for all kinds of potential sophisticated optimizations going forwards). My only concern is that there are likely to be enough exceptions where advanced users would know safe places to cast that the functionality would be beneficial.
> I like the `to_varmat` call honestly, though it should come with a warning that the C++ might not compile.
The more warnings the better!
I do not think we should add a `to_varmat()` function to the language, primarily because it's using the arbitrary name given to a class in Stan math rather than something in the user's language.
I would much prefer to add a notion of const variable to Stan. So that'd be something like:
```stan
const matrix[M, N] Sigma = my_funky_gm_covariance(...);
```
Then it would be illegal to subsequently reassign elements `Sigma[m, n]`. This is more restrictive than we need---there's no problem with reassigning all of `Sigma`, it's just the elements that are problematic. But I think that'll be OK. It'd also support meaningful error messages for users trying to reassign.
I'd even be OK with `const_matrix` as a unified type if we don't want to do this for other types, but we probably want to allow anything to be declared `const` if we're going to allow `const`.
From the compiler perspective, any matrix that is never assigned to again can be assigned a const type. This is just like const in C++. I don't see much room for useful user choice here. I guess for a matrix that doesn't ever get used for arithmetic or linear algebra, it would be wasteful to eagerly reassign to varmat.
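The "never assigned to again" criterion above can be sketched as a one-pass analysis. This is a toy Python model over a made-up statement list, not real stanc3 code; all names are hypothetical:

```python
# Toy promotion analysis: a matrix is var<matrix>-eligible if no
# statement ever writes to one of its elements. Whole-object
# assignment is fine; only element-level writes block promotion.
def promotable_matrices(decls, stmts):
    blocked = set()
    for s in stmts:
        # ("elt_assign", name) models `x[i, j] = ...`;
        # ("assign", name) models `x = ...`, which stays legal.
        if s[0] == "elt_assign":
            blocked.add(s[1])
    return [d for d in decls if d not in blocked]

decls = ["Sigma", "a"]
stmts = [("assign", "Sigma"),   # Sigma = custom_covar(...)
         ("elt_assign", "a")]   # a[10, 10] = ...
assert promotable_matrices(decls, stmts) == ["Sigma"]
```

In this toy model `Sigma` can be promoted even though it is (re)assigned as a whole, matching the observation that only element-level reassignment is problematic.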
> I do not think we should add a `to_varmat()` function to the language, primarily because it's using the arbitrary name given to a class in Stan math rather than something in the user's language.
I agree, I was imagining `to_varmat()` as just a placeholder for something more interpretable to users, or at least advanced users.
> I would much prefer to add a notion of const variable to Stan. So that'd be something like:
>
> `const matrix[M, N] Sigma = my_funky_gm_covariance(...);`
>
> Then it would be illegal to subsequently reassign elements `Sigma[m, n]`. This is more restrictive than we need---there's no problem with reassigning all of `Sigma`, it's just the elements that are problematic. But I think that'll be OK. It'd also support meaningful error messages for users trying to reassign.
> I'd even be OK with `const_matrix` as a unified type if we don't want to do this for other types, but we probably want to allow anything to be declared `const` if we're going to allow `const`.
>
> From the compiler perspective, any matrix that is never assigned to again can be assigned a const type. This is just like const in C++. I don't see much room for useful user choice here. I guess for a matrix that doesn't ever get used for arithmetic or linear algebra, it would be wasteful to eagerly reassign to varmat.
The design doc discusses reasons against adding a new type or type annotation to the language and I think that they’re reasonable.
Really this is something more like a compiler hint, or a specifier in the C++ parlance, that lets the compiler know that the user thinks it's safe to be more aggressive in using varmat. Maybe something like
```stan
matrix[N, N] Sigma = my_funky_gm_covariance(…) encase;
```
that would parse only for inline matrix declarations.
I'd be fine with adding a new type to Stan for these, but tbh the biggest thing is what the gosh dang name for these should be! It's really just moving from an array of structs to a struct of arrays. `const` is okay, and the sparse matrix prototype in the Stan compiler has the doo-dads to figure out when a const matrix is being assigned to illegally. Though I kind of worry with a `const` keyword that people will start writing their programs in weird ways so that they can trick the compiler into giving them the new matrix type, which is why I'd rather keep it 99% a thing the compiler just does and then have some weird esoteric function/keyword for advanced users.
I kind of like Bob's suggestion of `encase`, though it would be nice to avoid a new keyword if possible. It might be easier on the compiler side to have some function that, when the compiler sees it, forces the use of the new matrix type.
> I'd be fine with adding a new type to Stan for these, but tbh the biggest thing is what the gosh dang name for these should be! It's really just moving from an array of structs to a struct of arrays. `const` is okay, and the sparse matrix prototype in the Stan compiler has the doo-dads to figure out when a const matrix is being assigned to illegally. Though I kind of worry with a `const` keyword that people will start writing their programs in weird ways so that they can trick the compiler into giving them the new matrix type, which is why
I strongly agree. Even mathematically the difference between a matrix as a representation of a linear transformation and a matrix as a collection of numbers is subtle, and there's no name that unambiguously denotes the difference. `const` isn't right because the matrix itself isn't immutable, it just can't be changed element by element. `encase`, `encapsulate`, and the like capture the fact that varmat encapsulates all of the data together, but those words also have other connotations that can confuse users.
In some sense the most precise thing to do would be to not allow element wise access of matrix types at all, and instead require users to cast to and from arrays when necessary; then matrix types are the efficient linear algebra types and arrays are the collections of numbers. This would, however, require nearly everyone to rewrite all of their Stan programs.
> I'd rather keep it 99% a thing the compiler just does and then have some weird esoteric function/keyword for advanced users.
If only because of the naming problems I am very okay with this approach. In particular, if this functionality can be kept as an "advanced" feature, then I would hope that we wouldn't have to be too precious about the naming, since it would require users to know the underlying C++ context already. Maybe even something somewhat ambiguous like `to_fast_matrix()` that avoids explicit references to the C++ but would be easy to look up in the docs.
Let me flip the table: what if we gave it a bad name on purpose so it looks smelly? I kind of like exposing `to_var_matrix()`; then we'd have in the docs something like
```
soa_matrix to_var_matrix(aos_matrix)
soa_matrix to_var_matrix(soa_matrix)
```
WARNING: This function is intended only for advanced users.
Takes in the standard Stan matrix type as an array of structs (aos) and forces the compiler to convert it to the struct of arrays (soa) format (aka from an `Eigen::Matrix<var, R, C>` to a `var_value<Eigen::Matrix<double, R, C>>`). This function can cause the Stan compiler to produce code that will not compile. This function has O(N^2) cost.
I like that because it sounds kind of scary / technical, which should hopefully dissuade anyone who doesn't know what they are doing from using it. Though the only bad thing about not having a keyword is that this can't be used to make parameters default to `var<matrix>`.
If we don’t introduce yet another container type for users to choose between (varmat, matvar, and array) then I think this is the best way to allow advanced users to get around the limited compiler logic in the few exceptional patterns where it might make a substantial performance difference. Maybe even putting the function in its own reference guide section like “Compiler Helper Functions” so that users looking through the matrix functions don’t find it by accident.
Note the cost would be O(R * C).
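The O(R * C) cost comes from the conversion itself touching every element once: each (value, adjoint) pair in the array-of-structs layout is copied into the two contiguous struct-of-arrays buffers. A minimal sketch, with Python tuples and nested lists standing in for the C++ types (names illustrative only):

```python
# Convert an "array of structs" matrix (each cell a (value, adjoint)
# pair, like Eigen::Matrix<var, R, C>) into a "struct of arrays" pair
# of plain matrices (like var_value<Eigen::Matrix<double, R, C>>).
# Every element is copied once, hence O(R * C) work.
def to_var_matrix(aos):
    rows, cols = len(aos), len(aos[0])
    values = [[aos[r][c][0] for c in range(cols)] for r in range(rows)]
    adjoints = [[aos[r][c][1] for c in range(cols)] for r in range(rows)]
    return values, adjoints

aos = [[(1.0, 0.1), (2.0, 0.2)],
       [(3.0, 0.3), (4.0, 0.4)]]
values, adjoints = to_var_matrix(aos)
assert values == [[1.0, 2.0], [3.0, 4.0]]
assert adjoints == [[0.1, 0.2], [0.3, 0.4]]
```

After the one-time copy, downstream whole-matrix operations can run on the contiguous value and adjoint buffers.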
On Jan 5, 2021, at 7:57 PM, Steve Bronder wrote:
> I kind of like exposing `to_var_matrix()`
This is in developer language, not user language. We could change the name, but I think having something that looks more like a compiler hint as @betanalpha is suggesting without adding a new type is the way to go.
> I kind of like Bob's suggestion of `encase`
I don't remember making that suggestion. I prefer "immutable" as the term for the new matrix structures. That's because we can allow holistic assignment, just not element-level assignment. I imagine something like this.
```stan
immutable matrix[M, M] a;
vector[M] v;
a = diag_matrix(v); // LEGAL: assign a matrix to an immutable matrix
a[i, j] = y;        // ILLEGAL: assign an element of an immutable matrix
```
This is a constraint on how a variable is used. It does not introduce a new type of variable.
What's the state of functions accepting matrices? Have they all been updated to allow `var_mat`? If so, then declaring `immutable` guarantees that `var_mat` is an acceptable target. Otherwise, it's just a compiler hint.
> That's because we can allow holistic assignment, just not element-level assignment.
Just to clarify, we do allow element-level assignment to varmat as well. It's just a lot slower (2-4 times was the estimate from a benchmark @bbbales2 made).
It's non-trivial for the compiler to "guess" when varmat would actually still be faster for a given expression even if it contains some element-level accesses, which is why we resorted to a conservative approach of falling back to matvar in these cases, because that guarantees the model will at worst have the same performance as it had in the last release.
For example
```stan
matrix[N, N] a;
matrix[N, N] b;
matrix[N, N] c = a * b;
a[10, 10] = a[10, 10] + 2.0;
```
would definitely be faster with varmat, but since `a` has element-level access, the compiler will make it a matvar.
We have not figured out any simple and clear set of rules that would guarantee no performance regression and would also be easy to test. I am personally scared of any sort of heuristic rules in the compiler, but I am sure there are some optimization strategies that someone with more knowledge of compiler design can figure out in the future.
> Have they all been updated to allow `var_mat`?
Not all of them, but a lot of them have. The compiler implementation (on a branch, nothing is on master yet) uses a list of varmat-supported functions to figure out when varmat can be used.
Another idea that comes to mind is that we would have a `--varmat` flag (name not final :)) for the compiler that would use `var<matrix>` anywhere legally possible and only convert to `matrix<var>` when there is no other option (like for a function that does not support `var<matrix>`).
This would work and be faster under the assumption that the user writes the model using vectorized functions wherever possible.
My hunch is that once all or almost all functions support `var<matrix>`, this will always be faster than any conservative approach, apart from models written inefficiently. There is just no way of proving that (EDIT: by that I mean I don't have a way of proving that), which is why I think this mode can not be the default one.
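The greedy policy of such a flag could be sketched as: default every matrix to `var<matrix>` and demote only those forced into `matrix<var>` by an unsupported function. A toy Python model (the function names and call representation are made up for illustration, not the real compiler's supported-function list):

```python
# Toy version of the proposed greedy --varmat policy: start with every
# matrix as struct-of-arrays and demote only those passed to functions
# that don't support var<matrix>. The supported set is hypothetical.
SUPPORTS_VARMAT = {"multiply", "add", "cholesky_decompose"}

def assign_reprs(matrices, calls):
    repr_ = {m: "var<matrix>" for m in matrices}
    for fn, args in calls:
        if fn not in SUPPORTS_VARMAT:
            for a in args:
                repr_[a] = "matrix<var>"  # no other option
    return repr_

reprs = assign_reprs(["a", "b", "c"],
                     [("multiply", ["a", "b"]),
                      ("legacy_solver", ["c"])])
assert reprs == {"a": "var<matrix>", "b": "var<matrix>",
                 "c": "matrix<var>"}
```

This is the mirror image of the conservative default: here `var<matrix>` is assumed fastest unless proven illegal, rather than proven profitable.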
> Just to clarify, we do allow element-level assignment to varmat as well. It's just a lot slower (2-4 times was the estimate from a benchmark @bbbales2 made).
Thanks for clarifying. But why allow that? It seems like it'd be a huge hassle on the math lib side and be really inefficient in the case of either a lot of assignments or big matrices, depending on how it's implemented.
Is there a link to the benchmark somewhere? Where do I find the implementation of `var<matrix>`?
I somehow missed most of this discussion as it was going down.
> element-level access
Access is fine for getting values, just not for assigning.
> once all or almost all functions support `var<matrix>`, this will always be faster than any conservative approach
How do we build up a covariance matrix for GPs if the only type we have is `var<matrix>`?
> apart from models written inefficiently.
The way to make that precise would be to claim that the most efficient way to write any model is using `var<mat>`. That's why I'm asking about GPs.
> But why allow that? It seems like it'd be a huge hassle on the math lib side and be really inefficient in the case of either a lot of assignments or big matrices, depending on how it's implemented.
By "we allow it", I mean it can be done in Stan Math. The current conservatively-envisioned compiler implementation would not allow it in Stan. For benchmarks see https://github.com/stan-dev/design-docs/pull/29#issuecomment-745587758 and the two posts below that. For row- and column-wise access see https://github.com/stan-dev/design-docs/pull/30#discussion_r549813900 (those are faster for varmat).
For where to find all the parts of that implementation I am going to defer to @t4c1 @SteveBronder and @bbbales2. I would just point you to the wrong place :)
Here's the benchmark of reading or writing scalars: link. There's another one that reads + writes below it.
As an example of why accessing subsets is handy, here are benchmarks of similar operations on rows/columns: link
This is the implementation (could be other versions of this elsewhere, but this is one place where this is done).
The routine is: on assignment to a varmat, add a vari to the stack that zeroes the adjoints and undoes the assignment in the reverse pass. I think this changes the behavior of doing two reverse passes without a zero, but I don't know of a reason to do that.
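That routine can be sketched with a toy tape. This Python model is purely illustrative of the idea, not the Stan Math implementation: assigning to an element pushes an entry that, in the reverse pass, zeroes that element's adjoint and restores the overwritten value.

```python
# Toy struct-of-arrays var matrix with a tape: element assignment
# records how to undo itself, mirroring the routine described above.
class VarMatrix:
    def __init__(self, values):
        self.values = [row[:] for row in values]
        self.adjoints = [[0.0] * len(row) for row in values]
        self.tape = []  # reverse-pass callbacks, newest last

    def assign_element(self, i, j, new_value):
        old = self.values[i][j]
        def undo():
            # Reverse pass: zero the adjoint of the overwritten slot
            # and put the old value back.
            self.adjoints[i][j] = 0.0
            self.values[i][j] = old
        self.tape.append(undo)
        self.values[i][j] = new_value

    def reverse_pass(self):
        for undo in reversed(self.tape):
            undo()

m = VarMatrix([[1.0, 2.0], [3.0, 4.0]])
m.assign_element(0, 1, 9.0)
m.adjoints[0][1] = 0.5          # pretend downstream ops set an adjoint
m.reverse_pass()
assert m.values[0][1] == 2.0    # old value restored
assert m.adjoints[0][1] == 0.0  # adjoint zeroed
```

Since each assignment pushes its own undo entry, a second reverse pass without re-running the forward pass would see already-restored values, which matches the caveat about doing two reverse passes without a zero.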
> How do we build up a covariance matrix for GPs if the only type we have is `var<matrix>`?
Slowly
I would think we'd need something like comprehensions.
The latest on that conversation is the question here and the response here.
Edit: The short answer on comprehensions is that varmat ended up being useful without them, and I'm not actually sure how to do them in a way that would be much more efficient than the way we build GPs in Stan code now -- there'd still be tons of varis everywhere.
Relevant to this discussion, we're postponing the varmat release until 2.27 (check here)