cmdstanr Allow structured format for unconstrained parameters

The current approach to providing unconstrained parameters as arguments (as inherited from rstan) requires that they are passed as a single numeric vector with all unconstrained parameter values concatenated.

After some API discussions with @n-kall, it was pointed out that this approach can easily lead to errors if any transformations or custom constructions of unconstrained parameters are performed - as users might accidentally provide parameter values in the wrong order.

Instead, we could allow unconstrained parameters to be passed as a named list like we do for constrained parameters, but with handling/checking for the different number of unconstrained values (e.g., for simplexes).

As a concrete example, for a model with parameters:

parameters {
  real x;
  simplex[4] y;
}

The unconstrained parameters would need to be passed as a four-element numeric vector:

unconstrained_vector <- c(1.2, 0.6, 1.6, -1.2)

This clearly allows for the easy mistake of accidentally providing the value of x as the last, rather than the first element (especially if the model is being iterated/revised).

I'm proposing we could accept the unconstrained parameters in the format:

unconstrained_list <- list(x = 1.2, y = c(0.6, 1.6, -1.2))

And then handle the ordering internally when mapping to a single vector of values.

Thoughts?

Dec 19 '22 10:12 andrjohns

Love it!

Dec 19 '22 11:12 rok-cesnovar

Great idea!

Dec 19 '22 19:12 jgabry

why is this needed in cmdstanr? cmdstan only deals with inputs on the constrained scale.

Dec 20 '22 15:12 mitzimorris

The model methods functions exposed to R via c++ (log_prob, grad_log_prob, hessian) take their inputs on the unconstrained scale

Dec 20 '22 17:12 andrjohns

this proposal only goes halfway towards validating the inputs - it provides names for the variables, but not structure. in particular, matrix and multi-dim array objects are flattened into a vector and there is no way of checking that the flattened structure is serialized correctly - i.e., column-major.

a meta-objection is that this feature is unecessary. the output from the diagnose method provides the correctly ordered vector of parameters on the unconstrained scale. is it really worth the cost of implementing, testing, and maintaining this feature - every feature added is maintenance debt going forwards.

Dec 22 '22 14:12 mitzimorris

this proposal only goes halfway towards validating the inputs - it provides names for the variables, but not structure

The structure is the point of this proposal. Similar to how providing initial values or data requires a named list with structured inputs, this proposes allowing the same for unconstrained parameters

there is no way of checking that the flattened structure is serialized correctly - i.e., column-major.

This is also one of the reasons why this proposal would be beneficial, as users currently have to perform this serialisation themselves. By handling this internally, and including validation checks, we can easily remove a source of user error.

a meta-objection is that this feature is unecessary

We've already had a request for it. Additionally, it's a very simple implementation:

Check parameter dimensions
Reorder parameters if needed
Call unlist()

Imo, a trivial amount of work to improve user experience

Dec 22 '22 17:12 andrjohns

perhaps I misunderstood the proposal - what does the input for a matrix look like?

also, no feature is a trivial amount of work - in order to avoid future maintenance headaches, it must be thoroughly documented and tested.

Dec 22 '22 17:12 mitzimorris

what does the input for a matrix look like?

The same as for data and initial values (either an R matrix or dataframe), we then handle the serialisation internally

Dec 22 '22 17:12 andrjohns

cmdstanr cmdstanr copied to clipboard

Allow structured format for unconstrained parameters

cmdstanr
cmdstanr copied to clipboard