cmdstanr
cmdstanr copied to clipboard
Allow structured format for unconstrained parameters
The current approach to providing unconstrained parameters as arguments (as inherited from rstan
) requires that they are passed as a single numeric vector with all unconstrained parameter values concatenated.
After some API discussions with @n-kall, it was pointed out that this approach can easily lead to errors if any transformations or custom constructions of unconstrained parameters are performed - as users might accidentally provide parameter values in the wrong order.
Instead, we could allow unconstrained parameters to be passed as a named list like we do for constrained parameters, but with handling/checking for the different number of unconstrained values (e.g., for simplexes).
As a concrete example, for a model with parameters:
parameters {
real x;
simplex[4] y;
}
The unconstrained parameters would need to be passed as a four-element numeric vector:
unconstrained_vector <- c(1.2, 0.6, 1.6, -1.2)
This clearly allows for the easy mistake of accidentally providing the value of x
as the last, rather than the first element (especially if the model is being iterated/revised).
I'm proposing we could accept the unconstrained parameters in the format:
unconstrained_list <- list(x = 1.2, y = c(0.6, 1.6, -1.2))
And then handle the ordering internally when mapping to a single vector of values.
Thoughts?
Love it!
Great idea!
why is this needed in cmdstanr? cmdstan only deals with inputs on the constrained scale.
The model methods functions exposed to R via c++ (log_prob
, grad_log_prob
, hessian
) take their inputs on the unconstrained scale
this proposal only goes halfway towards validating the inputs - it provides names for the variables, but not structure. in particular, matrix and multi-dim array objects are flattened into a vector and there is no way of checking that the flattened structure is serialized correctly - i.e., column-major.
a meta-objection is that this feature is unecessary. the output from the diagnose
method provides the correctly ordered vector of parameters on the unconstrained scale. is it really worth the cost of implementing, testing, and maintaining this feature - every feature added is maintenance debt going forwards.
this proposal only goes halfway towards validating the inputs - it provides names for the variables, but not structure
The structure is the point of this proposal. Similar to how providing initial values or data requires a named list with structured inputs, this proposes allowing the same for unconstrained parameters
there is no way of checking that the flattened structure is serialized correctly - i.e., column-major.
This is also one of the reasons why this proposal would be beneficial, as users currently have to perform this serialisation themselves. By handling this internally, and including validation checks, we can easily remove a source of user error.
a meta-objection is that this feature is unecessary
We've already had a request for it. Additionally, it's a very simple implementation:
- Check parameter dimensions
- Reorder parameters if needed
- Call
unlist()
Imo, a trivial amount of work to improve user experience
perhaps I misunderstood the proposal - what does the input for a matrix look like?
also, no feature is a trivial amount of work - in order to avoid future maintenance headaches, it must be thoroughly documented and tested.
what does the input for a matrix look like?
The same as for data and initial values (either an R matrix or dataframe), we then handle the serialisation internally