cmdstan
Generate Quantities segfaults for zero-length parameter array
Summary:
generate_quantities throws a segfault when the model includes zero-length parameter arrays.
Description:
I've seen an idiom suggested in the Stan forums for including optional parameters in Stan models, where the optional parameter is sized as an array of length 0 or 1 depending on whether it is to be included. This idiom works very well for sampling, but CmdStan segfaults whenever generate_quantities is run on a model with a zero-length parameter.
Reproducible Steps:
The following is a small reproducible example of the issue. There are a few obvious ways this Stan program could be rewritten to avoid this issue, but there are other cases where the rewrite is quite a bit less obvious.
Sample program (opt-param.stan):
data {
  int<lower=1> N;
  real x[N];
  real y[N];
  int<lower=0, upper=1> use_intercept;
}
parameters {
  real slope;
  real intercept[use_intercept];
  real<lower=0> sigma;
}
transformed parameters {
  real y_hat[N];
  for (n in 1:N) {
    y_hat[n] = slope * x[n];
    if (use_intercept) y_hat[n] += intercept[1];
  }
}
model {
  slope ~ normal(0, 1);
  intercept ~ normal(0, 1);
  sigma ~ cauchy(0, 1);
  y ~ normal(y_hat, sigma);
}
generated quantities {
  real log_lik[N];
  for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | y_hat[n], sigma);
}
Sample data (opt-data.json):
{
  "N": 11,
  "x": [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0],
  "y": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
  "use_intercept": 0
}
Reproducible steps:
make opt-param
./opt-param sample data file=opt-data.json output file=opt-fit.csv
./opt-param generate_quantities fitted_params=opt-fit.csv data file=opt-data.json output file=gq.csv
Current Output:
The last command above yields a runtime error:
[1] 91352 segmentation fault ./opt-param generate_quantities fitted_params=opt-fit.csv data output
Expected Output:
I expected the generated quantities to be output as usual. In the example above, everything works perfectly if use_intercept is set to 1 in the data file.
Additional Information:
I'm running macOS 10.15.7.
Current Version:
v2.26.1
thanks for the clear and detailed report. sounds like a bug. will investigate.
@mitzimorris, confirmed this is a bug. I tracked it down and it needs to be fixed in the stan-dev/stan repo. I'll come up with a minimal example and show what's going on. I think one way to solve it is to introduce a function that's generated by stanc3, but there may be other ways to address the problem.
standalone GQ currently only checks that the total number of parameters in the model matches the total number of parameters in the supplied fitted_params file. not sure what it would do if you have container parameters with dimensions controlled by data inputs, e.g., variables I, J, K, which specify vector lengths, and fitted_params was the result of parameters fitted to data where I=4, J=5, K=6, and the supplied model data was I=3, J=4, K=8. the three vectors total 15 elements in both cases, so a count-only check on them would pass even though every dimension differs.
conclusion: standalone GQ should check parameter dimensions as well. parameter dimensions on the input can be parsed out of the header. if the stanc3 compiler can add a function that reports the parameter dimensions of the instantiated model, this would allow for better and more robust input handling.
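To make the proposed check concrete, here is a rough Python sketch of comparing per-variable dimensions rather than a total count. It is not the actual fix and not Stan's API. It assumes the usual Stan CSV column naming (scalars as `name`, container elements as `name.i` or `name.i.j`). The `expected` dictionary is a hypothetical stand-in for whatever a stanc3-generated function would report, and the function names and the parameter names `a`, `b`, `c` are made up for illustration.

```python
def dims_from_header(header_line):
    """Infer per-variable dimensions from a Stan CSV header line.

    Assumes scalars appear as `name` and container elements as
    `name.i` or `name.i.j` with 1-based indices.
    """
    dims = {}
    for col in header_line.strip().split(","):
        name, *idx = col.split(".")
        if name.endswith("__"):
            continue  # skip sampler columns such as lp__, stepsize__
        index = [int(i) for i in idx]
        prev = dims.get(name, [0] * len(index))
        # The largest index seen in each position is that dimension's size.
        dims[name] = [max(a, b) for a, b in zip(prev, index)]
    return dims


def check_dims(expected, found):
    """Return a list of mismatch messages; an empty list means compatible.

    `expected` maps parameter names to dimension lists (hypothetical
    stand-in for what the instantiated model would report).
    """
    problems = []
    for name, exp in expected.items():
        if 0 in exp:
            # A zero-size parameter contributes no columns at all.
            if name in found:
                problems.append(f"{name}: expected no columns, found {found[name]}")
            continue
        got = found.get(name)
        if got != exp:
            problems.append(f"{name}: expected {exp}, found {got}")
    return problems


def header_for(sizes):
    """Build a header such as 'lp__,a.1,a.2,b.1' from {'a': 2, 'b': 1}."""
    cols = ["lp__"]
    for name, n in sizes.items():
        cols += [f"{name}.{i}" for i in range(1, n + 1)]
    return ",".join(cols)


# Fitted with I=4, J=5, K=6; model data now says I=3, J=4, K=8.
# Both total 15 elements, so a count-only check cannot tell them apart.
found = dims_from_header(header_for({"a": 4, "b": 5, "c": 6}))
expected = {"a": [3], "b": [4], "c": [8]}
problems = check_dims(expected, found)
print(problems)
```

In the zero-length case from this issue, `check_dims({"slope": [], "intercept": [0], "sigma": []}, dims_from_header("lp__,slope,sigma"))` returns an empty list, since a zero-size parameter is expected to have no columns in the header.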
This was resolved by https://github.com/stan-dev/stan/pull/3179