cmdstan

Generate Quantities segfaults for zero-length parameter array

Open jmshoun opened this issue 3 years ago • 3 comments

Summary:

generate_quantities crashes with a segmentation fault when the model declares a zero-length parameter array.

Description:

I've seen an idiom suggested in the Stan forums for including optional parameters in Stan models, where the optional parameter is sized as an array of length 0 or 1 depending on whether it is to be included. This idiom works very well for sampling, but CmdStan segfaults whenever generate_quantities is run on a model with a zero-length parameter.

Reproducible Steps:

The following is a small reproducible example of the issue. There are a few obvious ways this Stan program could be rewritten to avoid this issue, but there are other cases where the rewrite is quite a bit less obvious. Sample program (opt-param.stan):

data {
    int<lower=1> N;
    array[N] real x;
    array[N] real y;
    int<lower=0, upper=1> use_intercept;
}
parameters {
    real slope;
    // optional parameter: length 1 if use_intercept == 1, length 0 otherwise
    array[use_intercept] real intercept;
    real<lower=0> sigma;
}
transformed parameters {
    array[N] real y_hat;
    for (n in 1:N) {
        y_hat[n] = slope * x[n];
        if (use_intercept) y_hat[n] += intercept[1];
    }
}
model {
    slope ~ normal(0, 1);
    intercept ~ normal(0, 1);  // contributes nothing when intercept has length 0
    sigma ~ cauchy(0, 1);
    y ~ normal(y_hat, sigma);
}
generated quantities {
    array[N] real log_lik;
    for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | y_hat[n], sigma);
}

Sample data (opt-data.json):

{
    "N": 11,
    "x": [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0],
    "y": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "use_intercept": 0
}

Commands to reproduce:

make opt-param
./opt-param sample data file=opt-data.json output file=opt-fit.csv
./opt-param generate_quantities fitted_params=opt-fit.csv data file=opt-data.json output file=gq.csv

Current Output:

The last command above crashes with a segmentation fault:

[1] 91352 segmentation fault ./opt-param generate_quantities fitted_params=opt-fit.csv data output

Expected Output:

I expected the generated quantities to be output as usual. In the example above, everything works perfectly if use_intercept is set to 1 in the data file.

Additional Information:

I'm running macOS 10.15.7.

Current Version:

v2.26.1

jmshoun, Mar 10 '21 14:03

Thanks for the clear and detailed report. Sounds like a bug; will investigate.

mitzimorris, Mar 10 '21 16:03

@mitzimorris, confirmed this is a bug. I tracked it down and it needs to be fixed in the stan-dev/stan repo. I'll come up with a minimal example and show what's going on. I think one way to solve it is to introduce a function that's generated by stanc3, but there may be other ways to address the problem.

syclik, Jun 14 '21 16:06

Standalone GQ currently only checks that the total number of parameters in the model matches the total number of parameters in the supplied fitted_params file. Not sure what it would do if you have container parameters whose dimensions are controlled by data inputs, e.g., variables I, J, K which specify vector lengths, and fitted_params was the result of fitting to data where I=4, J=5, K=6, while the supplied model data has I=3, J=4, K=8. Both totals are 15 (4 + 5 + 6 = 3 + 4 + 8), so the count check passes even though every individual dimension differs.
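A minimal Python sketch of the difference between a total-count check and a per-parameter dimension check, using the I/J/K example. The helper names (num_values, total_size_matches, dims_match) are hypothetical and are not part of CmdStan or stan-dev/stan; dimensions are assumed to be tuples, with () for a scalar and (0,) for a zero-length array.

```python
from math import prod

# Hypothetical helpers for illustration; not the CmdStan or stan-dev/stan API.
# Each parameter's dimensions are a tuple: () for a scalar, (n,) for a
# length-n array or vector, (0,) for a zero-length array.

def num_values(dims):
    """Number of scalar values a parameter with these dimensions contributes."""
    return prod(dims)  # prod(()) == 1, so a scalar counts as one value

def total_size_matches(model_dims, fitted_dims):
    """Total-count check: compares only the overall number of values."""
    return sum(map(num_values, model_dims)) == sum(map(num_values, fitted_dims))

def dims_match(model_dims, fitted_dims):
    """Per-parameter check: every parameter must have identical dimensions."""
    return list(model_dims) == list(fitted_dims)

fitted = [(4,), (5,), (6,)]  # parameters fitted with I=4, J=5, K=6
model = [(3,), (4,), (8,)]   # model instantiated with I=3, J=4, K=8
print(total_size_matches(model, fitted))  # True: 15 values on both sides
print(dims_match(model, fitted))          # False: every length differs
```

Only the second check catches the mismatch; a zero-length parameter contributes 0 to the total, which is why size bookkeeping alone says nothing about where each parameter's values start.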

Conclusion: standalone GQ should check parameter dimensions as well. The parameter dimensions of the input can be parsed out of the CSV header. If the stanc3 compiler can add a function that reports the parameter dimensions of the instantiated model, this would allow better and more robust input handling.
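A sketch of recovering dimensions from a fitted_params header and comparing them with model-reported dimensions, assuming column names follow CmdStan's `name`, `name.i`, `name.i.j` convention with integer indices only. parse_header_dims and check_params are hypothetical helpers, and the model_params dictionary stands in for whatever a stanc3-generated dimensions function might report. One consequence relevant to this issue: a zero-length parameter such as intercept (with use_intercept = 0) produces no header columns at all, so the check has to treat "absent from the header" as "size 0".

```python
# Sketch under stated assumptions; not the stan-dev/stan implementation.

def parse_header_dims(header_line):
    """Map each variable name in a CSV header to its dimensions.

    Scalars get (); containers get the largest index seen in each position.
    """
    dims = {}
    for column in header_line.strip().split(","):
        name, *indices = column.split(".")
        sizes = tuple(int(i) for i in indices)
        prev = dims.get(name, (0,) * len(sizes))
        dims[name] = tuple(max(a, b) for a, b in zip(prev, sizes))
    return dims

def check_params(model_params, header_line):
    """Compare model parameter dimensions with the fitted_params header.

    model_params maps parameter name -> dimensions tuple (hypothetical input).
    Returns a list of mismatch messages; an empty list means consistent.
    """
    header = parse_header_dims(header_line)
    problems = []
    for name, dims in model_params.items():
        if 0 in dims:
            # A zero-length container contributes no columns at all.
            if name in header:
                problems.append(f"{name}: model has size 0 but header has columns")
        elif header.get(name) != dims:
            problems.append(f"{name}: model {dims}, header {header.get(name)}")
    return problems

header = "lp__,accept_stat__,slope,sigma,y_hat.1,y_hat.2"
print(check_params({"slope": (), "intercept": (0,), "sigma": ()}, header))
# []
print(check_params({"slope": (), "intercept": (1,), "sigma": ()}, header))
# ['intercept: model (1,), header None']
```

Sampler diagnostics, transformed parameters, and generated quantities also appear in the header; the check only looks up names the model reports as parameters, so those extra columns are ignored.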

mitzimorris avatar Jun 14 '21 17:06 mitzimorris

This was resolved by https://github.com/stan-dev/stan/pull/3179

WardBrian, Aug 22 '23 13:08