parameter bounds far from optimal solution still affect optimization
Summary:
In a simple linear regression model, I'm feeding in data for which the optimal value of the parameter alpha is 0.5. If I don't put bounds on the parameter, the 'optimizing' method converges to the correct solution. If I bound alpha with <lower=-100, upper=100>, optimizing gets stuck on one of the bounds, even though the optimum is far from either bound.
Description:
In PyStan, I'm running the following:
import numpy as np
import pystan

# sm is the compiled model; model_code holds the Stan program below
sm = pystan.StanModel(model_code=model_code)

N = 1000
x = np.arange(N)
y = 0.5 * x / N + 0.5 + np.random.normal(0, 0.1, N)

test_data = {}
test_data['N'] = N
test_data['x'] = x
test_data['y'] = y

sm.optimizing(data=test_data)
on the following Stan model:
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real<lower=-100, upper=100> alpha;
  // real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ lognormal(0, 10);
  y ~ normal(alpha + beta * x / N, sigma);
}
Reproducible Steps:
Running the code above returns either alpha = -100 or alpha = 100. Replacing the bounded declaration of alpha with the commented-out unbounded version gives the correct answer, alpha ≈ 0.5.
Current Version:
v2.23.0
I recreated the problem in RStan and verified that sampling gets the right answer with the interval-constrained parameter.
This is high priority to fix, though the only way to indicate that here is to set the next release as the milestone.
P.S. It'd be easier on the reader to divide the predictor x by N once, before simulation, and feed the scaled values in as data, rather than dividing by N in two spots, as in the sketch below.
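A minimal sketch of that simplification (the likelihood line in the model block changes accordingly):

import numpy as np

N = 1000
x = np.arange(N) / N  # scale the predictor once, up front
y = 0.5 * x + 0.5 + np.random.normal(0, 0.1, N)

# and in the Stan model block the likelihood becomes:
# y ~ normal(alpha + beta * x, sigma);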
Thanks! (and sorry about the stray factor of N)
This issue is because the default setting for optimization omits the Jacobian adjustment for constrained variables. Enabling it in recent versions of Stan recovers alpha $\approx$ 0.5.
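For a parameter declared with lower bound L and upper bound U, Stan optimizes over an unconstrained variable alpha_unc with alpha = L + (U - L) * inv_logit(alpha_unc); the Jacobian adjustment is the log derivative of that transform. A minimal sketch of enabling the adjustment, assuming CmdStanPy with CmdStan 2.32+ (where optimize accepts a jacobian argument) and a hypothetical file model.stan containing the program above:

import numpy as np
from cmdstanpy import CmdStanModel

N = 1000
x = np.arange(N)
y = 0.5 * x / N + 0.5 + np.random.normal(0, 0.1, N)
test_data = {'N': N, 'x': x, 'y': y}

# model.stan is assumed to contain the Stan program above
model = CmdStanModel(stan_file='model.stan')

# jacobian=False (the default) reproduces the reported behavior;
# jacobian=True adds the change-of-variables adjustment to the target.
mle = model.optimize(data=test_data, jacobian=True)
print(mle.stan_variable('alpha'))  # approximately 0.5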