
The intercept is affected by the data when prior_PD = TRUE

sambrilleman opened this issue 4 years ago • 5 comments

Summary:

The intercept is back-transformed by subtracting dot_product(xbar, beta), even when prior_PD = TRUE.

Description:

When prior_PD = TRUE the covariates have no impact on the model fitting. Yet they do affect the intercept that is returned to the user, through their means, which are used to center the predictors and then to back-transform the intercept. This is demonstrated by the reproducible example below.
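To see the mechanism without fitting a model, here is a minimal base-R sketch. It assumes, as in the generated quantities code quoted later in this thread, that the returned intercept is `alpha = gamma - dot_product(xbar, beta)`, where `gamma` is the intercept on the centered scale. `gamma` and `beta` are drawn directly from the priors used in the reproducible example (so nothing depends on the outcome), and only the predictor mean `xbar` changes. The helper `back_transform()` is hypothetical.

```r
set.seed(123)
n_draws <- 4000

# Prior draws; none of these depend on the data
gamma <- rnorm(n_draws, mean = 0, sd = 10)  # intercept on the centered scale
beta  <- rnorm(n_draws, mean = 5, sd = 10)  # slope

# Hypothetical helper mimicking the back-transformation of the intercept
back_transform <- function(gamma, beta, xbar) gamma - xbar * beta

# Predictor means for the two rescaled versions of wt
xbar1 <- mean(mtcars$wt * 10)
xbar2 <- mean(mtcars$wt * -10)

alpha1 <- back_transform(gamma, beta, xbar1)
alpha2 <- back_transform(gamma, beta, xbar2)

# Same prior draws, but the medians of the returned intercepts have opposite signs
c(median(alpha1), median(alpha2))
```

Each draw of `alpha1` and `alpha2` differs by exactly `-(xbar1 - xbar2) * beta`, so the only thing separating the two "prior" intercepts is the covariate mean.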

Is this the desired behaviour? (my guess is probably not?)

Someone noticed this with stan_surv, but I think it might apply to every modelling function that back-transforms the intercept in some way.

Reproducible Steps:

library(rstanarm)

# Two rescalings of wt that differ only in sign
mtcars$wt1 <- mtcars$wt * 10
fit1 <- stan_glm(
    mpg ~ wt1,
    data = mtcars,
    prior_intercept = normal(0, 10, autoscale = FALSE),
    prior = normal(5, 10, autoscale = FALSE),
    prior_aux = normal(0, 10, autoscale = FALSE),
    prior_PD = TRUE,   # draw from the prior predictive distribution only
    refresh = 0
)

mtcars$wt2 <- mtcars$wt * -10
fit2 <- update(fit1, formula. = mpg ~ wt2)

# The reported intercepts differ, even though the outcome is ignored
fit1$coefficients['(Intercept)']
fit2$coefficients['(Intercept)']

RStanARM Version:

2.12.2 feature/survival branch. I haven't tried it on the master branch, but the same source code is there.

R Version:

4.0.2

Operating System:

Windows 10.

sambrilleman, Oct 19 '20 04:10

Fwiw, I think the solution would be to replace this with something like:

if (has_intercept == 1) {
    if (prior_PD == 0 && dense_X) alpha[1] = gamma[1] - dot_product(xbar, beta);
    else alpha[1] = gamma[1];
}

or, similarly, for stan_surv, replace this with something like:

if (has_intercept == 1) {
    if (prior_PD) alpha = gamma[1];
    else alpha = log_crude_event_rate - dot_product(x_bar, beta) + gamma[1];
} else {
    if (prior_PD) alpha = 0;
    else alpha = log_crude_event_rate - dot_product(x_bar, beta);
}
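Both proposals can be mirrored in R to check their branches. This is a sketch only: `alpha_glm()` and `alpha_surv()` are hypothetical functions, the arguments reuse the Stan variable names, vectors stand in for a single draw, and the inputs are toy values. As the rest of the thread explains, the proposal was ultimately not merged.

```r
# Hypothetical R mirror of the proposed stan_glm logic (one draw)
alpha_glm <- function(gamma, beta, xbar, prior_PD, dense_X = TRUE) {
  if (!prior_PD && dense_X) {
    gamma - sum(xbar * beta)  # undo the centering of the predictors
  } else {
    gamma                     # keep the centered-scale intercept
  }
}

# Hypothetical R mirror of the proposed stan_surv logic (one draw)
alpha_surv <- function(gamma, beta, x_bar, log_crude_event_rate,
                       prior_PD, has_intercept = TRUE) {
  shift <- log_crude_event_rate - sum(x_bar * beta)  # data-dependent part
  if (has_intercept) {
    if (prior_PD) gamma else shift + gamma
  } else {
    if (prior_PD) 0 else shift
  }
}

# Toy inputs, chosen only to exercise the branches
xbar <- c(32.17, 10)
beta <- c(5, -2)
alpha_glm(1.5, beta, xbar, prior_PD = FALSE)  # 1.5 - sum(xbar * beta)
alpha_glm(1.5, beta, xbar, prior_PD = TRUE)   # 1.5, whatever xbar is

lcer <- log(3 / 120)  # toy log crude event rate
alpha_surv(0.2, 0.5, 1.1, lcer, prior_PD = TRUE)                         # 0.2
alpha_surv(0.2, 0.5, 1.1, lcer, prior_PD = FALSE)                        # lcer - 0.55 + 0.2
alpha_surv(0.2, 0.5, 1.1, lcer, prior_PD = TRUE, has_intercept = FALSE)  # 0
```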

sambrilleman, Oct 19 '20 04:10

Someone noticed this with stan_surv, but I think it might apply to every modelling function that back-transforms the intercept in some way.

Yeah, it probably does. I can't remember if @bgoodri and I talked about this at some point (it may have been something similar but different). @bgoodri Any preference on how this is handled?

jgabry, Oct 20 '20 21:10

The realization of the coefficients from the prior predictive distribution should affect the intercept in the generated quantities block.

bgoodri, Oct 20 '20 21:10

The realization of the coefficients from the prior predictive distribution should affect the intercept in the generated quantities block.

Ok, yeah I kind of get what you mean. It just feels strange because it is correcting for a centering that "kind of never happens" (although I'm not clear on exactly what I mean by "doesn't happen", ha).

But that effect on the intercept, of course, flows through to the prior predictive distribution for the data. So, taking the median of data simulated from the prior predictive distribution at wt1 = wt2 = 0 (this time rescaling wt by ±100), we get:

library(rstanarm)

# Same setup as before, but rescaling wt by +/-100
mtcars$wt1 <- mtcars$wt * 100
fit1 <- stan_glm(
  mpg ~ wt1,
  data = mtcars,
  prior_intercept = normal(0, 10, autoscale = FALSE),
  prior = normal(5, 10, autoscale = FALSE),
  prior_aux = normal(0, 10, autoscale = FALSE),
  prior_PD = TRUE,
  refresh = 0
)

mtcars$wt2 <- mtcars$wt * -100
fit2 <- update(fit1, formula. = mpg ~ wt2)

# Predict at a covariate value of zero for both models
nd <- data.frame(wt1 = 0, wt2 = 0)

# Medians of the prior predictive draws differ between the two fits
median(posterior_predict(fit1, newdata = nd))
median(posterior_predict(fit2, newdata = nd))

which means a different prior predictive distribution, even though the user-specified priors are identical. But I guess the covariate data changed, which in turn means the implicit priors have changed (although the user might not be aware of that), so maybe it makes sense, I dunno.
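The same effect can be reproduced in base R without fitting anything. This sketch assumes a Gaussian likelihood, `y_rep = alpha + beta * x + sigma * eps`, draws `sigma` from a half-normal(0, 10) to mimic `prior_aux = normal(0, 10)`, and evaluates at `x = 0` as `nd` does. It approximates what `posterior_predict()` returns under `prior_PD = TRUE`; it is not that function's implementation, and `prior_pred_at_zero()` is a hypothetical helper.

```r
set.seed(42)
n_draws <- 4000

# Prior draws shared by both "models"
gamma <- rnorm(n_draws, 0, 10)        # centered-scale intercept
beta  <- rnorm(n_draws, 5, 10)        # slope
sigma <- abs(rnorm(n_draws, 0, 10))   # half-normal prior on the residual sd

# Prior predictive draws at x = 0, given the predictor mean xbar
prior_pred_at_zero <- function(xbar) {
  alpha <- gamma - xbar * beta        # back-transformed intercept
  alpha + beta * 0 + rnorm(n_draws, 0, sigma)
}

xbar1 <- mean(mtcars$wt * 100)
xbar2 <- mean(mtcars$wt * -100)
y1 <- prior_pred_at_zero(xbar1)
y2 <- prior_pred_at_zero(xbar2)

# Identical priors, yet the medians have opposite signs
c(median(y1), median(y2))
```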

It is strangest for stan_surv, because there we also center the intercept by the log crude event rate. So it isn't just the covariates that influence the prior predictive distribution: the event times and censoring indicators in the outcome data do too. Even when you don't condition on the data, the data still indirectly influence the prior, because the intercept is centered by a data-dependent quantity. Something about that feels a bit strange. But that is just rehashing what was discussed in the previous issue thread.
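To make the outcome dependence concrete, here is a sketch that assumes the crude event rate is the number of events divided by total follow-up time (an assumption; check the stan_surv source for the exact definition). Two toy outcome vectors share the same event times and covariate mean and differ only in censoring; the same prior draws then give different intercepts. The data, priors, and `log_crude_event_rate()` helper are all hypothetical.

```r
# Toy survival outcomes (illustrative only): same times, different censoring
time    <- c(5, 8, 12, 20, 25)
status1 <- c(1, 1, 0, 1, 0)   # 3 events
status2 <- c(1, 0, 0, 0, 0)   # 1 event

# Assumed definition: log(events / total follow-up time)
log_crude_event_rate <- function(time, status) log(sum(status) / sum(time))

set.seed(1)
gamma <- rnorm(4000, 0, 2)    # toy prior draws for the intercept
beta  <- rnorm(4000, 0, 2)    # toy prior draws for the coefficient
x_bar <- 0.4                  # toy covariate mean, identical in both data sets

alpha1 <- log_crude_event_rate(time, status1) - x_bar * beta + gamma
alpha2 <- log_crude_event_rate(time, status2) - x_bar * beta + gamma

# Every draw is shifted by the same amount, log(3) here
unique(round(alpha1 - alpha2, 10))
```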

Essentially, I think the issue here is that the definition of the (implicit) prior depends on the data, so there is no true prior distribution to draw from that doesn't condition on the data, unless we draw from a prior that doesn't incorporate those data-dependent transformations/shifts.
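A short derivation makes the data dependence of the implicit prior explicit. It uses the priors from the first reproducible example and assumes, as in the generated quantities code quoted earlier, that $\alpha = \gamma - \bar{x}\beta$ with $\gamma$ and $\beta$ independent a priori:

$$
\gamma \sim \mathcal{N}(0, 10^2), \quad \beta \sim \mathcal{N}(5, 10^2)
\;\Rightarrow\;
\alpha = \gamma - \bar{x}\,\beta \sim \mathcal{N}\!\left(-5\bar{x},\; 10^2\,(1 + \bar{x}^2)\right)
$$

With `wt1 = wt * 10`, $\bar{x} = 32.1725$ for `mtcars`, so the implied prior on $\alpha$ is centered near $-160.9$; with `wt2 = wt * -10` it is centered near $+160.9$, although the user-specified priors are identical.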

(cc'ing @ermeel here as well, just so you're across the details).

sambrilleman, Oct 21 '20 01:10

The recentering always happens because the draw from the prior distribution of the coefficients is never exactly zero, although it is never far from zero under the default prior. I would have preferred no conditioning on the outcome variable at all, but Andrew forced us to for the book. I also agree that stan_surv is weirder because the prior predictive distribution it draws from is even less prior than for the other stan_ functions, but I think the way it deals with the coefficients is fine.

bgoodri, Oct 21 '20 03:10

Closing this, and not merging https://github.com/stan-dev/rstanarm/pull/480, based on @bgoodri's comment above ☝️.

sambrilleman, Oct 31 '23 23:10