rstanarm
rstanarm copied to clipboard
Survival models with prior_PD = TRUE still depend on the data
Summary:
A stan_surv
model with prior_PD = TRUE
still has an implicit prior for prior_intercept
that is shifted by a data dependent quantity (the log crude event rate). Even though the model is not conditioning on the data.
Description:
stan_surv
has a prior_intercept
that is shifted by the log crude event rate to help with numerical stability. See the description in the help file here: https://github.com/stan-dev/rstanarm/blob/ff0a22b90cd70828616f73c39fa54ebb68522040/R/stan_surv.R#L148.
The estimates returned to the user however, are back transformed so that they don't need to worry about that shift.
The thing that is slightly strange is that we still do this shift on the prior even when not conditioning on the data. So changing the data when prior_PD = TRUE
and the seed
is unchanged will still produce slightly different MCMC samples. See the reproducible steps below.
Reproducible Steps:
library(tidyverse)
library(rstanarm)
# ex data 1
empty_dat1 <- data.frame(
time = 1,
delta = c(1, 0, 1, 1, 0),
trt = factor(letters[1:5])
)
# ex data 2
empty_dat2 <- empty_dat1
empty_dat2$time <- 10
prior_checks1 <- stan_surv(Surv(time, delta) ~ trt, data = empty_dat1,
basehaz = "weibull-aft", prior_PD = TRUE,
prior = normal(0, .5), prior_aux = exponential(1),
prior_intercept = normal(1, 1, autoscale = FALSE),
chains = 4, iter = 2000, seed = 8354)
prior_checks2 <- stan_surv(Surv(time, delta) ~ trt, data = empty_dat2,
basehaz = "weibull-aft", prior_PD = TRUE,
prior = normal(0, .5), prior_aux = exponential(1),
prior_intercept = normal(1, 1, autoscale = FALSE),
chains = 4, iter = 2000, seed = 8354)
prior_checks1
prior_checks2
RStanARM Version:
2.21.2 (feature/survival
branch)
To fix this, I think we can just skip the shift/centering of the intercept prior when prior_PD = TRUE
? Would you agree @ermeel86?
Good catch Sam! I agree with that proposal. While thinking about it I realised that there is another data dependence under prior_PD=T
when it comes to the default way of knot placement (e.g. for the m-splines):
The internal knots are placed at equally spaced percentiles of the distribution of uncensored event times.
I admit that this is somewhat more implicit but I think it still has an influence on the priors / model.
I find it hard to come up with an automated alternative. Another option would be to force the user to specify knots manually under prior_PD=T
... But not sure if this in the end is intended and the user will compare apple with oranges.
I wonder whether default priors that are data-dependent are changed in other rstanarm functions like stan_lm
as well, when using prior_PD=T
?
Good catch Sam!
Actually, wasn't my sharp eye. Rather, somebody emailed me asking why the outputs weren't consistent! ha 😨
I admit that this is somewhat more implicit but I think it still has an influence on the priors / model.
Interesting, yeah, great point. This one is probably much harder to get around I think. I guess this will only matter for the M-spline/B-spline models, but nonetheless, it could be confusing for the user.
But not sure if this in the end is intended and the user will compare apple with oranges.
I agree, I guess the issue is that any action we take to make the prior not data-dependent is then likely to also mean that it isn't actually the same prior as is used when you do condition on the data (i.e. prior_PD = FALSE
), argh.
I wonder whether default priors that are data-dependent are changed in other rstanarm functions like
stan_lm
as well, when usingprior_PD=T
?
Yeah, not sure what stan_lm
would do. Or if there are other rstanarm modelling functions that have a data-dependent prior (aside from the usual autoscale = TRUE
). Perhaps @jgabry or @bgoodri can advise.
To fix this, I think we can just skip the shift/centering of the intercept prior when
prior_PD = TRUE
? Would you agree @ermeel86?
Then how are you drawing from the prior predictive distribution of the same model whose posterior distribution is being estimated when prior_PD = FALSE
?
Then how are you drawing from the prior predictive distribution of the same model whose posterior distribution is being estimated when
prior_PD = FALSE
?
Good point, I guess you wouldn't be.
I guess that probably resolves it. If the prior is data-dependent when the model is fit conditioning on the outcome data, then the prior must be data-dependent even when not conditioning on the outcome data. And the lack of intuition for the user is probably just something we have to accept...