Priors don't affect result
This is more likely due to my incompetence than an actual issue with the library, but I've noticed that no matter how restrictive the priors I give to stan_glm (my use case is logistic regression), the resulting model doesn't differ from a regular frequentist logistic regression. If I use either core rstan or brms, I am able to influence the resulting model with informative priors.
My use case is basically this: I have outdated data based on random sampling and newer data with heavy selection bias, and I want to use the randomly sampled data to build a prior for the intercept that should, ideally, counteract the selection bias (I also use propensity scores, but that's another story).
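Concretely, the intercept prior is centred at the log-odds of the prevalence observed in the older random sample. A minimal sketch of that step, using the same toy objects as the code below (`prior_location` is just an illustrative name):

```r
# prevalence of the outcome in the older, randomly sampled data
mu <- mean(random_y)

# prior mean for the intercept on the log-odds (logit) scale
prior_location <- log(mu / (1 - mu))
```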
Below is code with toy data and one predictor variable. Essentially the same model is built four times: once with the regular glm function and once each as a Bayesian version with rstan, rstanarm, and brms.
```r
library(rstanarm)
library(rstan)
library(brms)
#create prior data
random_y <- rbinom(3000, 1, prob=0.07)
#define prior mean
mu <- prop.table(table(random_y))[2]
#data for modeling
y <- rbinom(5000, 1, prob=0.35)
x <- rnorm(5000, mean=y, sd=2)
df <- data.frame(x, y)
#new observations for prediction
newObs <- data.frame(rnorm(100, mean=0.5, sd=3))
names(newObs)[1] <- "x"
#frequentist logistic regression
fit1 <- glm(y ~ x, data = df, family="binomial")
#bayesian logistic regression using rstanarm
fit2 <- stan_glm(y ~ x, data = df, family = "binomial",
                 prior_intercept = student_t(location = log(mu/(1-mu)), scale = 0.05),
                 prior = student_t(location = 0, scale = 0.5))
#with Stan
stanMod <- "data {
  int N;                           // number of trials
  int N_test;                      // number of trials, test data
  int<lower = 0, upper = 1> y[N];  // number of hits
  vector[N] x;                     // predictive variable
  vector[N_test] x_test;           // predictive variable, test data
}
parameters {
  // The (unobserved) model parameters that we want to recover
  real alpha;
  real beta;
}
model {
  // A logistic regression model relating x to y
  y ~ bernoulli_logit(alpha + beta * x);
  // Prior models for the unobserved parameters
  alpha ~ normal(log(0.07/(1-0.07)), 0.005);
  beta ~ normal(0, 0.5);
}
generated quantities {
  vector[N_test] y_test;
  for (i in 1:N_test) {
    y_test[i] = alpha + beta * x_test[i];
  }
}"
StanDat <- list(N = nrow(df), x = df$x, y = df$y, N_test = nrow(newObs), x_test = newObs$x)
fit3 <- stan(model_code = stanMod, data=StanDat, chains=4)
#with brms
brmPriors <- c(set_prior("normal(0, 0.5)", class = "b", coef = "x"),
               set_prior("normal(log(0.07/(1-0.07)), 0.05)", class = "Intercept"))
fit4 <- brm(y~x, data=df, family = "bernoulli", prior=brmPriors)
#results
coef(fit1)
coef(fit2)
extractStan <- rstan::extract(fit3)  # posterior draws from the rstan fit
mean(extractStan$alpha)
mean(extractStan$beta)
fixef(fit4)
```
As you can see, fit1 and fit2 (glm, rstanarm) correspond to each other, while fit3 and fit4 (rstan, brms) also form a pair. In my mind, the rstanarm result should be similar to the rstan and brms fits, but something goes awry.
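In case it helps with diagnosing this, rstanarm's prior_summary() reports the priors that were actually applied to a fitted model. A short sketch, assuming fit2 from the code above (the exact output format may differ between versions):

```r
# Show the priors rstanarm reports having used for fit2,
# including any adjusted scales it applied
prior_summary(fit2)
```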
As I said, there are likely errors in how I set up the models. Coming from the social sciences, I am not a particularly accomplished statistician even in the frequentist tradition, and I am pretty inexperienced with Bayesian methods.
I prefer rstanarm as an interface to Stan over the others, so I would like to keep using it :)
I use R 4.0.4 on Windows 10.