Autoscale parameter tau in `decov()` prior?

fweber144 opened this issue 4 years ago • 6 comments

Summary:

It might be necessary, or at least better, to autoscale the parameter tau in the decov() prior.

Description:

If I understand the documentation here, here, and here correctly, the parameter tau (the key parameter for the trace of the covariance matrix of the group-level terms) is not autoscaled in the decov() prior, in contrast to the corresponding parameter in the lkj() prior used by default in stan_mvmer() and stan_jm(). However, I think it would make sense to perform this autoscaling. Consider a linear regression with group-level intercepts. Under the default exponential prior for tau with rate parameter 1 (not autoscaled), a value of 3 for the standard deviation of the group-level intercepts would already be extreme (taking qexp(0.95)). However, the standard deviation of y (and correspondingly also the true standard deviation of the group-level intercepts) may be much larger than 3; in fact, it may be arbitrarily large, e.g., y may have a standard deviation of 50.
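For concreteness, the quantile used above can be checked directly in R:

qexp(0.95)  # ~ 3: the 95% quantile of the default Exponential(1) prior

so group-level standard deviations above roughly 3 already lie in the upper 5% tail of the default prior, no matter the scale of y.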

The brms package also performs this autoscaling; see, e.g.:

data("kidiq", package = "rstanarm")
kidiq_gr <- within(kidiq, {
  mom_age_gr <- cut(mom_age,
                    breaks = unique(quantile(mom_age, probs = seq(0, 1, 0.1))),
                    include.lowest = TRUE)
})
brms::make_stancode(formula = kid_score ~ mom_hs + mom_iq + (1 | mom_age_gr),
                    data = kidiq_gr)

which gives the following in the model block:

[...]
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 90, 19.3);
  target += student_t_lpdf(sigma | 3, 0, 19.3)
    - 1 * student_t_lccdf(0 | 3, 0, 19.3);
  target += student_t_lpdf(sd_1 | 3, 0, 19.3)
    - 1 * student_t_lccdf(0 | 3, 0, 19.3);
[...]

where sd_1 is the tau mentioned above and 19.3 is mad(kidiq$kid_score).
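This scale can be checked directly (kidiq was loaded above):

mad(kidiq$kid_score)  # ~ 19.3, the value brms plugs into the Stan code above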

The documentation here states:

If all the variables were multiplied by a number, the trace of their covariance matrix would increase by that number squared. Thus, it is reasonable to use a scale-invariant prior distribution for the positive scale parameter, [...]

but I'm not sure this is correct. Can we really multiply all the variables by an arbitrary number? In the case of only group-level intercepts, for example, these variables are the "auxiliary" group-level intercepts (i.e., the group-level intercepts before they are multiplied by their standard deviation) and are therefore assumed to follow a unit normal distribution. So they are restricted in their scale, and it is the multiplication by tau that enables them to take on arbitrarily large values. I don't think there is an identifiability issue here.
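To make this concrete, here is a minimal sketch of that parameterization in R (the variable names are illustrative, not rstanarm internals):

J   <- 10        # number of groups
z   <- rnorm(J)  # "auxiliary" group-level intercepts, unit scale by assumption
tau <- 25        # standard deviation of the group-level intercepts
b   <- tau * z   # actual group-level intercepts; only tau carries the scale

Multiplying z by an arbitrary number would violate its unit-normal assumption; it is tau alone that absorbs the scale of the data.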

I'm not too familiar with rstanarm's internals, but I think this autoscaling issue could be resolved by adding an autoscale argument to the decov() function. See also the reproducible steps below.

Reproducible Steps:

options(mc.cores = parallel::detectCores(logical = FALSE))
data("kidiq", package = "rstanarm")
kidiq_gr <- within(kidiq, {
  mom_age_gr <- cut(mom_age,
                    breaks = unique(quantile(mom_age, probs = seq(0, 1, 0.1))),
                    include.lowest = TRUE)
})
library(rstanarm)
myfit <- stan_glmer(kid_score ~ mom_hs + mom_iq + (1 | mom_age_gr),
                    data = kidiq_gr,
                    seed = 1466333131)
prior_summary(myfit)

gives as output:

[...]
Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
[...]

so the decov() prior indeed does not seem to be autoscaled, which may also be seen from the fact that the following does not work:

myfit_autoscale <- update(
  myfit,
  prior_covariance = decov(regularization = 1, concentration = 1,
                           shape = 1, scale = 1,
                           autoscale = TRUE)
)

as it throws the error

Error in decov(regularization = 1, concentration = 1, shape = 1, scale = 1,  : 
  unused argument (autoscale = TRUE)

RStanARM Version:

2.21.2 (from the Stan repo)

R Version:

4.1.0

Operating System:

Ubuntu 20.04.2 LTS

fweber144 avatar Jun 02 '21 19:06 fweber144

Thanks for opening the issue @fweber144. @bgoodri What do you think about this?

jgabry avatar Jun 15 '21 21:06 jgabry

The decov prior is scaled by the error standard deviation in Gaussian models (and by the overdispersion in negative binomial models), as lme4 does. If we were to scale by the standard deviation of the outcome instead, we would need to stop scaling by the error standard deviation, which would be incompatible with lme4. So I don't think we should.
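A minimal sketch of what this implies for the prior on the group-level standard deviation (assuming, for illustration, that it is sigma times tau, with tau given the default Gamma(shape = 1, scale = 1), i.e., Exponential(1), prior):

sigma <- 15                                # residual SD (illustrative value)
tau   <- rgamma(1e5, shape = 1, rate = 1)  # decov() default: Exponential(1)
quantile(sigma * tau, 0.95)                # ~ 45: the prior widens with sigma

so the prior on the group-level standard deviation stretches with the residual scale rather than with the scale of the outcome.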

bgoodri avatar Jun 15 '21 22:06 bgoodri

Ah, thanks for the explanation. Is that documented somewhere? I may have missed it.

fweber144 avatar Jun 16 '21 05:06 fweber144

Thanks for the answer @bgoodri. I am trying to replicate this prior in a PyMC3 model and also had the same doubt as @fweber144. It makes much more sense now!

PS: @fweber144 the model Reaction ~ Days + (Days|Subject) based on the famous sleepstudy dataset is a good example of why your reasoning about the Exponential(1) prior for the standard deviation without scaling is right. I'm working with that model, and having such a narrow prior for the standard deviation of the group-specific intercepts didn't make sense to me.

If we do

library(rstanarm)
data("sleepstudy", package = "lme4")
model <- stan_glmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
posterior_vs_prior(model, pars = "varying") +
  ggplot2::theme(legend.position = "none")

we can see that the prior distributions actually range from approximately -100 to 100.

tomicapretto avatar Jun 17 '21 17:06 tomicapretto

Yeah, also in things like that sleepstudy example, you can and probably should rescale the outcome variable to not be in milliseconds. But that is something I think researchers should do on their own before calling functions in rstanarm.
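For instance, a minimal sketch of such a rescaling (converting to seconds is just one choice):

data("sleepstudy", package = "lme4")
sleepstudy$Reaction_s <- sleepstudy$Reaction / 1000  # milliseconds -> seconds
model_s <- stan_glmer(Reaction_s ~ Days + (Days | Subject), data = sleepstudy)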

bgoodri avatar Jun 17 '21 17:06 bgoodri

I'm sorry if I have missed it, but I think my question from above hasn't been answered yet:

Ah, thanks for the explanation. Is that documented somewhere? I may have missed it.

I think documenting this would be helpful for users.

fweber144 avatar Aug 02 '21 12:08 fweber144