rstanarm icon indicating copy to clipboard operation
rstanarm copied to clipboard

The `concentration` parameter in `decov` has no effect.

Open stla opened this issue 5 years ago • 17 comments
trafficstars

Hello,

Here is a model with two between variances:

library(rstanarm)
rstanarm <- stan_lmer(
  y ~  0 + Part + (1|Operator) + (1|Operator:Part), data = dat,
  prior = normal(0, 100),
  prior_aux = cauchy(0, 5),
  prior_covariance = decov(concentration = 1000000, shape = 1, scale = 1/1),
  iter = 10000
)
posterior_interval(rstanarm, prob = 95/100)

From my understanding, if the concentration parameter is high (here 1e6), the two between variances should be equal. But this is not the case:

                                                      2.5%      97.5%
PartA1                                        7.6725149693 10.9984804
PartA2                                       17.7830860979 21.1723343
...
sigma                                         1.5266337590  2.3625997
Sigma[Operator:Part:(Intercept),(Intercept)]  0.0004819209  3.6359430
Sigma[Operator:(Intercept),(Intercept)]       0.0251682030 10.6149320

And I don't see any significant difference when I set it to 1.

To reproduce the data:

SimAV2mixed <- function(I, J, Kij, mu=0, alphai, sigmaO=1,
                        sigmaPO=1, sigmaE=1, factor.names=c("Part","Operator"),
                        resp.name="y", keep.intermediate=FALSE){
  Operator <- rep(1:J, each=I)
  Oj <- rep(rnorm(J, 0, sigmaO), each=I)
  Part <- rep(1:I, times=J)
  Pi <- rep(alphai, times=J)
  POij <- rnorm(I*J, 0, sigmaPO)
  simdata0 <- data.frame(Part, Operator, Pi, Oj, POij)
  simdata0$Operator <- factor(simdata0$Operator)
  levels(simdata0$Operator) <- 
    sprintf(paste0("%0", floor(log10(J))+1, "d"), 1:J)
  simdata0$Part <- factor(simdata0$Part)
  levels(simdata0$Part) <- sprintf(paste0("%0", floor(log10(I))+1, "d"), 1:I)
  simdata <- 
    as.data.frame(
      sapply(simdata0, function(v) rep(v, times=Kij), simplify=FALSE))
  Eijk <- rnorm(sum(Kij), 0, sigmaE)
  simdata <- cbind(simdata, Eijk)
  simdata[[resp.name]] <- mu + with(simdata, Oj+Pi+POij+Eijk)
  levels(simdata[,1]) <- paste0("A", levels(simdata[,1]))
  levels(simdata[,2]) <- paste0("B", levels(simdata[,2]))
  names(simdata)[1:2] <- factor.names
  if(!keep.intermediate) simdata <- simdata[,c(factor.names,resp.name)]
  simdata
}

set.seed(666)  
I = 2; J = 6; Kij = rpois(I*J, 1) + 3
alphai <- c(10, 20)
sigmaO <- 1
sigmaPO <- 0.5
sigmaE <- 2
dat <- SimAV2mixed(I, J, Kij, mu = 0, alphai = alphai, 
                   sigmaO = sigmaO, sigmaPO = sigmaPO, sigmaE = sigmaE)

stla avatar Nov 07 '20 22:11 stla

Hi @stla, I think the issue here is that the concentration parameter [edit: see below, I confused concentration and regularization] is relevant for the prior on the correlation matrix between the varying intercepts and slopes but your model only has varying intercepts and doesn't have any varying slopes. Here's an example with both varying intercepts and slopes where you can see the effect of the concentration parameter:

# formula with both varying intercepts and slopes
fit <- stan_lmer(mpg ~ (1 + wt|cyl), data = mtcars, prior_covariance = decov(concentration = 10000))

Here is the section of the print output with the hierarchical standard deviations:

Error terms:
 Groups   Name        Std.Dev. Corr 
 cyl      (Intercept) 4.7           
          wt          4.7      -0.16
 Residual             2.7           
Num. levels: cyl 3 

jgabry avatar Nov 10 '20 16:11 jgabry

Hi @jgabry

Thank you. So perhaps I misunderstood the doc or it is not clear. According to my understanding, concentration is a parameter of the Dirichlet prior distribution on the proportions of the groups-level variances. Then this is what I thought:

data {
  int<lower=1> N;
  real y[N];
  int<lower=1> I;
  int<lower=1> J;
  int<lower=1> PartID[N];
  int<lower=1> OperatorID[N];
  int<lower=1> InteractionID[N];
  real<lower=0> shape;
  real<lower=0> scale;
  real<lower=0> concentration;
}

parameters {
  real<lower=0> tau;
  simplex[2] theta;
  vector[I] PartA;
  vector[J] Op;
  vector[I*J] OpPart;
  real<lower=0> sigmaE;
}

transformed parameters {
  real<lower=0> sigma_between_total;
  vector<lower=0>[2] sigmas_between;
  real<lower=0> sigmaO;
  real<lower=0> sigmaOP;
  sigma_between_total = sqrt(2) * tau; 
  sigmas_between = sigma_between_total * sqrt(theta);
  sigmaO = sigmas_between[1];
  sigmaOP = sigmas_between[2];
}

model {
  tau ~ gamma(shape, scale);
  theta ~ dirichlet(rep_vector(concentration, 2));
  Op ~ normal(0, sigmaO);
  OpPart ~ normal(0, sigmaOP);
  sigmaE ~ cauchy(0, 5);
  PartA ~ normal(0, 100); 
  for(k in 1:N){
    y[k] ~ normal(
      PartA[PartID[k]] + 
        Op[OperatorID[k]] + 
          OpPart[InteractionID[k]], 
      sigmaE
    );
  }
}

generated quantities {
  real<lower=0> sigmaTotal;
  sigmaTotal = sqrt(sigmaE^2 + sigmaO^2 + sigmaOP^2);
}

Is it wrong? If this is wrong, how could we control the prior distribution on the groups-level variances?

stla avatar Nov 10 '20 18:11 stla

Oops, I was thinking about the regularization parameter (which is for the LKJ prior on the correlation matrix) and not the concentration parameter, sorry! You're right that concentration is for the dirichlet. Let me take a closer look at this and get back to you.

jgabry avatar Nov 10 '20 18:11 jgabry

how could we control the prior distribution on the groups-level variances

The shape and scale arguments to decov() will also definitely affect the prior so you can use those. But I'm now also wondering, like you, why the concentration parameter only seems to have an affect when there are both varying slopes and intercepts. For the regularization parameter that makes sense, but I thought that the concentration parameter should affect the variances/sds even if just varying intercepts. @bgoodri Is that not the case?

jgabry avatar Nov 10 '20 19:11 jgabry

For terms like (1 | g), then the variance-covariance matrix is 1x1 and the Dirichlet prior on the variance proportion would put all of its mass on 1. At that point, the gamma prior with the given shape and scale pertains to the standard deviation of the intercept across levels of g.

On Tue, Nov 10, 2020 at 2:08 PM Jonah Gabry [email protected] wrote:

how could we control the prior distribution on the groups-level variances

The shape and scale arguments to decov() will also definitely affect the prior so you can use those. But I'm now also wondering, like you, why the concentration parameter only seems to have an affect when there are both varying slopes and intercepts. For the regularization parameter that makes sense, but I thought that the concentration parameter should affect the variances/sds even if just varying intercepts. @bgoodri https://github.com/bgoodri Is that not the case?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-724906278, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKXPZY2HEG56XKNG4ALSPGFUFANCNFSM4TN5R7VA .

bgoodri avatar Nov 10 '20 19:11 bgoodri

For terms like (1 | g), then the variance-covariance matrix is 1x1

Ok right, thanks. So I was on the right track in my initial comment about the model only containing varying intercepts and not varying slopes, but then I confused myself!

jgabry avatar Nov 10 '20 20:11 jgabry

Sorry I don't understand. Here there are two (1|g) terms. The doc says that the Gamma prior is assigned on the sum of the two corresponding variances (up to a square root somewhere), and that the two proportions of these variances a priori follow the Dirichlet distribution. What am I missing? And then, how to set the prior distribution on these two variances, if I'm wrong?

stla avatar Nov 10 '20 21:11 stla

@bgoodri given that I seem to be confusing myself on this topic can you clarify here? One thing is that I think the doc is probably using "covariance matrix" to refer both to the full matrix and to the sub-matrices for the different terms, which might be part of the confusion.

jgabry avatar Nov 10 '20 21:11 jgabry

If you have multiple lme4-style statements, like (1 | a) and (1 | b), then the two variance-covariance matrices are each 1x1 and independent of each other before seeing the data. If instead you had something like (1 + x | c), then the single variance-covariance matrix is 2x2 and then the prior on the correlation matrix and the two variance proportions comes into play, although they are jointly uniform by default.

On Tue, Nov 10, 2020 at 4:41 PM Jonah Gabry [email protected] wrote:

@bgoodri https://github.com/bgoodri given that I seem to be confusing myself on this topic can you clarify here? One thing is that I think the doc is probably using "covariance matrix" to refer both to the full matrix and to the sub-matrices for the different terms, which might be part of the confusion.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-724983463, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQZH2KUIKZU4IE5VALSPGXP5ANCNFSM4TN5R7VA .

bgoodri avatar Nov 10 '20 21:11 bgoodri

So, if there are two (1|g) terms, rstanarm assigns the same Gamma prior distribution on the corresponding standard deviations?

stla avatar Nov 12 '20 12:11 stla

You might be able to pass 2 shapes and / or scales, but the gamma distribution is used for both.

On Thu, Nov 12, 2020, 7:21 AM stla [email protected] wrote:

So, if there are two (1|g) terms, rstanarm assigns the same Gamma prior distribution on the corresponding standard deviations?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726044561, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKX4AHFKSP6DKQSMQZTSPPHL5ANCNFSM4TN5R7VA .

bgoodri avatar Nov 12 '20 13:11 bgoodri

I tried something like prior_covariance = decov(shape = c(50,1), scale = c(1,1)) but I didn't obtain expected results. How should one specify multiple Gamma priors?

stla avatar Nov 12 '20 14:11 stla

That was supposed to work.

On Thu, Nov 12, 2020 at 9:59 AM stla [email protected] wrote:

I tried something like prior_covariance = decov(shape = c(50,1), scale = c(1,1)) but I didn't obtain expected results. How should one specify multiple Gamma priors?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726130695, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQZMJNTTOFCUVW2EE3SPPZ6FANCNFSM4TN5R7VA .

bgoodri avatar Nov 12 '20 15:11 bgoodri

Yes, sorry, this works actually.

Both the R documentation and the online documentation do not mention this possibility.

stla avatar Nov 12 '20 16:11 stla

And also, there's no way (?) to know in advance the order of the standard deviations (I mean which one comes first, etc).

stla avatar Nov 12 '20 17:11 stla

I would guess that it is in the same order that they appear in the formula, but I wouldn't be too surprised if lme4 changes the order.

On Thu, Nov 12, 2020 at 12:26 PM stla [email protected] wrote:

And also, there's no way (?) to know in advance the order of the standard deviations (I mean which one comes first, etc).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/483#issuecomment-726221845, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKQF5NY5IBACSY66SUTSPQLDJANCNFSM4TN5R7VA .

bgoodri avatar Nov 12 '20 17:11 bgoodri

it is in the same order that they appear in the formula

No :)

stla avatar Nov 12 '20 18:11 stla