example-models icon indicating copy to clipboard operation
example-models copied to clipboard

Request for factorial ANOVA with generated means and sds

Open wdonald1985 opened this issue 9 years ago • 7 comments

Hello: I very much want to start using RSTAN for all of my analyses and have thus far figured out how to implement very basic models. My field commonly uses ANOVA, however, where there are interactions between categorical variables. I was hoping someone could share with me code for a factorial ANOVA (3 x 2, for example), with generated quantities for all means and also estimates for the standard deviations. I have attempted to translate Kruschke's ANOVA, which was implemented in RJAGS, to RSTAN but have run into difficulties.

wdonald1985 avatar Apr 08 '16 19:04 wdonald1985

Is there a pointer to the JAGS model you're trying to translate?

Or a definition of the prior and likelihood?

bob-carpenter avatar Apr 09 '16 21:04 bob-carpenter

Here is the model. When using RSTAN, I can get a linear model with factors to run properly. What I cant get, however, is estimates for all means through using the coefficients to generate quantities. Additionally, this model generates standard deviations for each group. I think the problem is in k-1 coding systems for categorical variables.

modelstring = " model { for ( i in 1:Ntotal ) { y[i] ~ dt( mu[i] , 1/(ySigma[x1[i],x2[i]])^2 , nu ) mu[i] <- a0 + a1[x1[i]] + a2[x2[i]] + a1a2[x1[i],x2[i]] } # nu ~ dgamma(5.83,0.0483) # mode 100, sd 50 nu ~ dexp(1/30.0) # for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { sigma[j1,j2] ~ dgamma( sigmaSh , sigmaRa ) # Prevent from dropping too close to zero: ySigma[j1,j2] <- max( sigma[j1,j2] , medianCellSD/1000 ) } } sigmaSh <- 1 + sigmaMode * sigmaRa sigmaRa <- ( sigmaMode + sqrt( sigmaMode^2 + 4_sigmaSD^2 ) ) /(2_sigmaSD^2) sigmaMode ~ dgamma(sGammaShRa[1],sGammaShRa[2]) sigmaSD ~ dgamma(sGammaShRa[1],sGammaShRa[2]) # a0 ~ dnorm( yMean , 1/(ySD*5)^2 ) # for ( j1 in 1:Nx1Lvl ) { a1[j1] ~ dnorm( 0.0 , 1/a1SD^2 ) } a1SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # for ( j2 in 1:Nx2Lvl ) { a2[j2] ~ dnorm( 0.0 , 1/a2SD^2 ) } a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { a1a2[j1,j2] ~ dnorm( 0.0 , 1/a1a2SD^2 ) } } a1a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # or try a folded t (Cauchy) # Convert a0,a1[],a2[],a1a2[,] to sum-to-zero b0,b1[],b2[],b1b2[,] : for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { m[j1,j2] <- a0 + a1[j1] + a2[j2] + a1a2[j1,j2] # cell means } } b0 <- mean( m[1:Nx1Lvl,1:Nx2Lvl] ) for ( j1 in 1:Nx1Lvl ) { b1[j1] <- mean( m[j1,1:Nx2Lvl] ) - b0 } for ( j2 in 1:Nx2Lvl ) { b2[j2] <- mean( m[1:Nx1Lvl,j2] ) - b0 } for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { b1b2[j1,j2] <- m[j1,j2] - ( b0 + b1[j1] + b2[j2] )
} } } (Kruschke, 2015)

wdonald1985 avatar Apr 09 '16 22:04 wdonald1985

Hi, you should include the Stan model, not the Bugs model! A

On Apr 9, 2016, at 6:19 PM, wdonald1985 [email protected] wrote:

Here is the model. When using RSTAN, I can get a linear model with factors to run properly. What I cant get, however, is estimates for all means through using the coefficients to generate quantities. Additionally, this model generates standard deviations for each group. I think the problem is in k-1 coding systems for categorical variables.

modelstring = " model { for ( i in 1:Ntotal ) { y[i] ~ dt( mu[i] , 1/(ySigma[x1[i],x2[i]])^2 , nu ) mu[i] <- a0 + a1[x1[i]] + a2[x2[i]] + a1a2[x1[i],x2[i]] }

nu ~ dgamma(5.83,0.0483) # mode 100, sd 50

nu ~ dexp(1/30.0)

for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { sigma[j1,j2] ~ dgamma( sigmaSh , sigmaRa )

Prevent from dropping too close to zero:

ySigma[j1,j2] <- max( sigma[j1,j2] , medianCellSD/1000 ) } } sigmaSh <- 1 + sigmaMode * sigmaRa sigmaRa <- ( sigmaMode + sqrt( sigmaMode^2 + 4sigmaSD^2 ) ) /(2sigmaSD^2) sigmaMode ~ dgamma(sGammaShRa[1],sGammaShRa[2]) sigmaSD ~ dgamma(sGammaShRa[1],sGammaShRa[2])

a0 ~ dnorm( yMean , 1/(ySD*5)^2 )

for ( j1 in 1:Nx1Lvl ) { a1[j1] ~ dnorm( 0.0 , 1/a1SD^2 ) } a1SD ~ dgamma(aGammaShRa[1],aGammaShRa[2])

for ( j2 in 1:Nx2Lvl ) { a2[j2] ~ dnorm( 0.0 , 1/a2SD^2 ) } a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2])

for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { a1a2[j1,j2] ~ dnorm( 0.0 , 1/a1a2SD^2 ) } } a1a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # or try a folded t (Cauchy)

Convert a0,a1[],a2[],a1a2[,] to sum-to-zero b0,b1[],b2[],b1b2[,] :

for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { m[j1,j2] <- a0 + a1[j1] + a2[j2] + a1a2[j1,j2] # cell means } } b0 <- mean( m[1:Nx1Lvl,1:Nx2Lvl] ) for ( j1 in 1:Nx1Lvl ) { b1[j1] <- mean( m[j1,1:Nx2Lvl] ) - b0 } for ( j2 in 1:Nx2Lvl ) { b2[j2] <- mean( m[1:Nx1Lvl,j2] ) - b0 } for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { b1b2[j1,j2] <- m[j1,j2] - ( b0 + b1[j1] + b2[j2] )

} } } (Kruschke, 2015)

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/stan-dev/example-models/issues/61#issuecomment-207865464

andrewgelman avatar Apr 09 '16 22:04 andrewgelman

Is that the right attached program? I don't see any categorical variables.

Consistent indentation on code makes it much easier to read. I may just be missing something.

  • Bob

On Apr 9, 2016, at 6:19 PM, wdonald1985 [email protected] wrote:

Here is the model. When using RSTAN, I can get a linear model with factors to run properly. What I cant get, however, is estimates for all means through using the coefficients to generate quantities. Additionally, this model generates standard deviations for each group. I think the problem is in k-1 coding systems for categorical variables.

modelstring = " model { for ( i in 1:Ntotal ) { y[i] ~ dt( mu[i] , 1/(ySigma[x1[i],x2[i]])^2 , nu ) mu[i] <- a0 + a1[x1[i]] + a2[x2[i]] + a1a2[x1[i],x2[i]] }

nu ~ dgamma(5.83,0.0483) # mode 100, sd 50

nu ~ dexp(1/30.0)

for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { sigma[j1,j2] ~ dgamma( sigmaSh , sigmaRa )

Prevent from dropping too close to zero:

ySigma[j1,j2] <- max( sigma[j1,j2] , medianCellSD/1000 ) } } sigmaSh <- 1 + sigmaMode * sigmaRa sigmaRa <- ( sigmaMode + sqrt( sigmaMode^2 + 4sigmaSD^2 ) ) /(2sigmaSD^2) sigmaMode ~ dgamma(sGammaShRa[1],sGammaShRa[2]) sigmaSD ~ dgamma(sGammaShRa[1],sGammaShRa[2])

a0 ~ dnorm( yMean , 1/(ySD*5)^2 )

for ( j1 in 1:Nx1Lvl ) { a1[j1] ~ dnorm( 0.0 , 1/a1SD^2 ) } a1SD ~ dgamma(aGammaShRa[1],aGammaShRa[2])

for ( j2 in 1:Nx2Lvl ) { a2[j2] ~ dnorm( 0.0 , 1/a2SD^2 ) } a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2])

for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { a1a2[j1,j2] ~ dnorm( 0.0 , 1/a1a2SD^2 ) } } a1a2SD ~ dgamma(aGammaShRa[1],aGammaShRa[2]) # or try a folded t (Cauchy)

Convert a0,a1[],a2[],a1a2[,] to sum-to-zero b0,b1[],b2[],b1b2[,] :

for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { m[j1,j2] <- a0 + a1[j1] + a2[j2] + a1a2[j1,j2] # cell means } } b0 <- mean( m[1:Nx1Lvl,1:Nx2Lvl] ) for ( j1 in 1:Nx1Lvl ) { b1[j1] <- mean( m[j1,1:Nx2Lvl] ) - b0 } for ( j2 in 1:Nx2Lvl ) { b2[j2] <- mean( m[1:Nx1Lvl,j2] ) - b0 } for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) { b1b2[j1,j2] <- m[j1,j2] - ( b0 + b1[j1] + b2[j2] )

} } } (Kruschke, 2015)

— You are receiving this because you commented. Reply to this email directly or view it on GitHub

bob-carpenter avatar Apr 09 '16 23:04 bob-carpenter

Through trial and many errors my RSTAN model is not looking very good. I'll post it after I clean it up a bit. Thanks!

wdonald1985 avatar Apr 10 '16 17:04 wdonald1985

Was there a JAGS model you're trying to translate? The one you sent didn't have any categorical variables I could find.

  • Bob

P.S. I'd strongly recommend version control to recover clean old versions of things. Where possible, I'd also recommend starting with the simplest thing that works and building out rather than trying to build a complicated model all at once.

On Apr 10, 2016, at 1:38 PM, wdonald1985 [email protected] wrote:

Through trial and many errors my RSTAN model is not looking very good. I'll post it after I clean it up a bit. Thanks!

— You are receiving this because you commented. Reply to this email directly or view it on GitHub

bob-carpenter avatar Apr 10 '16 22:04 bob-carpenter

Hello: Apologies on taking so long to provide the script to my model. Since you are both very well know statisticians, I wanted to ensure my code was up to par. This is a 3 * 2 interaction with categorical predictors. Since I am only getting one estimate for sigma, I think my model is assuming homogeneous variances. I want a model that does not make this assumption, thereby providing 6 estimates for sigma. Additionally, I would like a way to get the lower level estimates, for example, estimates for VS and GEAR without having to build another model.

myData <- mtcars X <- model.matrix(~ vs * gear, data = myData) K <- 6 // number of columns in matrix N <- length(mtcars$mpg) standat= list(X = X,K =K, N =N, y =y) stanmodelcode = ' data { int<lower=0> N;
vector[N] y;
int<lower=0> K;
matrix [N, K]X ;
} parameters { real <lower =0> sigma; vector [K] beta; } transformed parameters { } model { y ~ normal(X * beta, sigma); } generated quantities { real mu0_3; real mu1_3; real mu0_4; real mu1_4; real mu0_5; real mu1_5; mu0_3 <- beta[1]; mu1_3 <- beta[1] + beta[2]; mu0_4 <- beta[1] + beta[3]; mu1_4 <- beta[1] + beta[2] + beta[3] + beta[5]; mu0_5 <- beta[1] + beta [4]; mu1_5 <- beta[1] + beta[2] + beta[4] + beta[6]; } ' fit = stan(model_code=stanmodelcode, data=standat, iter=1200, warmup=10, seed = 1, chains=2, thin=2)

wdonald1985 avatar Apr 29 '16 22:04 wdonald1985