emmeans
emmeans copied to clipboard
A question about the linear functions values in a balanced 2-way ANOVA with interaction - joint_tests()
Dear Professor,
This is not a bug report, I'm just trying to understand what's going on under the hood. Would not you mind if I asked you to guide me a little?
The data: It's a toy dataset with 2x2 Sex and Treatment design. It's balanced (no class is missing) and with equal number of observations in each cell.
set.seed(1000)
rbind(
data.frame(Sex="Male", Treatment="A", Response = rnorm(10, mean=1)),
data.frame(Sex="Male", Treatment="B", Response = rnorm(10, mean=2)),
data.frame(Sex="Female", Treatment="A", Response = rnorm(10, mean=2)),
data.frame(Sex="Female", Treatment="B", Response = rnorm(10, mean=2))
) %>%
mutate(Sex = factor(Sex),
Treatment = factor(Treatment)) -> balanced
Let's fit a simple linear model to reproduce ANOVA
options(contrasts=c("contr.treatment", "contr.poly"))
m <- lm(Response ~ Sex * Treatment , data=balanced)
Let's run ANOVA. Type-1 = Type-3 in this scenario, so either approach will work. This is just a toy example, let's focus on the numbers itself, not meaning.
> summary(aov(m))
Df Sum Sq Mean Sq F value Pr(>F)
Sex 1 7.299 7.299 8.455 0.0062 **
Treatment 1 0.118 0.118 0.137 0.7132
Sex:Treatment 1 6.157 6.157 7.132 0.0113 *
Residuals 36 31.077 0.863
Now let's reproduce it with joint_tests (for balanced case LS-means = raw means, so it should same results)
> (jt <- joint_tests(m))
model term df1 df2 F.ratio p.value
Sex 1 36 8.455 0.0062
Treatment 1 36 0.137 0.7132
Sex:Treatment 1 36 7.132 0.0113
Perfect agreement.
Now let's check the linear functions values:
> attributes(jt)$est.fcns
$Sex
(Intercept) SexMale TreatmentB SexMale:TreatmentB
[1,] 0 -0.8944272 0 -0.4472136
$Treatment
(Intercept) SexMale TreatmentB SexMale:TreatmentB
[1,] 0 0 -0.8944272 -0.4472136
$`Sex:Treatment`
(Intercept) SexMale TreatmentB SexMale:TreatmentB
[1,] 0 0 0 -1
The coefficients for Sex and Treatment look close to -1 and -0.5**.
Let's reproduce the main effects step by step naively, just by creating appropriate contrasts over the LS-means.
> (em <- emmeans(lm(Response ~ Sex * Treatment , data=balanced), ~ Sex*Treatment))
Sex Treatment emmean SE df lower.CL upper.CL
Female A 2.310 0.294 36 1.714 2.91
Male A 0.671 0.294 36 0.075 1.27
Female B 1.634 0.294 36 1.038 2.23
Male B 1.564 0.294 36 0.969 2.16
and the orthogonal contrasts:
# FemA MalA FemB MalB
contrast(em, list("Sex" = c(-0.5, 0.5, -0.5, 0.5),
"Treatment" = c(-0.5, -0.5, 0.5, 0.5),
"Sex:Treatment" = c(-0.5, 0.5, 0.5, -0.5))) %>%
data.frame %>%
mutate(F = t.ratio^2, .after="t.ratio") %>%
mutate(across(where(is.numeric), ~round(., 4)))
contrast estimate SE df t.ratio F p.value
1 Sex -0.8543 0.2938 36 -2.9078 8.4552 0.0062
2 Treatment 0.1088 0.2938 36 0.3704 0.1372 0.7132
3 Sex:Treatment -0.7847 0.2938 36 -2.6706 7.1324 0.0113
OK, both F statistic and p-values are reproduced
Now let's switch to raw model coefficients.
I'm going to reproduce the results with multcomp::glht()
# First I'm defining the means based on the coefficients:
K <- matrix(c(1, 0, 0, 0, # Female A
1, 0, 1, 0, # Female B
1, 1, 0, 0, # Male A
1, 1, 1, 1), # Male B
byrow = TRUE, ncol = 4,
dimnames = list(c("Female A",
"Female B",
"Male A",
"Male B"),
c("Intercept", "SexMale",
"TreatmentB", "SexMale:TreatmentB")))
> K
Intercept SexMale TreatmentB SexMale:TreatmentB
Female A 1 0 0 0
Female B 1 0 1 0
Male A 1 1 0 0
Male B 1 1 1 1
Let's reproduce for Sex: = c(-0.5, 0.5, -0.5, 0.5)
library(multcomp)
> (contr_sex <- matrix(c(-0.5*K[1,] + 0.5*K[3,] - 0.5*K[2,] + 0.5*K[4,]), byrow = T, nrow = 1))
[,1] [,2] [,3] [,4]
[1,] 0 1 0 0.5
> summary(glht(m, matrix(c( 0, 1, 0, 0.5), byrow = T, nrow = 1)))
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = Response ~ Sex * Treatment, data = balanced)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
1 == 0 -0.8543 0.2938 -2.908 0.0062 **
---
By the way, this could be obtained from the emmeans directly - it warns about the interaction involvement:
> (x <- contrast(emmeans(m, ~ Sex), "revpairwise"))
NOTE: Results may be misleading due to involvement in interactions
contrast estimate SE df t.ratio p.value
Male - Female -0.854 0.294 36 -2.908 0.0062
Results are averaged over the levels of: Treatment
> x@linfct
(Intercept) SexMale TreatmentB SexMale:TreatmentB
[1,] 0 1 0 0.5 # rather than 0.8944272 and 0.4472136
Anyway, for Sex it agrees with LS-means and joint_tests() -2.907783^2 = 8.455201 Same for Treatment and their interaction.
OK, now let's compare the coefficients obtained from joint_tests() with the contrasts using 0.5s coefficients: I will only focus on Sex to limit the length:
> summary(glht(m, matrix(c(0, 0.8944272, 0, 0.4472136), byrow = T, nrow = 1)))
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = Response ~ Sex * Treatment, data = balanced)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
1 == 0 -0.7641 0.2628 -2.908 0.0062 **
Well, the estimates differ, -0.8543 vs. -0.7641, the standard errors differ too: 0.2938 vs. 0.2628. But t.values agree: -2.907783 in both.
So this agreed closely with was was created by aov() because the coefficients were close enough to 0.5 and 1.