report() assigning effect size to intercept in model

Open RenyBB opened this issue 1 year ago • 10 comments

For example, we want to do a type III ANOVA, so we fit a linear model with categorical predictors and use the car::Anova function:

some_linear_model <- lm(mpg ~ as.factor(cyl)*as.factor(am), data=mtcars) 
some_anova <- car::Anova(some_linear_model, type = "III")

Then, we use report() and report_table() to output the results:

report::report(some_anova)
report::report_table(some_anova)

The effect sizes from report_table() are correct, but the effect sizes from report() are attached to the wrong effects:

  • The main effect of (Intercept) is statistically significant and large (F(1, 26) = 171.10, p < .001; Eta2 (partial) = 0.41, 95% CI [0.15, 1.00])
  • The main effect of as.factor(cyl) is statistically significant and large (F(2, 26) = 9.12, p < .001; Eta2 (partial) = 0.20, 95% CI [0.02, 1.00])
  • The main effect of as.factor(am) is statistically significant and medium (F(1, 26) = 6.35, p = 0.018; Eta2 (partial) = 0.10, 95% CI [0.00, 1.00])
  • The interaction between as.factor(cyl) and as.factor(am) is statistically not significant and large (F(2, 26) = 1.38, p = 0.269; Eta2 (partial) = 0.41, 95% CI [0.15, 1.00])

Compare these to the results obtained with report_table(): [image]

RenyBB avatar Jul 05 '24 11:07 RenyBB

It seems silly, but for type 3 ANOVA tables we do get the intercept term, and it does have a meaning: It is the proportional reduction in error accounted for by the inclusion of the intercept. So in a sense, this is the "variance explained" by the intercept:


library(performance)

m0 <- lm(mpg ~ 0, data = mtcars)
m1 <- lm(mpg ~ 1, data = mtcars)

car::Anova(m1, type = 3)
#> Anova Table (Type III tests)
#> 
#> Response: mpg
#>             Sum Sq Df F value    Pr(>F)    
#> (Intercept)  12916  1  355.58 < 2.2e-16 ***
#> Residuals     1126 31                      
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

effectsize::F_to_eta2(f = 355.58, df = 1, df_error = 31, ci = NULL) |> 
  print(digits = 6)
#> Eta2 (partial)
#> --------------
#> 0.919810 

1 - (rmse(m1) ^ 2) / (rmse(m0) ^ 2)
#> [1] 0.9198104
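
The same number can be recovered in base R, without `performance` or `effectsize`. This is only a sketch of the arithmetic: it assumes the intercept's sum of squares is the drop in error between the empty model `mpg ~ 0` and `mpg ~ 1`, and the variable names are illustrative.

```r
# Intercept-only model, as in the example above
m1 <- lm(mpg ~ 1, data = mtcars)

ss_resid <- sum(residuals(m1)^2)               # residual sum of squares of mpg ~ 1
ss_intercept <- sum(mtcars$mpg^2) - ss_resid   # reduction in error relative to mpg ~ 0

# Partial eta squared from sums of squares
eta2_ss <- ss_intercept / (ss_intercept + ss_resid)

# Same quantity from the F statistic: F * df / (F * df + df_error)
df_error <- df.residual(m1)
f_value <- (ss_intercept / 1) / (ss_resid / df_error)
eta2_f <- (f_value * 1) / (f_value * 1 + df_error)

c(from_ss = eta2_ss, from_f = eta2_f)
```

Both routes give the same value because the F statistic is just the ratio of the two mean squares.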

mattansb avatar Jul 05 '24 16:07 mattansb

Sorry, I can see how the title I chose is completely uninformative. I've edited the post to describe the issue in more detail - the values for the effect sizes don't match the correct effects when using report(). For example, the effect size for the interaction term should be 0.1 but report() returns 0.41.

RenyBB avatar Jul 08 '24 11:07 RenyBB

Ah, I see.

@IndrajeetPatil @rempsyc wasn't this recycling issue resolved in #198 ?

mattansb avatar Jul 08 '24 16:07 mattansb

The effect sizes are misaligned (probably because it is NA for the intercept instead of an empty string). Reprex:

packageVersion("report")
#> [1] '0.5.8.5'

some_linear_model <- lm(mpg ~ as.factor(cyl)*as.factor(am), data=mtcars) 
some_anova <- car::Anova(some_linear_model, type = "III")

report::report(some_anova)
#> Type 3 ANOVAs only give sensible and informative results when covariates
#>   are mean-centered and factors are coded with orthogonal contrasts (such
#>   as those produced by `contr.sum`, `contr.poly`, or `contr.helmert`, but
#>   *not* by the default `contr.treatment`).
#> The ANOVA suggests that:
#> 
#>   - The main effect of (Intercept) is statistically significant and large (F(1,
#> 26) = 171.10, p < .001; Eta2 (partial) = 0.41, 95% CI [0.15, 1.00])
#>   - The main effect of as.factor(cyl) is statistically significant and large
#> (F(2, 26) = 9.12, p < .001; Eta2 (partial) = 0.20, 95% CI [0.02, 1.00])
#>   - The main effect of as.factor(am) is statistically significant and medium
#> (F(1, 26) = 6.35, p = 0.018; Eta2 (partial) = 0.10, 95% CI [0.00, 1.00])
#>   - The interaction between as.factor(cyl) and as.factor(am) is statistically not
#> significant and large (F(2, 26) = 1.38, p = 0.269; Eta2 (partial) = 0.41, 95%
#> CI [0.15, 1.00])
#> 
#> Effect sizes were labelled following Field's (2013) recommendations.
report::report_table(some_anova)
#> Type 3 ANOVAs only give sensible and informative results when covariates
#>   are mean-centered and factors are coded with orthogonal contrasts (such
#>   as those produced by `contr.sum`, `contr.poly`, or `contr.helmert`, but
#>   *not* by the default `contr.treatment`).
#> Parameter                    | Sum_Squares | df | Mean_Square |      F |      p | Eta2 (partial) | Eta2_partial 95% CI
#> ----------------------------------------------------------------------------------------------------------------------
#> (Intercept)                  |     1573.23 |  1 |     1573.23 | 171.10 | < .001 |                |                    
#> as.factor(cyl)               |      167.71 |  2 |       83.85 |   9.12 | < .001 |           0.41 |        [0.15, 1.00]
#> as.factor(am)                |       58.43 |  1 |       58.43 |   6.35 | 0.018  |           0.20 |        [0.02, 1.00]
#> as.factor(cyl):as.factor(am) |       25.44 |  2 |       12.72 |   1.38 | 0.269  |           0.10 |        [0.00, 1.00]
#> Residuals                    |      239.06 | 26 |        9.19 |        |        |                |

Created on 2024-07-10 with reprex v2.1.1
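
The pattern in the text output is what silent vector recycling would produce if four term labels were pasted against three effect sizes. The snippet below only illustrates that mechanism in base R; it is not the code used inside report, and the object names are made up for the example.

```r
# Four rows of text (intercept included), but only three effect sizes
terms <- c("(Intercept)", "as.factor(cyl)", "as.factor(am)",
           "as.factor(cyl):as.factor(am)")
eta2 <- c(0.41, 0.20, 0.10)   # cyl, am, cyl:am (no value for the intercept)

# paste0() recycles the shorter vector without a warning,
# so every value shifts up by one row and the first value is reused at the end
eta2_text <- sprintf("%.2f", eta2)
misaligned <- paste0(terms, ": Eta2 (partial) = ", eta2_text)
print(misaligned)

# Matching by name instead of by position keeps values with their terms;
# the intercept, which has no effect size, becomes NA
eta2_named <- c("as.factor(cyl)" = 0.41,
                "as.factor(am)" = 0.20,
                "as.factor(cyl):as.factor(am)" = 0.10)
aligned <- eta2_named[terms]
print(unname(aligned))
```

The misaligned vector reproduces the 0.41, 0.20, 0.10, 0.41 sequence seen in the report() text above.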

So yes, as in #198, it seems this was never properly fixed, since the old example shows the same problem:

packageVersion("report")
#> [1] '0.5.8.5'

m <- lm(mpg ~ factor(am) * factor(cyl), mtcars)
a <- car::Anova(m, type = 3)

report::report(a)
#> Type 3 ANOVAs only give sensible and informative results when covariates
#>   are mean-centered and factors are coded with orthogonal contrasts (such
#>   as those produced by `contr.sum`, `contr.poly`, or `contr.helmert`, but
#>   *not* by the default `contr.treatment`).
#> The ANOVA suggests that:
#> 
#>   - The main effect of (Intercept) is statistically significant and large (F(1,
#> 26) = 171.10, p < .001; Eta2 (partial) = 0.20, 95% CI [0.02, 1.00])
#>   - The main effect of factor(am) is statistically significant and large (F(1,
#> 26) = 6.35, p = 0.018; Eta2 (partial) = 0.41, 95% CI [0.15, 1.00])
#>   - The main effect of factor(cyl) is statistically significant and medium (F(2,
#> 26) = 9.12, p < .001; Eta2 (partial) = 0.10, 95% CI [0.00, 1.00])
#>   - The interaction between factor(am) and factor(cyl) is statistically not
#> significant and large (F(2, 26) = 1.38, p = 0.269; Eta2 (partial) = 0.20, 95%
#> CI [0.02, 1.00])
#> 
#> Effect sizes were labelled following Field's (2013) recommendations.
report::report_table(a)
#> Type 3 ANOVAs only give sensible and informative results when covariates
#>   are mean-centered and factors are coded with orthogonal contrasts (such
#>   as those produced by `contr.sum`, `contr.poly`, or `contr.helmert`, but
#>   *not* by the default `contr.treatment`).
#> Parameter              | Sum_Squares | df | Mean_Square |      F |      p | Eta2 (partial) | Eta2_partial 95% CI
#> ----------------------------------------------------------------------------------------------------------------
#> (Intercept)            |     1573.23 |  1 |     1573.23 | 171.10 | < .001 |                |                    
#> factor(am)             |       58.43 |  1 |       58.43 |   6.35 | 0.018  |           0.20 |        [0.02, 1.00]
#> factor(cyl)            |      167.71 |  2 |       83.85 |   9.12 | < .001 |           0.41 |        [0.15, 1.00]
#> factor(am):factor(cyl) |       25.44 |  2 |       12.72 |   1.38 | 0.269  |           0.10 |        [0.00, 1.00]
#> Residuals              |      239.06 | 26 |        9.19 |        |        |                |

Created on 2024-07-10 with reprex v2.1.1
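
Separately from the misalignment, both outputs start with a message asking for orthogonal contrasts. A minimal sketch of one way to set them with base R follows; the data frame name `d` is an assumption for the example, and the `car::Anova()` call only runs if car is installed.

```r
# Convert the predictors to factors once, then assign sum-to-zero contrasts
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))
contrasts(d$cyl) <- contr.sum(nlevels(d$cyl))
contrasts(d$am) <- contr.sum(nlevels(d$am))

m_sum <- lm(mpg ~ cyl * am, data = d)

# Alternative: set the default for all unordered/ordered factors in the session
# options(contrasts = c("contr.sum", "contr.poly"))

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(m_sum, type = 3))
}
```

With the default `contr.treatment` coding used in the reprexes, the Type 3 main-effect rows test effects at the reference level of the other factor, which is why the message appears.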

rempsyc avatar Jul 10 '24 14:07 rempsyc

I am seeing this issue with version 0.6.1.

Effect sizes for the two variables in type III ANOVA generated with car::Anova are swapped if you go through report or effectsize::eta_squared.

I consider this quite a severe bug to have gone on for so long. The goal of this package is to make it easy to generate reports. If the reports are wrong but in a way that looks right, that is worse than useless!

Let me know if you need another reproducible example, but the one above matches my experience.
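
One way to check which effect size belongs to which term is to recompute partial eta squared directly from the sums of squares. The sketch below uses the values printed by report_table() earlier in this thread; `partial_eta2()` is a hypothetical helper written for this example, not a function from report or effectsize.

```r
# Sums of squares copied from the report_table() output earlier in the thread
anova_tab <- data.frame(
  term = c("(Intercept)", "as.factor(cyl)", "as.factor(am)",
           "as.factor(cyl):as.factor(am)", "Residuals"),
  sum_sq = c(1573.23, 167.71, 58.43, 25.44, 239.06)
)

# Hypothetical helper: partial eta squared = SS_effect / (SS_effect + SS_residual)
partial_eta2 <- function(tab) {
  ss_resid <- tab$sum_sq[tab$term == "Residuals"]
  effects <- tab[tab$term != "Residuals", ]
  setNames(effects$sum_sq / (effects$sum_sq + ss_resid), effects$term)
}

eta2p <- partial_eta2(anova_tab)
round(eta2p, 2)
```

The results agree with the report_table() rows (0.41 for cyl, 0.20 for am, 0.10 for the interaction), and the intercept comes out at about 0.87 rather than any of the values shown in the report() text.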

tkerwin avatar May 02 '25 22:05 tkerwin

@mattansb it seems like the origin is in effectsize?

DominiqueMakowski avatar May 03 '25 08:05 DominiqueMakowski

I think he's saying the values are swapped in report compared with effectsize, which does match car. @tkerwin if there is indeed an issue in effectsize, please post a reprex.

mattansb avatar May 03 '25 10:05 mattansb

I see that effectsize and report give swapped results, and only for car::Anova with type 3, not type 2.

I don't think I'm seeing anything different than above, I'm just confirming and expressing dismay that this bug is still around a year later, since it gives directly misleading results.

tkerwin avatar May 05 '25 00:05 tkerwin

I see we're deliberately removing the intercept - I will investigate why we did this. IIRC it was to solve this exact issue.

mattansb avatar May 05 '25 10:05 mattansb

Should be fixed in #489

some_linear_model <- lm(mpg ~ as.factor(cyl)*as.factor(am), data=mtcars) 
some_anova <- car::Anova(some_linear_model, type = "II")

report::report(some_anova)
#> The ANOVA suggests that:
#> 
#>   - The main effect of as.factor(cyl) is statistically significant and large
#> (F(2, 26) = 24.82, p < .001; Eta2 (partial) = 0.66, 95% CI [0.45, 1.00])
#>   - The main effect of as.factor(am) is statistically not significant and medium
#> (F(1, 26) = 4.00, p = 0.056; Eta2 (partial) = 0.13, 95% CI [0.00, 1.00])
#>   - The interaction between as.factor(cyl) and as.factor(am) is statistically not
#> significant and medium (F(2, 26) = 1.38, p = 0.269; Eta2 (partial) = 0.10, 95%
#> CI [0.00, 1.00])
#> 
#> Effect sizes were labelled following Field's (2013) recommendations.

report::report(some_anova, include_intercept = FALSE)
#> The ANOVA suggests that:
#> 
#>   - The main effect of as.factor(cyl) is statistically significant and large
#> (F(2, 26) = 24.82, p < .001; Eta2 (partial) = 0.66, 95% CI [0.45, 1.00])
#>   - The main effect of as.factor(am) is statistically not significant and medium
#> (F(1, 26) = 4.00, p = 0.056; Eta2 (partial) = 0.13, 95% CI [0.00, 1.00])
#>   - The interaction between as.factor(cyl) and as.factor(am) is statistically not
#> significant and medium (F(2, 26) = 1.38, p = 0.269; Eta2 (partial) = 0.10, 95%
#> CI [0.00, 1.00])
#> 
#> Effect sizes were labelled following Field's (2013) recommendations.

Created on 2025-05-05 with reprex v2.1.1

mattansb avatar May 05 '25 16:05 mattansb