afex icon indicating copy to clipboard operation
afex copied to clipboard

Multivariate EMM results from aov_ez

Open rvlenth opened this issue 7 years ago • 15 comments

The help page for aov_ez states:

A caveat regarding the use of emmeans concerns the assumption of sphericity for ANOVAs including within-subjects/repeated-measures factors (with more than two levels). While the ANOVA tables per default report results using the Greenhousse-Geisser correction, no such correction is available when using emmeans. This may result in anti-conservative tests.

I think this is somewhat misleading, in that it is true only because the implementation of emm_basis.aov_ez relies on the aov results. It appears that aov_ez also fits a multivariate model, which is returned in the lm member of the object. If that were actually used fully, users could obtain EMMs based on the multivariate model. Those results would not have the deficiencies mentioned above.

To illustrate, here is an example from the help page for aov_ez:

data(md_12.1)
aez <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"), 
       anova_table=list(correction = "none", es = "none"))

Here is the reference grid for the lm member of the object:

ref_grid(aez$lm)
## 'emmGrid' object with variables:
##     1 = 1
##     rep.meas = multivariate response levels: X0_absent, X0_present, X4_absent, X4_present, X8_absent, X8_present

So lets use the mult.names option to sort these out:

aez.mult <- ref_grid(aez$lm, 
    mult.levs = list(noise = c("absent", "present"), 
                     angle = c("X0", "X4", "X8")))

Now compare the results:

emmeans(aez.mult, ~ noise * angle)
## noise   angle emmean       SE df lower.CL upper.CL
## absent  X0       462 18.00000  9 421.2812 502.7188
## present X0       492 28.00000  9 428.6596 555.3404
## absent  X4       510 27.20294  9 448.4627 571.5373
## present X4       660 34.64102  9 581.6366 738.3634
## absent  X8       528 24.97999  9 471.4913 584.5087
## present X8       762 36.93237  9 678.4532 845.5468
##
## Confidence level used: 0.95 

emmeans(aez, ~ noise * angle)
## noise   angle emmean       SE    df lower.CL upper.CL
## absent  X0       462 28.97125 19.79 401.5263 522.4737
## present X0       492 28.97125 19.79 431.5263 552.4737
## absent  X4       510 28.97125 19.79 449.5263 570.4737
## present X4       660 28.97125 19.79 599.5263 720.4737
## absent  X8       528 28.97125 19.79 467.5263 588.4737
## present X8       762 28.97125 19.79 701.5263 822.4737
##
## Confidence level used: 0.95 

The predictions are the same, but the SEs and df for aez.mult are obtained from the multivariate model.

afex could provide an option so that the user may choose which analysis they want. This can be done by adding, say, a mode argument to emm_basis.afex_aov, that can have values like "multivariate" (would use the lm member like in this example) and "univariate" (the current default). See current emmeans support for such as lme objects (simple) or clm (complex) to see how this may be done. I think all you need to do in this case is to call emm_basis for the lm member and set misc$ylevs to the needed levels.

rvlenth avatar Feb 19 '18 17:02 rvlenth

I have now implemented emmeans support for multivariate models in addition to the existing univariate models, basically as described here. To get the multivariate tests one needs to set model = "multivariate" in the call to emmeans() et al. The following example also shows that one can set this globally.

data(md_12.1)
aez <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))

emmeans(aez, c("angle", "noise"), model = "univariate")
#  angle noise   emmean       SE    df lower.CL upper.CL
#  X0    absent     462 28.97125 19.79 401.5263 522.4737
#  X4    absent     510 28.97125 19.79 449.5263 570.4737
#  X8    absent     528 28.97125 19.79 467.5263 588.4737
#  X0    present    492 28.97125 19.79 431.5263 552.4737
#  X4    present    660 28.97125 19.79 599.5263 720.4737
#  X8    present    762 28.97125 19.79 701.5263 822.4737
# 
# Confidence level used: 0.95 

emmeans(aez, c("angle", "noise"), model = "multivariate")
#  angle noise   emmean       SE df lower.CL upper.CL
#  X0    absent     462 18.00000  9 421.2812 502.7188
#  X4    absent     510 27.20294  9 448.4627 571.5373
#  X8    absent     528 24.97999  9 471.4913 584.5087
#  X0    present    492 28.00000  9 428.6596 555.3404
#  X4    present    660 34.64102  9 581.6366 738.3634
#  X8    present    762 36.93237  9 678.4532 845.5468
# 
# Confidence level used: 0.95 

afex_options(emmeans_model = "multivariate") # set globally
emmeans(aez, c("angle", "noise"))
#  angle noise   emmean       SE df lower.CL upper.CL
#  X0    absent     462 18.00000  9 421.2812 502.7188
#  X4    absent     510 27.20294  9 448.4627 571.5373
#  X8    absent     528 24.97999  9 471.4913 584.5087
#  X0    present    492 28.00000  9 428.6596 555.3404
#  X4    present    660 34.64102  9 581.6366 738.3634
#  X8    present    762 36.93237  9 678.4532 845.5468
# 
# Confidence level used: 0.95 

I have also changed the help page which now reads:

A caveat regarding the use of emmeans concerns the assumption of sphericity for ANOVAs including within-subjects/repeated-measures factors (with more than two levels). The current default for follow-up tests uses a univariate model (model = "univariate" in the call to emmeans), which does not adequately control for violations of sphericity. This may result in anti-conservative tests and contrasts somewhat with the default ANOVA table which reports results based on the Greenhousse-Geisser correction. An alternative is to use a multivariate model (model = "multivariate" in the call to emmeans) which should handle violations of sphericity better. The default will likely change to multivariate tests in one of the next versions of the package.

Does this solve this issue for you?

singmann avatar Feb 25 '18 15:02 singmann

Important note

In DESCRIPTION, be sure to put in the dependency emmeans (>= 1.1.2).

Testing the example

I can't get the example to run on my system:

> data(md_12.1)
> aez <- aov_ez("id", "rt", md_12.1, within = c("angle", "noise"))
Error in `$<-.data.frame`(`*tmp*`, ges, value = numeric(0)) : 
  replacement has 0 rows, data has 4

This could be a problem on my system. But I updated all packages and the problem persists. FWIW, here is my session information:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] afex_0.20-1   emmeans_1.1.2 lme4_1.1-15   Matrix_1.2-12

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15        mvtnorm_1.0-7       lattice_0.20-35     zoo_1.8-1          
 [5] digest_0.6.15       cellranger_1.1.0    plyr_1.8.4          backports_1.1.2    
 [9] acepack_1.4.1       stats4_3.4.3        coda_0.19-1         ggplot2_2.2.1      
[13] pillar_1.1.0        rlang_0.2.0         lazyeval_0.2.1      curl_3.1           
[17] multcomp_1.4-8      readxl_1.0.0        rstudioapi_0.7      minqa_1.2.4        
[21] data.table_1.10.4-3 car_3.0-0           nloptr_1.0.4        rpart_4.1-11       
[25] checkmate_1.8.5     splines_3.4.3       stringr_1.3.0       foreign_0.8-69     
[29] htmlwidgets_1.0     munsell_0.4.3       compiler_3.4.3      base64enc_0.1-3    
[33] lmerTest_2.0-36     htmltools_0.3.6     nnet_7.3-12         tibble_1.4.2       
[37] gridExtra_2.3       htmlTable_1.11.2    coin_1.2-2          Hmisc_4.1-1        
[41] rio_0.5.9           codetools_0.2-15    MASS_7.3-48         grid_3.4.3         
[45] nlme_3.1-131.1      xtable_1.8-2        gtable_0.2.0        magrittr_1.5       
[49] scales_0.5.0        stringi_1.1.6       estimability_1.3    carData_3.0-1      
[53] reshape2_1.4.3      latticeExtra_0.6-28 sandwich_2.4-0      openxlsx_4.0.17    
[57] Formula_1.2-2       TH.data_1.0-8       RColorBrewer_1.1-2  tools_3.4.3        
[61] forcats_0.3.0       parallel_3.4.3      abind_1.4-5         survival_2.41-3    
[65] yaml_2.1.16         colorspace_1.3-2    cluster_2.0.6       knitr_1.20         
[69] haven_1.1.1         modeltools_0.2-21  

(wouldn't it be nice if those namespaces were alphabetized?)

rvlenth avatar Feb 25 '18 19:02 rvlenth

Are you sure you are using the CRAN version of emmeans? This exact error only happened to me with the old version of emmeans. Once I downloaded from CRAN it disappeared.

And regarding the package version. I have simply added the corresponding check to the function (instead to the package level). If one has an older version one can still use emmeans. However, if the user wants to use multivariate tests, it fails with an appropriate error message (so given that your package version is high enough, not for you).

singmann avatar Feb 25 '18 22:02 singmann

Henrik,

Please note that the error I experienced occurred while trying to fit the model; I never got as far as calling any emmeans functions. I traced the call into aov_car, and it appears the error occurs around line 250 in the function body, in the code block:

   if (return == "afex_aov") {
   # lines omitted ...
        afex_aov$anova_table <- do.call("anova", args = c(object = list(afex_aov),  ### <- Right here ###
            observed = list(observed), anova_table))
        return(afex_aov)
    }

I traced it further into anova.afex_aov, and the error occurs on the following line (line 87 of the function body)

        es_df$ges <- tmp2$SS/(tmp2$SS + sum(unique(tmp2[, "Error SS"])) + 
            obs_SSn1 - obs_SSn2)

which tries to access non-existing columns:

Browse[3]> names(tmp2)
[1] "Sum Sq"   "num Df"   "Error SS" "den Df"   "F value"  "Pr(>F)"   "MSE" 

I tried tricking it into working by creating tmp2$SS, but then another error occurs later on.

At any rate, I'm fairly certain that the error isn't occurring in emmeans. I wonder if there are changes you haven't yet pushed to github?

rvlenth avatar Feb 26 '18 01:02 rvlenth

Hi Russ, Sorry that was my fault for not reading your message correctly. It was already late on Sunday evening for me.

The reason for this bug is that you use the development version of car (3.0). I have also downloaded it on the weekend and tried it and got the same bug with afex. They must have changed the return value of the Anova() function or something like that. I have to investigate this further, but I expect to only find the time at the end of the week.

So I guess when you downgrade it should work.

singmann avatar Feb 26 '18 11:02 singmann

Aha! Indeed, downgrading to the current CRAN version of car (rather than the development version) makes all the difference. I now get it to work as shown.

I still suggest including emmeans (>= 1.1.2) in DESCRIPTION. You are right that the checks in recover_data.aov_ez trap the situation where emmeans needs to be upgraded; but the version dependency in DESCRIPTION would cause emmeans to be automatically upgraded also when a user does install.packages("afex").

Also, it might be helpful to include a multivariate illustration in examples(aov_ez).

rvlenth avatar Feb 27 '18 01:02 rvlenth

I will definitely add more examples about the new functionality.

And I agree that automatic updating of emmeans seems like the better idea. There is not much to loose by adopting this.

I should maybe also consider demoting the dependency of emmeans from Depends to Suggests. In this case, users can use afex without needing emmeans (which again needs other packages such as multcomp which is not absolutely necessary).

singmann avatar Feb 27 '18 09:02 singmann

I have finally added the emmeans version number to the package. This afex version will be submitted to CRAN today.

Now, I only need to figure out how to provide methods for emmeans while only having it in Suggests and not in Depends. Any ideas?

singmann avatar Jun 24 '18 14:06 singmann

Henrik,

Yes. Just export the functions ref_grid.whatever and emm_basis.whatever. I’ll try to write more later but am with a lot of people.

Russ

Sent from my iPhone

On Jun 24, 2018, at 9:59 AM, Henrik Singmann <[email protected]mailto:[email protected]> wrote:

I have finally added the emmeans version number to the package. This afex version will be submitted to CRAN today.

Now, I only need to figure out how to provide methods for emmeans while only having it in Suggests and not in Depends. Any ideas?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/singmann/afex/issues/45#issuecomment-399762944, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AZiXrzus4QWM7yJYac_eMUutXpI4eYwFks5t_6k3gaJpZM4SK4on.

rvlenth avatar Jun 24 '18 15:06 rvlenth

Henrik,

I’m home now from a trip so can answer more completely...

First, remove emmeans from Imports and add to Suggests.

From NAMESPACE, remove any imports(emmeans) or importsFrom(emmeans) calls. Also remove all instances of S3method(recover_data) and S3method(emm_basis).

Instead of providing methods, simply export recover_data.myobject and emm_basis.myobject. (Note: so R checks will require you to document these; BUT it is enough to include them in \alias{} somewhere and you don’t actually have to show them in \usage{}.)

If your methods call emmeans methods, just reference emmeans – i.e., emmeans::emm_basis(...). Contrary to what you may have been led to believe, this does NOT require you to import those things from emmeans.

Also, while sometimes it is desirable to wrap things like this in if(requireNamespace(“emmeans”)) {...}, it is NOT necessary in this instance. The reason is that your emm_basis.myobject() method will never be called except when emmeans::ref_grid wants to call that method; and thus, this if() will always be TRUE.

I have gained some recent experience with all of this stuff. emmeans 1.2 had over 100 dependencies because, like a fool, I added brms to Imports. In emmeans 1.2.1, I moved brms to Suggests, and it has 48 dependencies:

tools::package_dependencies("emmeans", recursive = TRUE) $emmeans [1] "estimability" "ggplot2" "graphics" "methods" "stats" "utils" [7] "nlme" "coda" "multcomp" "plyr" "mvtnorm" "xtable" [13] "lattice" "digest" "grid" "gtable" "MASS" "reshape2" [19] "scales" "tibble" "lazyeval" "survival" "TH.data" "sandwich" [25] "codetools" "Rcpp" "grDevices" "stringr" "zoo" "RColorBrewer" [31] "dichromat" "munsell" "labeling" "R6" "viridisLite" "Matrix" [37] "splines" "cli" "crayon" "pillar" "rlang" "assertthat" [43] "colorspace" "utf8" "glue" "magrittr" "stringi" "tools"

In the next version, which I hope to submit within a day or two, I moved ggplot2, multcomp, and nlme to Suggests, and I’ve cut it down to 9:

imports = c("estimability", "graphics", "methods", "stats", "utils", "plyr", "mvtnorm", "xtable")

unique(c(imports, unlist(tools::package_dependencies(imports, recursive = TRUE))))

[1] "estimability" "graphics" "methods" "stats" "utils" "plyr"

[7] "mvtnorm" "xtable" "Rcpp"

I remember your expressing dissatisfaction once that emmeans required multcomp -- not any more!

Cheers,

Russ

From: Henrik Singmann [email protected] Sent: Sunday, June 24, 2018 9:59 AM To: singmann/afex [email protected] Cc: Lenth, Russell V [email protected]; Author [email protected] Subject: Re: [singmann/afex] Multivariate EMM results from aov_ez (#45)

I have finally added the emmeans version number to the package. This afex version will be submitted to CRAN today.

Now, I only need to figure out how to provide methods for emmeans while only having it in Suggests and not in Depends. Any ideas?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/singmann/afex/issues/45#issuecomment-399762944, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AZiXrzus4QWM7yJYac_eMUutXpI4eYwFks5t_6k3gaJpZM4SK4on.

rvlenth avatar Jun 25 '18 02:06 rvlenth

Hi Russ,

Thanks for the detailed explanation. I have tried to move it to Suggests and had the problem that R CMD check complained that the generic was not there. Good to know that I just need to export emm_basis.myobject for it to work without explicitly mentioning that it is an S3 method. I guess the same holds for recover_data.myobject.

I have just yesterday pushed a new version of afex to CRAN in which I have removed the strong dependency on coin (now only in suggests) and stringr (exchanged all corresponding calls with calls to base R functions). I plan to move emmeans from Depends to Suggests in the next update. So I will make this change in the next days and then try it out for some time before pushing it to CRAN. Great to hear that you also reduce the package dependency footprint of emmeans. Small dependency footprints are the way forward. Users can attach what they need to their workspace on their own.

Cheers, Henrik

singmann avatar Jun 25 '18 09:06 singmann

I have just pushed a version to github where emmeans is only in suggests and need to be attached explicitly either via library("emmeans") or via emmeans::emmeans(). I will test it a bit more, but will try to submit this version to CRAN this month.

singmann avatar Aug 09 '18 10:08 singmann

Henrik,

Sounds promising! I took a quick glance at your zzz.R file, and it appears that you are correctly registering the methods needed by emmeans.

Guessing that this is a hectic time for you as you prepare to move to a new position. My hopes that everything goes smoothly and you'll be quickly settled in the new place.

rvlenth avatar Aug 10 '18 20:08 rvlenth

Hi @rvlenth and @singmann,

Apologies for digging up such an ancient thread, but I've been looking into this option myself for how we handle follow-up tests in the JASP RM ANOVA. After a chat with Henrik some years ago, I added this option to JASP, but I now want to make the multivariate model the default. I cannot find much information about the underlying models/functions though - are there any resources that explain the difference between univariate/multivariate in more detail?

Thanks for your help, and for your respective packages - they make my life a lot easier =) Johnny

JohnnyDoorn avatar Feb 12 '24 12:02 JohnnyDoorn

Hi Johnny,

To be quite honest I am not sure about the proper mathematical differences between both methods. However, I had a student do some simulations on the Type I error properties recently. The first results indeed support switching to multivariate as it seems to maintain Type I errors better. As soon as I have the report I am happy to post more details.

Cheers, Henrik

singmann avatar Feb 15 '24 11:02 singmann