equatiomatic
Replacing `{broom}` and `{broom.mixed}` tidiers with the `{parameters}` package to reduce the number of dependencies
Before making a PR related to this, I was wondering if you would be open to this. If you agree, I will open a PR.
rationale
parameters (https://easystats.github.io/parameters/) has way fewer dependencies and can handle pretty much every model that broom and broom.mixed combined support. It also offers a number of features not in broom (e.g., robust SEs and standardization).
dependency calculations
tools::package_dependencies(c("broom", "broom.mixed", "parameters"), recursive = TRUE)
#> $broom
#> [1] "backports" "dplyr" "ellipsis" "generics" "glue"
#> [6] "methods" "purrr" "rlang" "stringr" "tibble"
#> [11] "tidyr" "ggplot2" "lifecycle" "magrittr" "R6"
#> [16] "tidyselect" "utils" "vctrs" "pillar" "digest"
#> [21] "grDevices" "grid" "gtable" "isoband" "MASS"
#> [26] "mgcv" "scales" "stats" "withr" "stringi"
#> [31] "fansi" "pkgconfig" "cpp11" "graphics" "nlme"
#> [36] "Matrix" "splines" "cli" "crayon" "utf8"
#> [41] "farver" "labeling" "munsell" "RColorBrewer" "viridisLite"
#> [46] "tools" "lattice" "colorspace"
#>
#> $broom.mixed
#> [1] "broom" "coda" "dplyr" "methods" "nlme"
#> [6] "purrr" "stringr" "tibble" "tidyr" "backports"
#> [11] "ellipsis" "generics" "glue" "rlang" "ggplot2"
#> [16] "lattice" "lifecycle" "magrittr" "R6" "tidyselect"
#> [21] "utils" "vctrs" "pillar" "graphics" "stats"
#> [26] "stringi" "fansi" "pkgconfig" "cpp11" "grDevices"
#> [31] "digest" "grid" "gtable" "isoband" "MASS"
#> [36] "mgcv" "scales" "withr" "cli" "crayon"
#> [41] "utf8" "tools" "Matrix" "splines" "farver"
#> [46] "labeling" "munsell" "RColorBrewer" "viridisLite" "colorspace"
#>
#> $parameters
#> [1] "bayestestR" "datawizard" "insight" "graphics" "methods"
#> [6] "stats" "utils"
Created on 2021-11-03 by the reprex package (v2.0.1)
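To make the comparison above easier to read at a glance, the recursive dependency lists can be reduced to counts. This is only a sketch: `dependency_counts()` is a hypothetical helper name, not part of any package, and its `db` argument is simply passed through to `tools::package_dependencies()`. When `db` is `NULL`, that function queries the configured repositories, so the numbers depend on the CRAN snapshot at the time.

```r
# Hypothetical helper: count recursive strong dependencies per package.
# `db` is forwarded to tools::package_dependencies(); NULL means
# "look up the currently configured repositories".
dependency_counts <- function(pkgs, db = NULL) {
  deps <- tools::package_dependencies(pkgs, db = db, recursive = TRUE)
  lengths(deps)
}
```

For the listing above, `dependency_counts(c("broom", "broom.mixed", "parameters"))` would correspond to 48, 50, and 7 entries respectively.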
example with merMod
library(lme4)
#> Loading required package: Matrix
library(magrittr)
library(parameters)
lmer_mod <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
broom.mixed::tidy(lmer_mod, effects = "fixed")
#> # A tibble: 2 x 5
#> effect term estimate std.error statistic
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 fixed (Intercept) 251. 6.82 36.8
#> 2 fixed Days 10.5 1.55 6.77
parameters::standardize_names(parameters::model_parameters(lmer_mod), style = "broom") %>%
  tibble::as_tibble()
#> # A tibble: 2 x 9
#> term estimate std.error conf.level conf.low conf.high statistic df.error
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 (Int… 251. 6.82 0.95 238. 265. 36.8 174
#> 2 Days 10.5 1.55 0.95 7.44 13.5 6.77 174
#> # … with 1 more variable: p.value <dbl>
example with lm
lm_mod <- lm(Reaction ~ Days, sleepstudy)
broom::tidy(lm_mod)
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 251. 6.61 38.0 2.16e-87
#> 2 Days 10.5 1.24 8.45 9.89e-15
parameters::standardize_names(parameters::model_parameters(lm_mod), style = "broom") %>%
  tibble::as_tibble()
#> # A tibble: 2 x 9
#> term estimate std.error conf.level conf.low conf.high statistic df.error
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 (Int… 251. 6.61 0.95 238. 264. 38.0 178
#> 2 Days 10.5 1.24 0.95 8.02 12.9 8.45 178
#> # … with 1 more variable: p.value <dbl>
Created on 2021-02-18 by the reprex package (v1.0.0)
I like the general idea but this would be a massive change and I'm not sure it's worth it. A lot of the current codebase depends on the output from broom looking exactly as it does now, so it would require considerable refactoring. For example, the lme4::lmer() code depends on having the effect column to delineate between fixed and random effects.
The other thing that worries me a little bit is just that broom is a really established package with considerable support around maintaining it. I've never really looked into parameters. It looks like it's pretty well maintained too. But it would still worry me a bit.
So I guess I'm leaning toward no thanks, but I'm happy to engage in the conversation a bit more.
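To illustrate the kind of dependence on the effect column described above, here is a minimal sketch. It is not equatiomatic's actual code: the data frame is typed in by hand to mimic the shape of broom.mixed::tidy() output for the sleepstudy model in this thread (values rounded), and `split_by_effect()` is a hypothetical helper.

```r
# Hand-typed stand-in for broom.mixed::tidy() output on an lmer model;
# values are rounded from the sleepstudy example in this thread.
tidy_out <- data.frame(
  effect   = c("fixed", "fixed", "ran_pars", "ran_pars", "ran_pars", "ran_pars"),
  group    = c(NA, NA, "Subject", "Subject", "Subject", "Residual"),
  term     = c("(Intercept)", "Days", "sd__(Intercept)",
               "cor__(Intercept).Days", "sd__Days", "sd__Observation"),
  estimate = c(251, 10.5, 24.7, 0.0656, 5.92, 25.6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: downstream code that needs the fixed and random
# parts separately can only do this if an `effect` column is present.
split_by_effect <- function(td) {
  stopifnot("effect" %in% names(td))
  split(td, td$effect)
}

parts <- split_by_effect(tidy_out)
parts$fixed$term     # fixed-effect terms
parts$ran_pars$term  # variance and correlation parameters
```

Any replacement tidier would need to supply an equivalent column (or be adapted to do so) for code of this shape to keep working.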
For example, the lme4::lmer() code depends on having the effect column to delineate between fixed and random effects.
Hmm, that's a fair point. This is indeed a context where the parameters output won't exactly line up with the broom.mixed output, and this is a good enough reason to currently not make this switch.
The other thing that worries me a little bit is just that broom is a really established package with considerable support around maintaining it.
As someone who has contributed to both of these packages, I can vouch for the rigor and speed at which parameters is maintained (it is < 2 years old and already supports more models than broom and broom.mixed combined) and, in a few years, it will be as well-established as broom was at its age. 😉
So I guess I'm leaning toward no thanks, but I'm happy to engage in the conversation a bit more.
We can revisit this when parameters starts to behave the same way as broom.mixed when it comes to random effects, since the switch would then require minimal refactoring.
Sounds good to me. Thanks.
The outputs for mixed-effects models from parameters (GitHub version) now also line up with broom.mixed output, with a few differences in naming schemes for terms, but that should be easy to adjust to.
No pressure at all to take this further; just wanted to log where things stand right now. 🙂
library(lme4)
library(broom.mixed)
library(tibble)
library(parameters)
options(tibble.width = Inf)
mod <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
# `broom.mixed` output --------------------------------
tidy(mod)
#> # A tibble: 6 x 6
#> effect group term estimate std.error statistic
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 fixed <NA> (Intercept) 251. 6.82 36.8
#> 2 fixed <NA> Days 10.5 1.55 6.77
#> 3 ran_pars Subject sd__(Intercept) 24.7 NA NA
#> 4 ran_pars Subject cor__(Intercept).Days 0.0656 NA NA
#> 5 ran_pars Subject sd__Days 5.92 NA NA
#> 6 ran_pars Residual sd__Observation 25.6 NA NA
# `parameters` output ---------------------------------
# (with further modifications to match `broom` conventions)
model_parameters(mod, effects = "all") %>%
  standardize_names(style = "broom") %>%
  as_tibble()
#> # A tibble: 6 x 11
#> term estimate std.error conf.level conf.low conf.high
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 251. 6.82 0.95 238. 265.
#> 2 Days 10.5 1.55 0.95 7.44 13.5
#> 3 SD (Observations) 25.6 NA 0.95 NA NA
#> 4 SD (Intercept) 24.7 NA 0.95 NA NA
#> 5 SD (Days) 5.92 NA 0.95 NA NA
#> 6 Cor (Intercept~Days) 0.256 NA 0.95 NA NA
#> statistic df.error p.value effect group
#> <dbl> <int> <dbl> <chr> <chr>
#> 1 36.8 174 4.54e-297 fixed ""
#> 2 6.77 174 1.27e- 11 fixed ""
#> 3 NA NA NA random "Residual"
#> 4 NA NA NA random "Subject"
#> 5 NA NA NA random "Subject"
#> 6 NA NA NA random "Subject"
Created on 2021-03-05 by the reprex package (v1.0.0)
Okay, I appreciate it. I'm hoping to come back to work on some bugs and things here in the next couple weeks. I suppose we could use the GitHub version as a dependency for now and then wait until they push to CRAN before our next release.
Just wanted to post another reprex, this time with CRAN versions of both packages.
As far as I can see, there are just two (IMO) minor differences, but not sure how much difference it makes to your code:
- random effects are called ran_pars in {broom}, while random in {parameters}
- group column strings are surrounded in ""
library(lme4)
#> Loading required package: Matrix
library(broom.mixed)
library(tibble)
library(parameters)
options(tibble.width = Inf)
mod <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
# `broom.mixed` output --------------------------------
tidy(mod)
#> # A tibble: 6 x 6
#> effect group term estimate std.error statistic
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 fixed <NA> (Intercept) 251. 6.82 36.8
#> 2 fixed <NA> Days 10.5 1.55 6.77
#> 3 ran_pars Subject sd__(Intercept) 24.7 NA NA
#> 4 ran_pars Subject cor__(Intercept).Days 0.0656 NA NA
#> 5 ran_pars Subject sd__Days 5.92 NA NA
#> 6 ran_pars Residual sd__Observation 25.6 NA NA
# `parameters` output ---------------------------------
# (with further modifications to match `broom` conventions)
model_parameters(mod, effects = "all") %>%
  standardize_names(style = "broom") %>%
  as_tibble()
#> # A tibble: 6 x 11
#> term estimate std.error conf.level conf.low conf.high
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 251. 6.82 0.95 238. 265.
#> 2 Days 10.5 1.55 0.95 7.42 13.5
#> 3 SD (Intercept) 24.7 NA 0.95 NA NA
#> 4 SD (Days) 5.92 NA 0.95 NA NA
#> 5 Cor (Intercept~Days: Subject) 0.0656 NA 0.95 NA NA
#> 6 SD (Observations) 25.6 NA 0.95 NA NA
#> statistic df.error p.value effect group
#> <dbl> <int> <dbl> <chr> <chr>
#> 1 36.8 174 4.37e-84 fixed ""
#> 2 6.77 174 1.88e-10 fixed ""
#> 3 NA NA NA random "Subject"
#> 4 NA NA NA random "Subject"
#> 5 NA NA NA random "Subject"
#> 6 NA NA NA random "Residual"
Created on 2021-11-03 by the reprex package (v2.0.1)
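The two differences listed above could probably be bridged with a small adapter. The following is a sketch under stated assumptions: the data frame is typed in by hand from the parameters output above (only the relevant columns), and `as_broom_mixed_style()` is a hypothetical function, not something either package provides. It does not touch the differing term names.

```r
# Hand-typed stand-in for the broom-styled parameters output above
# (only the columns relevant to the two differences are kept).
params_out <- data.frame(
  term   = c("(Intercept)", "Days", "SD (Intercept)", "SD (Days)",
             "Cor (Intercept~Days: Subject)", "SD (Observations)"),
  effect = c("fixed", "fixed", "random", "random", "random", "random"),
  group  = c("", "", "Subject", "Subject", "Subject", "Residual"),
  stringsAsFactors = FALSE
)

# Hypothetical adapter: recode `effect` and `group` to the broom.mixed
# conventions. Term names are left untouched.
as_broom_mixed_style <- function(df) {
  df$effect[df$effect == "random"] <- "ran_pars"
  df$group[!nzchar(df$group)] <- NA_character_
  df
}

adapted <- as_broom_mixed_style(params_out)
unique(adapted$effect)
adapted$group
```

Whether the term names also need translating depends on how the downstream code parses them, which this sketch leaves open.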
Thanks. Just to be clear, the parameters package handles the models that broom and broom.mixed handle, correct?
Yes, you can see the list of supported models using this function:
insight::supported_models()
#> [1] "aareg" "afex_aov" "AKP"
#> [4] "Anova.mlm" "aov" "aovlist"
#> [7] "Arima" "averaging" "bamlss"
#> [10] "bamlss.frame" "bayesQR" "bayesx"
#> [13] "BBmm" "BBreg" "bcplm"
#> [16] "betamfx" "betaor" "betareg"
#> [19] "BFBayesFactor" "bfsl" "BGGM"
#> [22] "bife" "bifeAPEs" "bigglm"
#> [25] "biglm" "blavaan" "blrm"
#> [28] "bracl" "brglm" "brmsfit"
#> [31] "brmultinom" "btergm" "censReg"
#> [34] "cgam" "cgamm" "cglm"
#> [37] "clm" "clm2" "clmm"
#> [40] "clmm2" "clogit" "coeftest"
#> [43] "complmrob" "confusionMatrix" "coxme"
#> [46] "coxph" "coxph.penal" "coxr"
#> [49] "cpglm" "cpglmm" "crch"
#> [52] "crq" "crqs" "crr"
#> [55] "dep.effect" "DirichletRegModel" "drc"
#> [58] "eglm" "elm" "epi.2by2"
#> [61] "ergm" "feglm" "feis"
#> [64] "felm" "fitdistr" "fixest"
#> [67] "flexsurvreg" "gam" "Gam"
#> [70] "gamlss" "gamm" "gamm4"
#> [73] "garch" "gbm" "gee"
#> [76] "geeglm" "glht" "glimML"
#> [79] "glm" "Glm" "glmm"
#> [82] "glmmadmb" "glmmPQL" "glmmTMB"
#> [85] "glmrob" "glmRob" "glmx"
#> [88] "gls" "gmnl" "HLfit"
#> [91] "htest" "hurdle" "iv_robust"
#> [94] "ivFixed" "ivprobit" "ivreg"
#> [97] "lavaan" "lm" "lm_robust"
#> [100] "lme" "lmerMod" "lmerModLmerTest"
#> [103] "lmodel2" "lmrob" "lmRob"
#> [106] "logistf" "logitmfx" "logitor"
#> [109] "LORgee" "lqm" "lqmm"
#> [112] "lrm" "manova" "MANOVA"
#> [115] "margins" "maxLik" "mclogit"
#> [118] "mcmc" "mcmc.list" "MCMCglmm"
#> [121] "mcp1" "mcp12" "mcp2"
#> [124] "med1way" "mediate" "merMod"
#> [127] "merModList" "meta_bma" "meta_fixed"
#> [130] "meta_random" "metaplus" "mhurdle"
#> [133] "mipo" "mira" "mixed"
#> [136] "MixMod" "mixor" "mjoint"
#> [139] "mle" "mle2" "mlm"
#> [142] "mlogit" "mmlogit" "model_fit"
#> [145] "multinom" "mvord" "negbinirr"
#> [148] "negbinmfx" "ols" "onesampb"
#> [151] "orm" "pgmm" "plm"
#> [154] "PMCMR" "poissonirr" "poissonmfx"
#> [157] "polr" "probitmfx" "psm"
#> [160] "Rchoice" "ridgelm" "riskRegression"
#> [163] "rjags" "rlm" "rlmerMod"
#> [166] "RM" "rma" "rma.uni"
#> [169] "robmixglm" "robtab" "rq"
#> [172] "rqs" "rqss" "Sarlm"
#> [175] "scam" "selection" "sem"
#> [178] "SemiParBIV" "semLm" "semLme"
#> [181] "slm" "speedglm" "speedlm"
#> [184] "stanfit" "stanmvreg" "stanreg"
#> [187] "summary.lm" "survfit" "survreg"
#> [190] "svy_vglm" "svyglm" "svyolr"
#> [193] "t1way" "tobit" "trimcibt"
#> [196] "truncreg" "vgam" "vglm"
#> [199] "wbgee" "wblm" "wbm"
#> [202] "wmcpAKP" "yuen" "yuend"
#> [205] "zcpglm" "zeroinfl" "zerotrunc"
Created on 2021-11-03 by the reprex package (v2.0.1)
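To check a specific model object against a list like this, one option is to compare its classes with the returned vector. This is a minimal sketch: `is_listed()` is a hypothetical helper, and the `listed` vector below is a handful of names copied from the output above so that the example runs without insight installed.

```r
# A few class names copied from the listing above; in practice the
# full vector would come from insight::supported_models().
listed <- c("glm", "lm", "lmerMod", "merMod")

# Hypothetical helper: TRUE if any class of `model` is in `supported`.
is_listed <- function(model, supported) {
  any(class(model) %in% supported)
}

fit <- lm(mpg ~ wt, data = mtcars)
is_listed(fit, listed)  # an lm object
is_listed(1:3, listed)  # a plain integer vector
```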
Thanks, I'll play around with this in a bit.
Cool!
The documentation can be found here: https://easystats.github.io/parameters/
group column strings are surrounded in ""
Only in the printed output. That's because parameters uses an empty string in "group" for fixed effects, while broom.mixed uses NA. And for character columns, including empty strings, tibble adds a surrounding ".
I would find this helpful. easystats is quickly becoming a huge part of my workflow, and switching to {parameters} would open up a huge number of model classes.