gtsummary
gtsummary copied to clipboard
Feature request: Re-order rows for merged mixed models tables that include random effects component
I think when merging mixed models tables, it would make sense to sepatate the fixed from the random effects part. This is how it currently looks like in gtsummary (you see that residual SD is at the bottom for the first model, but the first "coefficient" for the second model, which is confusing as well):
library(lme4)
#> Loading required package: Matrix
library(gtsummary)
data("sleepstudy")
data("efc", package = "ggeffects")
efc$e42dep <- datawizard::to_factor(efc$e42dep)
efc$cluster <- as.factor(efc$e15relat)
m1 <- lmer(neg_c_7 ~ c160age + c161sex + e42dep + (1 | cluster), data = efc)
m2 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
tbl_merge(
list(
tbl_regression(m1, tidy_fun = broom.mixed::tidy),
tbl_regression(m2, tidy_fun = broom.mixed::tidy)
)
)
Characteristic | Table 1 | Table 2 | ||
---|---|---|---|---|
Beta | 95% CI1 | Beta | 95% CI1 | |
carer' age | 0.00 | -0.03, 0.02 | ||
carer's gender | 0.47 | -0.09, 1.0 | ||
elder's dependency | ||||
independent | — | — | ||
slightly dependent | 1.3 | 0.28, 2.2 | ||
moderately dependent | 2.6 | 1.7, 3.6 | ||
severely dependent | 4.3 | 3.3, 5.2 | ||
cluster.sd__(Intercept) | 0.70 | |||
Residual.sd__Observation | 3.6 | 26 | ||
Days | 10 | 7.4, 13 | ||
Subject.sd__(Intercept) | 25 | |||
Subject.cor__(Intercept).Days | 0.07 | |||
Subject.sd__Days | 5.9 | |||
1 CI = Confidence Interval |
Created on 2023-11-06 with reprex v2.0.2
Would be nice if it would look something like shown here: https://strengejacke.github.io/sjPlot/articles/tab_mixed.html#mixed-models-summaries-as-html-table
Is this possible and something you would consider?
Thank you @strengejacke ! Yes, this is possible, but honestly, it's not a cute solution. The merging is a generic function for any gtsummary table (mostly commonly not regression summaries) and doesn't make distinctions between the fixed and random effects of the model. To get all the fixed effects together, we'd just re-arrange the rows in the underlying data frame.
library(gtsummary)
library(lme4)
#> Loading required package: Matrix
data("sleepstudy")
data("efc", package = "ggeffects")
efc$e42dep <- datawizard::to_factor(efc$e42dep)
efc$cluster <- as.factor(efc$e15relat)
m1 <- lmer(neg_c_7 ~ c160age + c161sex + e42dep + (1 | cluster), data = efc)
m2 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)
tbl <-
tbl_merge(
list(
tbl_regression(m1, tidy_fun = broom.mixed::tidy),
tbl_regression(m2, tidy_fun = broom.mixed::tidy)
)
) |>
# move random components to the bottom of the table (those labels always contain "__")
modify_table_body(
\(x) x |> dplyr::arrange(grepl(pattern = "__", x = label))
)
Created on 2023-11-06 with reprex v2.0.2
But an update that would group and keep the estimates together would be great, and something I think @larmarange has been thinking about (i.e. the multicomponent models). Joseph, let's connect soon and discuss some potential upgrades!
As a follow (or I can open another issue, if you like), I was looking for "more readable" names of the random effects, maybe similar to what parameters::model_parameters()
returns for mixed models. I realize there's a tidy_fun
argument, but it seems to be working from broom::tidy()
only, or maybe it's rather for "internal" use - or am I missing something?
library(gtsummary)
library(lme4)
#> Loading required package: Matrix
data("sleepstudy")
data("efc", package = "ggeffects")
efc$e42dep <- datawizard::to_factor(efc$e42dep)
efc$cluster <- as.factor(efc$e15relat)
m1 <- lmer(neg_c_7 ~ c160age + c161sex + e42dep + (1 | cluster), data = efc)
tbl_regression(m1, tidy_fun = parameters::model_parameters)
#> ✖ There was an error calling `tidy_fun()`. Most likely, this is because the
#> function supplied in `tidy_fun=` was misspelled, does not exist, is not
#> compatible with your object, or was missing necessary arguments (e.g. `conf.level=` or `conf.int=`). See error message below.
#> Error:
#> ! Error in model_parameters.merMod(x = new("lmerMod", resp =
#> new("lmerResp", : argument "model" is missing, with no default
#> Backtrace:
#> ▆
#> 1. ├─gtsummary::tbl_regression(m1, tidy_fun = parameters::model_parameters)
#> 2. ├─gtsummary:::tbl_regression.lmerMod(m1, tidy_fun = parameters::model_parameters)
#> 3. │ └─gtsummary:::tbl_regression.default(...)
#> 4. │ └─gtsummary:::tidy_prep(...)
#> 5. │ └─... %>% ...
#> 6. ├─rlang::eval_tidy(.)
#> 7. ├─broom.helpers::tidy_plus_plus(...)
#> 8. │ └─model %>% ...
#> 9. └─broom.helpers::tidy_and_attach(...)
#> 10. └─base::tryCatch(...)
#> 11. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 12. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 13. └─value[[3L]](cond)
#> 14. └─base::tryCatch(...)
#> 15. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 16. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 17. └─value[[3L]](cond)
#> 18. └─cli::cli_abort(as.character(e), call = NULL)
#> 19. └─rlang::abort(...)
tbl_regression(m1, tidy_fun = function(x, ...) parameters::model_parameters(model = x, ...))
#> Error in `dplyr::left_join()`:
#> ! Join columns in `x` must be present in the data.
#> ✖ Problem with `term`.
#> Backtrace:
#> ▆
#> 1. ├─gtsummary::tbl_regression(...)
#> 2. ├─gtsummary:::tbl_regression.lmerMod(...)
#> 3. │ └─gtsummary:::tbl_regression.default(...)
#> 4. │ └─gtsummary:::tidy_prep(...)
#> 5. │ └─... %>% ...
#> 6. ├─rlang::eval_tidy(.)
#> 7. ├─broom.helpers::tidy_plus_plus(...)
#> 8. │ └─res %>% tidy_identify_variables(quiet = quiet) %>% ...
#> 9. ├─broom.helpers::tidy_add_contrasts(.)
#> 10. │ └─broom.helpers::tidy_get_model(x)
#> 11. ├─broom.helpers::tidy_identify_variables(., quiet = quiet)
#> 12. │ └─x %>% dplyr::left_join(variables_list, by = "term")
#> 13. ├─dplyr::left_join(., variables_list, by = "term")
#> 14. └─dplyr:::left_join.data.frame(., variables_list, by = "term")
#> 15. └─dplyr:::join_mutate(...)
#> 16. └─dplyr:::join_cols(...)
#> 17. └─dplyr:::check_join_vars(by$x, x_names, by$condition, "x", error_call = error_call)
#> 18. └─rlang::abort(bullets, call = error_call)
tbl_regression(m1, tidy_fun = function(x, ...) insight::standardize_names(parameters::model_parameters(model = x, ...), style = "broom"))
#> Error in if (result$label %in% c("Beta", "exp(Beta)")) {: argument is of length zero
Created on 2023-11-06 with reprex v2.0.2
I would have expected the last call to run without error. tbl_regression()
often uses the parameters package to do the tidying with a wrapper that ultimately does something very similar to what you have. Do you mind posting the reprex there? Or I can do it later as well.
library(gtsummary)
library(lme4)
#> Loading required package: Matrix
data("sleepstudy")
data("efc", package = "ggeffects")
efc$e42dep <- datawizard::to_factor(efc$e42dep)
efc$cluster <- as.factor(efc$e15relat)
m1 <- lmer(neg_c_7 ~ c160age + c161sex + e42dep + (1 | cluster), data = efc)
tbl_regression(m1, tidy_fun = broom.helpers::tidy_parameters)
#> Error in if (result$label %in% c("Beta", "exp(Beta)")) {: argument is of length zero
Created on 2023-11-06 with reprex v2.0.2
Hi.
After exploring a little, I have the feeling that there are two areas of discussion:
- improvement of
broom.helpers
for mixed models - more global discussion about a grouped structure for
tbl_regression()
@strengejacke for your information, tbl_regression()
relies on broom.helpers::tidy_plus_plus()
which is in charge of tidying the results of the model and adding additional information (like identification of the variable name, level name for categorical factors, identifying the type of terms, etc.).
Regarding mixed models, by default, tbl_regression()
use broom.mixed::tidy()
as tidier function and force effects = "mixed"
, cf. https://github.com/ddsjoberg/gtsummary/blob/30ef7175b9cc935ddcfd41f7c9c2418f3ee52cd8/R/tbl_regression_methods.R#L89
The way it is currently implemented is problematic because it does not check if the effects
is part of ...
. Therefore, a syntax like tbl_regression(m2, effects = "all")
will fails.
The default tidier used by tidy_plus_plus()
is tidy_with_broom_or_parameters()
, cf. https://github.com/larmarange/broom.helpers/blob/637a9b095d02d4263dfac2795099705f938cbc12/R/custom_tidiers.R#L81
For mixed models, it is charging broom.mixed
and use tidy()
by default. By default, tidy_with_broom_or_parameters()
returns both fixed and random effects. We could update broom.helpers
to return only fixed effects by default unless effects
is passed. In that case, the specific methods defined for tbl_regression()
in gtsummary
could be removed (it would simplify the code).
Here, is the structure of the tibble returned by tidy()
> tidy(m2)
# A tibble: 6 × 6
effect group term estimate std.error statistic
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 fixed NA (Intercept) 251. 6.82 36.8
2 fixed NA Days 10.5 1.55 6.77
3 ran_pars Subject sd__(Intercept) 24.7 NA NA
4 ran_pars Subject cor__(Intercept).Days 0.0656 NA NA
5 ran_pars Subject sd__Days 5.92 NA NA
6 ran_pars Residual sd__Observation 25.6 NA NA
The type of effect (fixed or random) is stored in the effect
column. Additionnal information is stored in group.
There has been discussion about terms disambiguation (because in some cases two identical values are used for terms
column): https://github.com/larmarange/broom.helpers/pull/90 & https://github.com/larmarange/broom.helpers/issues/98
Therefore, tidy_plus_plus()
disambiguate terms, by prefixed terms with the value of group
.
> tidy_plus_plus(m2) |> select(term, original_term, var_type, effect, group, var_label)
# A tibble: 5 × 6
term original_term var_type effect group var_label
<chr> <chr> <chr> <chr> <chr> <chr>
1 Days Days continuous fixed NA Days
2 Subject.sd__(Intercept) Subject.sd__(Intercept) ran_pars ran_pars Subject Subject.sd__(Intercept)
3 Subject.cor__(Intercept).Days Subject.cor__(Intercept).Days ran_pars ran_pars Subject Subject.cor__(Intercept).Days
4 Subject.sd__Days Subject.sd__Days ran_pars ran_pars Subject Subject.sd__Days
5 Residual.sd__Observation Residual.sd__Observation ran_pars ran_pars Residual Residual.sd__Observation
Random effects are also identified as such in var_type
.
So far, there is no additional work done regarding var_label
. Labelling could eventually be improved in broom.helpers
. But it will require to clearly define that would be expected with different scenarios. I'm open to any suggestion.
Regarding model_parameters
, we cannot use directly model_parameters()
because the column names are inconsistent with tidy()
> parameters::model_parameters(m2) |> as_tibble()
# A tibble: 6 × 11
Parameter Coefficient SE CI CI_low CI_high t df_error p Effects Group
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <chr> <chr>
1 (Intercept) 251. 6.82 0.95 238. 265. 36.8 174 4.37e-84 fixed ""
2 Days 10.5 1.55 0.95 7.42 13.5 6.77 174 1.88e-10 fixed ""
3 SD (Intercept) 24.7 5.84 0.95 15.6 39.3 NA NA NA random "Subject"
4 SD (Days) 5.92 1.25 0.95 3.92 8.95 NA NA NA random "Subject"
5 Cor (Intercept~Days) 0.0656 0.319 0.95 -0.509 0.600 NA NA NA random "Subject"
6 SD (Observations) 25.6 1.51 0.95 22.8 28.7 NA NA NA random "Residual"
Instead, you could used broom.helpers::tidy_parameters()
who proceed to name standardisation. cf. https://github.com/larmarange/broom.helpers/blob/637a9b095d02d4263dfac2795099705f938cbc12/R/custom_tidiers.R#L20
> tidy_parameters(m2)
term estimate std.error conf.level conf.low conf.high statistic df.error p.value effect group
1 (Intercept) 251.40510485 6.8245967 0.95 237.9354568 264.8747529 36.838090 174 4.372791e-84 fixed
2 Days 10.46728596 1.5457896 0.95 7.4163742 13.5181977 6.771481 174 1.882719e-10 fixed
3 SD (Intercept) 24.74065799 5.8362639 0.95 15.5816976 39.2832779 NA NA NA random Subject
4 SD (Days) 5.92213766 1.2480355 0.95 3.9182819 8.9507889 NA NA NA random Subject
5 Cor (Intercept~Days) 0.06555124 0.3185881 0.95 -0.5090677 0.5997529 NA NA NA random Subject
6 SD (Observations) 25.59179572 1.5080110 0.95 22.8004400 28.7248846 NA NA NA random Residual
Three comments. (1) for random effects, term names are not consistent with tidy()
and it would require to be taken into account if we plan to improve variable labelling; (ii) the effect column is equal to "random" instead of "ran_pars". It is not ney taken into account and should be considered in https://github.com/larmarange/broom.helpers/blob/637a9b095d02d4263dfac2795099705f938cbc12/R/tidy_identify_variables.R#L90 (3) for fixed terms, group contains an empty character instead of NA
, resulting in the current error when trying to use tidy_parameters()
with ybl_regression()
. change to be fixed at https://github.com/larmarange/broom.helpers/blob/637a9b095d02d4263dfac2795099705f938cbc12/R/tidy_disambiguate_terms.R#L56
In summary there is room for improvement regarding labelling random effects in broom.helpers
(but expected results should be clarified); some bugs should be fixed in broom.helpers
for allowing the use of tidy_parameters()
; and we could better harmonized default tidier between gtsummary
and broom.helpers
for mixed models.
Regarding taking into account groups of terms in tbl_regression()
Regarding the initial question about merging, it is part of a more global discussion on how to deal with groups of terms with tbl_regression()
, cf. #1540
- Mixed models return two types of terms (fixed and random) identified by the
effect
column. - Multinomial models also return several group of terms, identified by the
y.level
column. - Betareg and zero-inflated models also return several terms groups, identified by the
component
column
So far, these terms groups are not handled by tbl_regression()
, except for multinomial models, where they are listed in a long form. There is no native support for a wide tabulation, although an experimental multinom_pivot_wider()
has been proposed, cf. #1552
To be more generic, we could consider supporting "terms groups" or "components" (term to be discussed) within tbl_regression()
:
- should the column identifying a group of terms be passed by the user? alternatively, we could let
broom.helpers
populate aterms_group
or acomponent
column, uniformising the results for mixed models, multinomial models and multi-components models. A sub-question would be to also produce proper labels (and their translation). - if the tidy tibble contains such a
component
column, thentbl_regression()
should display the name of the component (as it is currently done for multinomial models) - if merging two tbl, taking into account the
component
column and reordering, if needed, by component. - the structure of the result should allow to identify such row to allow a function such as
bold_component_labels()
- ideally, incorporate a
pivot_wider()
function for displaying tbl in a wide format
Thanks for the answers! Just a quick comment, will read more thoroughly later:
Regarding model_parameters, we cannot use directly model_parameters() because the column names are inconsistent with tidy()
Using insight::standardize_names(<data frame>, type = "broom")
will rename columns in a data frame to follow broom-conventions (and vice versa, insight::standardize_names(<data frame>, type = "easystats")
will rename into easystats-conventions).
Using
insight::standardize_names(<data frame>, type = "broom")
will rename columns in a data frame to follow broom-conventions (and vice versa,insight::standardize_names(<data frame>, type = "easystats")
will rename into easystats-conventions).
This is exactly what tidy_parameters()
does. ;-)
We have rather simple wrappers around the gt
package to print tables to HTML, but I like the flexibility and the particular "grammar" design from gtsummary, that's why I'm asking. Not sure if this helps you to give some "inspiration", but this is what parameters currently does:
library(glmmTMB)
library(parameters)
model1 <- glmmTMB(
count ~ spp + mined + cover + (1 + cover | site),
ziformula = ~mined + (1 | site),
family = poisson(),
data = Salamanders
)
#> Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
#> problem; singular convergence (7). See vignette('troubleshooting'),
#> help('diagnose')
model_parameters(model1, effects = "all", component = "all") |> print_html()
Model Summary | |||||
Parameter | Coefficient | SE | 95% CI | z | p |
---|---|---|---|---|---|
Fixed Effects (Count Model) | |||||
(Intercept) | -0.06 | 0.20 | (-0.44, 0.33) | -0.28 | 0.776 |
spp (PR) | -1.28 | 0.24 | (-1.74, -0.81) | -5.36 | < .001 |
spp (DM) | 0.27 | 0.14 | (2.26e-04, 0.54) | 1.96 | 0.050 |
spp (EC-A) | -0.61 | 0.20 | (-1.01, -0.22) | -3.09 | 0.002 |
spp (EC-L) | 0.66 | 0.13 | (0.41, 0.91) | 5.12 | < .001 |
spp (DES-L) | 0.62 | 0.13 | (0.37, 0.87) | 4.93 | < .001 |
spp (DF) | 0.07 | 0.15 | (-0.22, 0.36) | 0.49 | 0.621 |
mined (no) | 1.08 | 0.18 | (0.72, 1.44) | 5.91 | < .001 |
cover | -0.15 | 0.09 | (-0.32, 0.02) | -1.71 | 0.087 |
Fixed Effects (Zero-Inflation Component) | |||||
(Intercept) | 1.23 | 0.35 | (0.55, 1.91) | 3.54 | < .001 |
mined (no) | -2.48 | 0.47 | (-3.40, -1.55) | -5.22 | < .001 |
Random Effects Variances | |||||
SD (Intercept: site) | 0.11 | ||||
SD (cover: site) | 0.20 | ||||
Cor (Intercept~cover: site) | -1.00 | ||||
Random Effects (Zero-Inflation Component) | |||||
SD (Intercept: site) | 0.85 | ||||
model1 <- glmmTMB(
count ~ spp + mined + cover + (1 + cover | site),
ziformula = ~mined + (1 | site),
family = poisson(),
data = Salamanders
)
model2 <- glmmTMB(
count ~ spp + mined + DOP + (1 | site),
ziformula = ~ mined + DOP + (1 + DOP | site),
family = poisson(),
data = Salamanders
)
mp1 <- model_parameters(model1, effects = "all", component = "all", ci_random = 0.95)
mp2 <- model_parameters(model2, effects = "all", component = "all", ci_random = 0.95)
compare_parameters(list(mp1, mp2)) |> print_html()
Parameter | Model 1 | Model 2 |
---|---|---|
Fixed Effects | ||
(Intercept) |
-0.06 |
-0.16 |
spp (PR) |
-1.28 |
-1.26 |
spp (DM) |
0.27 |
0.27 |
spp (EC-A) |
-0.61 |
-0.60 |
spp (EC-L) |
0.66 |
0.67 |
spp (DES-L) |
0.62 |
0.62 |
spp (DF) |
0.07 |
0.08 |
mined (no) |
1.08 |
1.07 |
cover |
-0.15 |
|
DOP |
0.05 | |
Fixed Effects (Zero-Inflation Component) | ||
(Intercept) |
1.23 |
1.10 |
minedno |
-2.48 |
-2.30 |
DOP |
0.20 | |
Random Effects | ||
SD (Intercept: site) |
0.11 |
0.27 |
Cor (Intercept~cover: site) |
-1.00 |
|
SD (cover: site) |
0.20 |
|
Random Effects (Zero-Inflation Component) | ||
SD (Intercept: site) |
0.85 |
0.82 |
SD (DOP: site) |
0.11 | |
Cor (Intercept~DOP: site) |
-1.00 | |
Created on 2023-11-07 with reprex v2.0.2
Three comments. (1) for random effects, term names are not consistent with tidy() and it would require to be taken into account if we plan to improve variable labelling; (ii) the effect column is equal to "random" instead of "ran_pars". It is not ney taken into account and should be considered in
Yes, I didn't mean to use parameters or label names, this was just an example how labels could look like. The original term names returned by tidy()
can be preserved, maybe just have a function that "cleans" labels during printing?
Quick note: https://github.com/larmarange/broom.helpers/pull/239 fixes the bug with tidy_parameters()
and mixed models.
Regarding the question of variable labels for random effects: a big feature of tbl_regression()
and tidy_plus_plus()
for classic models is the identification of the variables, the type of contrasts and reference level for categorical variables, the identification of interactions and the generation of proper and readable labels.
The same level of work and generation of readable labels is not done yet for the random effects. This is an open question on how to better such random effects and generate proper labels.
This is great! @strengejacke the tables are gorgeous.
-
@larmarange we could change the default tidier in
tbl_regression.lmerMod()
to be from the parameters package to rely more heavily on the great work already done there. -
When the lmerMod
tbl_regression()
method was first written (many years ago), I made this the default tidying function:tidy_fun = function(x, ...) broom.mixed::tidy(x, ..., effects = "fixed")
. Since then, we've begun passing the...
tobroom.helpers::tidy_plus_plus()
, which in turn passes those arguments to the tidier. We can now update this totbl_regression(x, tidy_fun = broom.mixed::tidy, effects = "fixed")
(or the analogue for the parameters tidier) to make it easier for users to control which regression results are included in the resulting table. -
Certainly, the multicomponent piece needs some thought to ensure
tbl_merge()
continues to work smoothly with non-regression summaries table merging, regression to typical summary table merging, and regression model on regression model merging. But we can clearly make an improvement on the current behaviour