gtsummary
Feature request: unweighted missing N in separate row
Hello - I recently came across gtsummary and am in love with how powerful it is and how much easier it can make reproducible research!
One issue I've come across in dealing with weighted data is that tbl_svysummary() cannot create a separate row for the unweighted number of missing observations. It seems like you can do this within a variable's row, for example by specifying statistic = list(all_categorical() ~ "{N_miss_unweighted}"), but it would be great to be able to do this in a separate row under the variable.
Here is an issue on StackOverflow that seems to be related.
Hopefully this is easy to address, or is easy to find a workaround for. Thanks in advance!
Thank you for the post. Can you please include a mock-up of the table you're looking for including data and code I can use. Please keep any code examples as short/minimal as possible.
Thanks for your quick response! Here's an example expanding on the one in the tbl_svysummary() overview on p. 111 of the package documentation:
data(api, package = "survey")
tbl_svysummary_ex2 <-
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) %>%
  tbl_svysummary(by = "both", include = c(api00, stype, avg.ed))
The above code chunk returns the following:
The table above indicates that the weighted number of missing observations within avg.ed across the "No" and "Yes" strata of both is 237 and 643, respectively. However, if we count the (unweighted) number of missing observations by running table(apiclus1$both[is.na(apiclus1$avg.ed)]), we get No = 7 and Yes = 19. I'd like my table to show these numbers, but in the separate "unknown" row like the table image above, not like the one below (which does show the unweighted n for missing, just squished together in the same cell as the other statistics).

Thank you again!
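The weighted and unweighted missing counts quoted above can be checked directly from the data. This is only a minimal sketch in base R: the object names `is_missing`, `unweighted`, and `weighted` are made up for illustration, and the weighted count is computed by summing the sampling weights `pw` over the rows where `avg.ed` is missing, which is an assumption about how the weighted "Unknown" row is derived.

```r
# Example survey data shipped with the survey package
data(api, package = "survey")

# TRUE for sampled schools with a missing avg.ed (illustrative name)
is_missing <- is.na(apiclus1$avg.ed)

# Unweighted: number of sampled rows with missing avg.ed, by `both`
unweighted <- table(apiclus1$both[is_missing])

# Weighted: sum of the sampling weights over the same rows, by `both`
weighted <- tapply(apiclus1$pw * is_missing, apiclus1$both, sum)

print(unweighted)
print(round(weighted))
```

The unweighted counts should match the No = 7 and Yes = 19 reported in this post, and the rounded weighted sums should match the 237 and 643 shown in the table.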
I think the easiest way to add this is to create a new variable that indicates if the variable of interest is missing or not. Place the new variable after the original variable it's associated with.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.0'
data(api, package = "survey")
svy <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
svy$variables <-
  svy$variables %>%
  mutate(avg.ed.is.na = is.na(avg.ed))
svy %>%
  tbl_svysummary(
    by = "both",
    include = c(api00, stype, avg.ed, avg.ed.is.na),
    statistic = avg.ed.is.na ~ "{n_unweighted}",
    label = avg.ed.is.na ~ "Unknown (Unweighted)",
    digits = avg.ed.is.na ~ 0
  ) %>%
  modify_column_indent(columns = label, rows = variable == "avg.ed.is.na") %>%
  as_kable()
Characteristic | No, N = 1,692 | Yes, N = 4,502 |
---|---|---|
api00 | 631 (556, 710) | 654 (551, 722) |
stype | ||
E | 1,083 (64%) | 3,791 (84%) |
H | 237 (14%) | 237 (5.3%) |
M | 372 (22%) | 474 (11%) |
avg.ed | 2.74 (2.35, 3.05) | 2.60 (2.09, 3.02) |
Unknown | 237 | 643 |
Unknown (Unweighted) | 7 | 19 |
Created on 2022-05-20 by the reprex package (v2.0.1)
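If several variables need an unweighted "Unknown" row, the indicator columns from the workaround above could be created in one step. The following is a rough sketch rather than part of the original reply: it assumes dplyr is available, and `vars_to_flag` is a hypothetical vector of variable names chosen only for illustration.

```r
library(dplyr)

data(api, package = "survey")
svy <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Hypothetical choice of variables that should get a missing indicator
vars_to_flag <- c("avg.ed", "api00")

# Add one logical column per variable, named e.g. avg.ed.is.na
svy$variables <-
  svy$variables %>%
  mutate(across(all_of(vars_to_flag), ~ is.na(.x), .names = "{.col}.is.na"))

# Unweighted number of missing avg.ed values within each level of `both`
tapply(svy$variables$avg.ed.is.na, svy$variables$both, sum)
```

Each new indicator column would then be added to include, statistic, label, and digits in the tbl_svysummary() call, placed right after the variable it describes, as in the example above.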
Makes sense - this is a good workaround for now, thanks for providing a quick fix! I think it would still be a great feature at some point in the future though. Appreciate it!
Maybe an option like stats_unknown could be added to customize the way unknown are displayed?
Joseph Larmarange
A separate row is trickier. But if you are using the dev version of the package, you can add a theme to change the missing statistic. You can report both the weighted and unweighted counts in the same cell.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.0.9009'
# load the example survey data used below
data(api, package = "survey")
# set statistic for missing row
list("tbl_summary-str:missing_stat" = "Weighted: {N_miss}\nUnweighted: {N_miss_unweighted}") %>%
  set_gtsummary_theme()
tbl <-
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) %>%
  tbl_svysummary(
    by = "both",
    include = avg.ed
  )
Created on 2022-05-21 by the reprex package (v2.0.1)
If missing_stat could be changed through a theme, should we propose a missing_stat argument in tbl_summary() and tbl_svysummary()? The option will be more visible in the doc. In addition, changing a theme could be difficult for some users.
^ if this were possible, it would be an ideal solution! I do a lot of my work with secured PHI data on a VM that is disconnected from the internet, so installing a dev version of a package is usually a no-go (I bet others may be in similar situations as well).
Thanks for the input! @larmarange it's a good solution, but I need time to sit with it...you know how I hate adding additional arguments 😆
UPDATE: I've been thinking and we can add the argument. It'll take a lot of updates (passing the argument to many, many functions within functions). It's not at the top of my list, but I'll do it eventually.
@larmarange @lamhine
Thanks @ddsjoberg
Great to hear!
FYI, I had slated this update for the v1.6.2 release, but I won't have the time to dedicate to it before the next release.