gtsummary Feature request: unweighted missing N in separate row

Hello - I recently came across gtsummary and am in love with how powerful it is and how much easier it can make reproducible research!

One issue I've come across in dealing with weighted data is that tbl_svysummary() cannot create a separate row for the unweighted number of missing observations. It seems like you can do this within a variable's own row, for example by specifying statistic = list(all_categorical() ~ "{N_miss_unweighted}"), but it would be great to be able to do this in a separate row under the variable.
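For reference, the in-cell approach mentioned above might look like the following. This is a minimal sketch, assuming the apiclus1 data from the survey package and the continuous variable avg.ed; the object names and the "unweighted missing" wording in the statistic string are illustrative choices, not part of the package.

```r
library(gtsummary)
data(api, package = "survey")

design <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# In-cell approach: append the unweighted missing count to the usual
# continuous summary and suppress the separate (weighted) "Unknown" row.
tbl_in_cell <- tbl_svysummary(
  design,
  by = both,
  include = avg.ed,
  statistic = all_continuous() ~
    "{median} ({p25}, {p75}); unweighted missing: {N_miss_unweighted}",
  missing = "no"
)

tbl_in_cell
```

The count shows up, but it shares a cell with the other statistics, which is the layout this request is trying to avoid.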

Here is an issue on StackOverflow that seems to be related.

Hopefully this is easy to address, or is easy to find a workaround for. Thanks in advance!

May 19 '22 17:05 lamhine

Thank you for the post. Can you please include a mock-up of the table you're looking for, including data and code I can use? Please keep any code examples as short/minimal as possible.

May 19 '22 21:05 ddsjoberg

Thanks for your quick response! Here's an example expanding on the one in the tbl_svysummary() overview on p. 111 of the package documentation:

library(gtsummary)

data(api, package = "survey")

tbl_svysummary_ex2 <-
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) %>%
  tbl_svysummary(by = "both", include = c(api00, stype, avg.ed))

The above code chunk returns the following:

The table above indicates that the weighted number of missing observations within avg.ed across the "No" and "Yes" strata of both is 237 and 643, respectively. However, if we count the (unweighted) number of missing observations by running table(apiclus1$both[is.na(apiclus1$avg.ed)]) we get No = 7 and Yes = 19. I'd like my table to show these numbers, but in the separate "Unknown" row like the table image above, not like the one below (which does show the unweighted n for missing, just squished together in the same cell as the other statistics).
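The two sets of counts can be reproduced side by side in base R. This is a minimal sketch, assuming only the apiclus1 data and its weight column pw; the object names are illustrative.

```r
data(api, package = "survey")

# Rows of apiclus1 where avg.ed is missing
missing_rows <- apiclus1[is.na(apiclus1$avg.ed), ]

# Unweighted: number of sampled schools with missing avg.ed, by `both`
unweighted_n <- table(missing_rows$both)

# Weighted: sum of the sampling weights (pw) over the same rows
weighted_n <- tapply(missing_rows$pw, missing_rows$both, sum)

unweighted_n
round(weighted_n)
```

The unweighted counts should match the 7 and 19 quoted above, and the rounded weighted sums should match the 237 and 643 shown in the "Unknown" row.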

Thank you again!

May 20 '22 05:05 lamhine

I think the easiest way to add this is to create a new variable that indicates if the variable of interest is missing or not. Place the new variable after the original variable it's associated with.

library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.0'

data(api, package = "survey")

svy <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) 
svy$variables <-
  svy$variables %>%
  mutate(avg.ed.is.na = is.na(avg.ed))

svy %>%
  tbl_svysummary(
    by = "both", 
    include = c(api00, stype, avg.ed, avg.ed.is.na),
    statistic = avg.ed.is.na ~ "{n_unweighted}",
    label = avg.ed.is.na ~ "Unknown (Unweighted)",
    digits = avg.ed.is.na ~ 0
  ) %>%
  modify_column_indent(columns = label, rows = variable == "avg.ed.is.na") %>%
  as_kable()

Characteristic	No, N = 1,692	Yes, N = 4,502
api00	631 (556, 710)	654 (551, 722)
stype
E	1,083 (64%)	3,791 (84%)
H	237 (14%)	237 (5.3%)
M	372 (22%)	474 (11%)
avg.ed	2.74 (2.35, 3.05)	2.60 (2.09, 3.02)
Unknown	237	643
Unknown (Unweighted)	7	19

Created on 2022-05-20 by the reprex package (v2.0.1)

May 21 '22 01:05 ddsjoberg

Makes sense - this is a good workaround for now, thanks for providing a quick fix! I think it would still be a great feature at some point in the future though. Appreciate it!

May 21 '22 02:05 lamhine

Maybe an option like stats_unknown could be added to customize the way unknown values are displayed?

Joseph Larmarange

May 21 '22 08:05 larmarange

A separate row is trickier. But if you are using the dev version of the package, you can add a theme to change the missing statistic. You can report both the weighted and unweighted counts in the same cell.

library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.6.0.9009'

data(api, package = "survey")

# set statistic for missing row
list("tbl_summary-str:missing_stat" = "Weighted: {N_miss}\nUnweighted: {N_miss_unweighted}") %>%
  set_gtsummary_theme()

tbl <- 
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) %>%
  tbl_svysummary(
    by = "both", 
    include = avg.ed
  )

Created on 2022-05-21 by the reprex package (v2.0.1)
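A theme set this way stays active for the rest of the session, so it may be worth resetting it once the table is built. This is a minimal sketch, assuming a gtsummary version that recognizes the "tbl_summary-str:missing_stat" element (the dev version, per the reply above); the one-line statistic string and the object names are illustrative.

```r
library(gtsummary)
data(api, package = "survey")

design <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Change the statistic shown in the "Unknown" row for tables built from here on
set_gtsummary_theme(
  list("tbl_summary-str:missing_stat" = "{N_miss} (unweighted: {N_miss_unweighted})")
)

tbl_theme <- tbl_svysummary(design, by = both, include = avg.ed)

# Restore the package defaults so later tables are unaffected
reset_gtsummary_theme()

tbl_theme
```

Resetting keeps the custom missing statistic from leaking into unrelated tables later in the same script.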

May 21 '22 14:05 ddsjoberg

If missing_stat could be changed through a theme, should we propose a missing_stat argument in tbl_summary() and tbl_svysummary()? The option will be more visible in the doc. In addition, changing a theme could be difficult for some users.

May 23 '22 08:05 larmarange

^ If this were possible, it would be an ideal solution! I do a lot of my work with secured PHI data on a VM that is disconnected from the internet, so installing a dev version of a package is usually a no-go (I bet others are in similar situations as well).

May 23 '22 18:05 lamhine

Thanks for the input! @larmarange it's a good solution, but I need time to sit with it...you know how I hate adding additional arguments 😆

May 23 '22 18:05 ddsjoberg

UPDATE: I've been thinking and we can add the argument. It'll take a lot of updates (passing the argument to many, many functions within functions). It's not at the top of my list, but I'll do it eventually.

@larmarange @lamhine

Jun 07 '22 00:06 ddsjoberg

Thanks @ddsjoberg

Jun 07 '22 15:06 larmarange

Great to hear!

Jun 07 '22 16:06 lamhine

FYI, I had slated this update for the v1.6.2 release, but I won't have the time to dedicate to it before the next release.

Sep 12 '22 18:09 ddsjoberg
