Feature request: tbl_svysummary reports p.std.error in percentage instead of proportion

Open szimmer opened this issue 2 years ago • 3 comments

Currently, tbl_svysummary() offers several output statistics for categorical variables, including these two:

  • p: percentage
  • p.std.error: standard error of the sample proportion computed with [survey::svymean()]

In a table, it makes more sense for these two to be on the same scale, so I think p.std.error should be the "standard error of the sample percentage".
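To make the scale mismatch concrete, here is a minimal sketch that calls survey::svymean() directly on the same Titanic design used in the reprex. The object names (des, est, prop, se_prop) are only illustrative. Both the estimate and its standard error come back on the 0-1 proportion scale, and because the standard error of 100 × p is 100 × SE(p), multiplying both by 100 puts them on the percentage scale.

```r
library(survey)

# Same design as in the reprex: one row per cell of the Titanic table,
# weighted by the cell count in Freq.
des <- svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq)

est <- svymean(~Class, des)

prop    <- coef(est)            # estimated proportions, 0-1 scale
se_prop <- as.numeric(SE(est))  # standard errors, also 0-1 scale

# Rescale both quantities so they are reported as percentages
data.frame(
  level      = names(prop),
  percent    = 100 * as.numeric(prop),
  se_percent = 100 * se_prop
)
```

This is only a check of what the underlying estimator returns; it does not change how tbl_svysummary() formats the table.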

I tried multiplying by 100 inside the glue string, but that does not work.

library(gtsummary)

tbl_svysummary_ex1 <-
  survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) %>%
  tbl_svysummary(include = c(Class),
                 statistic=list(all_categorical()~"{p} ({p.std.error})"))

tbl_svysummary_ex2 <-
  survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) %>%
  tbl_svysummary(include = c(Class),
                 statistic=list(all_categorical()~"{p} ({p.std.error*100})"))
#> Error in `mutate()`:
#> ℹ In argument: `tbl_stats = pmap(...)`.
#> Caused by error in `pmap()`:
#> ℹ In index: 1.
#> Caused by error in `value[[3L]]()`:
#> ! There was an error assembling the summary statistics for 'Class'
#>   with summary type 'categorical'.
#> 
#> There are 2 common sources for this error.
#> 1. You have requested summary statistics meant for continuous
#>    variables for a variable being as summarized as categorical.
#>    To change the summary type to continuous, add the argument
#>   `type = list(Class ~ 'continuous')`
#> 2. One of the functions or statistics from the `statistic=` argument is not valid.
#> Backtrace:
#>      ▆
#>   1. ├─... %>% ...
#>   2. ├─gtsummary::tbl_svysummary(...)
#>   3. │ └─... %>% ...
#>   4. ├─dplyr::select(., "variable", "var_type", "var_label", everything())
#>   5. ├─tidyr::unnest(., "tbl_stats")
#>   6. ├─dplyr::select(., var_type = "summary_type", "var_label", "tbl_stats")
#>   7. ├─dplyr::mutate(...)
#>   8. ├─dplyr:::mutate.data.frame(...)
#>   9. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
#>  10. │   ├─base::withCallingHandlers(...)
#>  11. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
#>  12. │     └─mask$eval_all_mutate(quo)
#>  13. │       └─dplyr (local) eval()
#>  14. ├─purrr::pmap(...)
#>  15. │ └─purrr:::pmap_("list", .l, .f, ..., .progress = .progress)
#>  16. │   ├─purrr:::with_indexed_errors(...)
#>  17. │   │ └─base::withCallingHandlers(...)
#>  18. │   ├─purrr:::call_with_cleanup(...)
#>  19. │   └─gtsummary (local) .f(...)
#>  20. │     └─gtsummary:::df_stats_to_tbl(...)
#>  21. │       └─base::tryCatch(...)
#>  22. │         └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  23. │           └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  24. │             └─value[[3L]](cond)
#>  25. │               └─base::stop(...)
#>  26. └─base::.handleSimpleError(...)
#>  27.   └─purrr (local) h(simpleError(msg, call))
#>  28.     └─cli::cli_abort(...)
#>  29.       └─rlang::abort(...)

Created on 2023-07-16 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22621)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] gtsummary_1.7.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] Matrix_1.5-4.1       dplyr_1.1.2          compiler_4.3.1      
#>  [4] reprex_2.0.2         tidyselect_1.2.0     xml2_1.3.4          
#>  [7] stringr_1.5.0        survey_4.2-1         tidyr_1.3.0         
#> [10] splines_4.3.1        broom.helpers_1.13.0 yaml_2.3.7          
#> [13] fastmap_1.1.1        lattice_0.21-8       R6_2.5.1            
#> [16] generics_0.1.3       knitr_1.42           forcats_1.0.0       
#> [19] tibble_3.2.1         DBI_1.1.3            R.cache_0.16.0      
#> [22] pillar_1.9.0         R.utils_2.12.2       rlang_1.1.1         
#> [25] utf8_1.2.3           stringi_1.7.12       xfun_0.39           
#> [28] fs_1.6.2             cli_3.6.1            withr_2.5.0         
#> [31] magrittr_2.0.3       grid_4.3.1           digest_0.6.31       
#> [34] rstudioapi_0.14      lifecycle_1.0.3      R.methodsS3_1.8.2   
#> [37] R.oo_1.25.0          vctrs_0.6.2          evaluate_0.21       
#> [40] glue_1.6.2           styler_1.10.1        mitools_2.4         
#> [43] survival_3.5-5       gt_0.9.0             fansi_1.0.4         
#> [46] rmarkdown_2.21       purrr_1.0.1          tools_4.3.1         
#> [49] pkgconfig_2.0.3      htmltools_0.5.5

szimmer avatar Jul 16 '23 16:07 szimmer

Can you link to published examples that use this suggestion? Thanks

ddsjoberg avatar Jul 16 '23 16:07 ddsjoberg

Both are proportions. Maybe we should update the documentation to avoid any confusion.

By default, p is styled with style_percent.

When you customize displayed stats, you should also update digits with the appropriate formatter.
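As a rough illustration of that point, the sketch below calls the gtsummary formatters directly on hypothetical values (0.1477 and 0.0943 are made up for the example). style_percent() rescales a proportion to a percentage before rounding, whereas style_number() leaves the scale alone unless scale = 100 is passed.

```r
library(gtsummary)

p_hat  <- 0.1477  # hypothetical proportion on the 0-1 scale
se_hat <- 0.0943  # hypothetical standard error on the 0-1 scale

# Formatter used for {p} by default: shown as a percentage
pct <- style_percent(p_hat)

# A standard error formatted without rescaling stays on the 0-1 scale
se_raw <- style_number(se_hat, digits = 2)               # "0.09"

# Passing scale = 100 reports it on the percentage scale instead
se_pct <- style_number(se_hat, scale = 100, digits = 2)  # "9.43"

c(pct = pct, se_raw = se_raw, se_pct = se_pct)
```

A function like the last one is what would be supplied through the digits argument for p.std.error.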

Regards

-- Joseph Larmarange

larmarange avatar Jul 16 '23 21:07 larmarange

Hi @szimmer ,

Re-reading this, I think I misread it the first time. If you'd like to change the formatting of the percent standard error, you can use the digits argument to change the scaling and rounding. Example below!

library(gtsummary)

survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) %>%
  tbl_svysummary(
    include = c(Class),
    statistic=list(all_categorical()~"{p} ({p.std.error})"),
    digits = all_categorical() ~ list(0, \(x) style_number(x, scale = 100, digits = 2))
  ) |> 
  as_kable() # convert to kable to display on GH

|Characteristic |N = 2,201  |
|:--------------|:----------|
|Class          |           |
|1st            |15 (9.43)  |
|2nd            |13 (8.63)  |
|3rd            |32 (17.01) |
|Crew           |40 (21.27) |

Created on 2023-10-08 with reprex v2.0.2

ddsjoberg avatar Oct 08 '23 19:10 ddsjoberg

Hi @szimmer @larmarange ,

I agree with @larmarange that a bit more documentation would be the way to go. When we start scaling variances and standard errors, we need to be more careful, and I would prefer to leave that to the user. This is how they are currently documented: [image]
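One reason for that care is that the two quantities rescale differently: going from proportions to percentages multiplies a standard error by 100 but a variance by 100^2. A small base R sketch with made-up values (p_hat is purely hypothetical) shows the rule:

```r
# Hypothetical proportion estimates, used only to illustrate the scaling rule
p_hat <- c(0.12, 0.15, 0.17, 0.14, 0.16)

sd_prop  <- sd(p_hat)
var_prop <- var(p_hat)

# The same spread measures after converting to percentages
sd_pct  <- sd(100 * p_hat)
var_pct <- var(100 * p_hat)

all.equal(sd_pct, 100 * sd_prop)      # TRUE: standard deviation scales by 100
all.equal(var_pct, 100^2 * var_prop)  # TRUE: variance scales by 10,000
```

A single built-in "times 100" switch would therefore be right for p.std.error but wrong for any variance-type statistic, which is consistent with leaving the scaling to the user.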

If you'd like, please submit a pull request with the proposed update. Thanks!

ddsjoberg avatar Jun 29 '24 16:06 ddsjoberg