Change of results when using tb() in grouped freq()

Open Crismoc opened this issue 2 years ago • 3 comments

After getting results from a grouped freq(), I would like to store them in an object in tibble or data.frame format. When I use tb(), the percentages change in what might be unintended behavior:

library(summarytools)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tobacco |> 
  group_by(smoker) |> 
  freq(diseased)
#> Frequencies  
#> diseased  
#> Type: Factor  
#> Group: smoker = Yes  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes    125     41.95          41.95     41.95          41.95
#>          No    173     58.05         100.00     58.05         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    298    100.00         100.00    100.00         100.00
#> 
#> Group: smoker = No  
#> 
#>               Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
#> ----------- ------ --------- -------------- --------- --------------
#>         Yes     99     14.10          14.10     14.10          14.10
#>          No    603     85.90         100.00     85.90         100.00
#>        <NA>      0                               0.00         100.00
#>       Total    702    100.00         100.00    100.00         100.00

tobacco |> 
  group_by(smoker) |> 
  freq(diseased) |> 
  tb(na.rm = TRUE)
#> # A tibble: 4 × 5
#>   smoker diseased  freq   pct pct_cum
#>   <fct>  <fct>    <dbl> <dbl>   <dbl>
#> 1 Yes    Yes        125 21.0     21.0
#> 2 Yes    No         173 29.0     50  
#> 3 No     Yes         99  7.05    57.1
#> 4 No     No         603 42.9    100

Created on 2023-04-19 with reprex v2.0.2

Is there a way to convert the grouped results to a tibble or data.frame without changing the percentages?

Crismoc, Apr 18 '23 23:04

Could you please show what the desired resulting data frame would look like?

dcomtois, Aug 20 '23 07:08

I would expect to get something like this:

library(summarytools)
library(dplyr)

tobacco |> 
  group_by(smoker) |> 
  reframe(
    level = names(table(diseased)),
    Freq = table(diseased),
    `% Valid` = prop.table(table(diseased)))
#> # A tibble: 4 × 4
#>   smoker level Freq        `% Valid`  
#>   <fct>  <chr> <table[1d]> <table[1d]>
#> 1 Yes    Yes   125         0.4194631  
#> 2 Yes    No    173         0.5805369  
#> 3 No     Yes    99         0.1410256  
#> 4 No     No    603         0.8589744

Crismoc, Aug 20 '23 16:08

I see what you mean. The proportions are recalculated to take into account both groups, and it can create confusion. Aside from better documenting this, I think an additional parameter is in order. That way the user can decide whether to recalculate proportions or not. Thank you for pointing it out.

dcomtois, Nov 10 '23 05:11
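
As a possible workaround that does not rely on any new parameter, the within-group percentages can be recomputed from the raw counts that tb() returns. In the report above, the tb() percentages accumulate to 100 across both groups, while the counts in the freq column are unchanged. The sketch below is base R only. The helper within_group_pct() and its arguments are hypothetical names introduced for illustration and are not part of summarytools. The column names smoker, freq, pct and pct_cum are assumed to match the tb(na.rm = TRUE) output shown in the report, and the counts are copied from it.

```r
# Hypothetical helper (not part of summarytools): recompute percentages
# within each group from a column of raw counts.
within_group_pct <- function(df, group_col = "smoker", freq_col = "freq") {
  grp <- df[[group_col]]
  cnt <- df[[freq_col]]
  # Share of each row within its own group, as a percentage
  df$pct <- 100 * cnt / ave(cnt, grp, FUN = sum)
  # Running total of those percentages, restarted for each group
  df$pct_cum <- ave(df$pct, grp, FUN = cumsum)
  df
}

# Counts copied from the tb() output in the report above
res <- data.frame(
  smoker   = factor(c("Yes", "Yes", "No", "No"), levels = c("Yes", "No")),
  diseased = factor(c("Yes", "No", "Yes", "No"), levels = c("Yes", "No")),
  freq     = c(125, 173, 99, 603)
)

out <- within_group_pct(res)
print(out)
```

Rounded to two decimals, the recomputed pct column is 41.95, 58.05, 14.10 and 85.90, which matches the "% Valid" column printed by the grouped freq() call. The same helper should also work when applied directly to the tibble returned by tb(na.rm = TRUE), assuming the column names are as shown in the report.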