naniar
miss_var_summary returns the wrong percentage
For example:
library(naniar)
library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union
airquality %>%
  group_by(Month) %>%
  miss_var_summary()
#> # A tibble: 25 x 4
#> # Groups:   Month [5]
#>    Month variable n_miss pct_miss
#>    <int> <chr>     <int>    <dbl>
#>  1     5 Ozone         5     16.1
#>  2     5 Solar.R       4     12.9
#>  3     5 Wind          0      0
#>  4     5 Temp          0      0
#>  5     5 Day           0      0
#>  6     6 Ozone        21     70
#>  7     6 Solar.R       0      0
#>  8     6 Wind          0      0
#>  9     6 Temp          0      0
#> 10     6 Day           0      0
#> # … with 15 more rows
Created on 2020-05-13 by the reprex package (v0.3.0)
It should instead be:
# A tibble: 25 x 4
   Month variable n_miss pct_miss
   <int> <chr>     <int>    <dbl>
 1     5 Ozone         5     3.27
 2     5 Solar.R       4     2.61
 3     5 Wind          0     0
 4     5 Temp          0     0
 5     5 Day           0     0
 6     6 Ozone        21    13.7
 7     6 Solar.R       0     0
 8     6 Wind          0     0
 9     6 Temp          0     0
10     6 Day           0     0
The difference comes from pct_miss being computed relative to the number of rows in each group rather than the total number of rows in the data. I'm not sure whether this is actually a problem.
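To make the two denominators concrete, here is a minimal sketch that computes both percentages by hand with dplyr. It does not use naniar and is not how miss_var_summary is implemented; it only looks at Ozone, and the names ozone_by_month, n_rows, pct_within_group and pct_of_all_rows are made up for this illustration.

```r
library(dplyr)

# Illustration only: count missing Ozone values per month, then express
# that count against two different denominators.
ozone_by_month <- airquality %>%
  group_by(Month) %>%
  summarise(
    n_miss = sum(is.na(Ozone)),  # missing Ozone values in this month
    n_rows = n()                 # rows in this month
  ) %>%
  mutate(
    # denominator: rows in the month (what the grouped output reports)
    pct_within_group = n_miss / n_rows * 100,
    # denominator: all rows in airquality (what the "expected" table shows)
    pct_of_all_rows = n_miss / sum(n_rows) * 100
  )

ozone_by_month
```

For May this gives 5 / 31 (about 16.1) within the group and 5 / 153 (about 3.27) over all rows, which are the Ozone values in the two tables above.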
I think this is fine.