Add new functions `fold` and `fold_over`
Based on this gist, `fold` and `fold_over` might be useful add-on functions for a future version of dplyover. There should be a better name than `fold` for this kind of function.
library(dplyr)

# draw `n` random answers on a 7-point Likert scale
likert_col <- function(n = 10) {
  sample(7, size = n, replace = TRUE)
}
# toy data
dat <- tibble(
  cat_1 = likert_col(),
  cat_2 = likert_col(),
  cat_3 = likert_col(),
  dog_1 = likert_col(),
  dog_2 = likert_col()
)
# `fold` does not exist yet
dat %>%
  transmute(fold(starts_with("cat"),
                 list(sum  = ~ rowSums(.x),
                      mean = ~ rowMeans(.x))))
# A tibble: 10 x 2
cat_sum cat_mean
<dbl> <dbl>
1 11 3.67
2 10 3.33
3 6 2
4 4 1.33
5 10 3.33
6 7 2.33
7 12 4
8 12 4
9 17 5.67
10 13 4.33
# `fold_over` does not exist yet
dat %>%
  transmute(fold_over(cut_names("_[0-9]*$"),
                      ~ starts_with(.x),
                      ~ rowSums(.x)))
# A tibble: 10 x 2
cat dog
<dbl> <dbl>
1 11 11
2 10 10
3 6 6
4 4 4
5 10 10
6 7 7
7 12 12
8 12 12
9 17 17
10 13 13
I think `fold` would be a great extension of {dplyover}, but a better name should be found, given that {rsample} uses vfold and {furrr} also has a fold function.
Then again, `fold` does pretty much what it says: it folds several columns of a data.frame down into one column, for example by calculating the row mean.
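For comparison, the kind of folding described here can already be written by hand with plain dplyr. The following is only a minimal sketch of the intended result, not of how `fold` would be implemented; `dat_small` and `folded` are made-up names, and the data is fixed so the result is reproducible.

```r
library(dplyr)

# small fixed example data (hypothetical, not the random toy data above)
dat_small <- tibble(
  cat_1 = c(1, 2, 3),
  cat_2 = c(4, 5, 6),
  cat_3 = c(7, 8, 9),
  dog_1 = c(1, 1, 1)
)

# across() without a function returns the selected columns as a data frame,
# which rowSums() / rowMeans() then collapse into a single column
folded <- dat_small %>%
  transmute(
    cat_sum  = rowSums(across(starts_with("cat"))),
    cat_mean = rowMeans(across(starts_with("cat")))
  )

folded
```

The selection `starts_with("cat")` has to be repeated for every summary, which is the repetition a dedicated `fold` function would remove.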
Firstly, thanks for the package. I think this has a far more common use case than Hadley suggested.
Secondly, maybe I'm misunderstanding the purpose of fold here, but wouldn't
summarise(over(starts_with("cat"),
               list(sum  = ~ rowSums(.x),
                    mean = ~ rowMeans(.x))))
do the same thing? At least that way you avoid using the name "fold".
Thank you for your feedback! Unfortunately, `over` and the other functions in the over-across function family don't work like that. `over` loops over a vector and creates a new column for each element. In addition, `over` does not support tidy-select syntax in its `.x` argument.
However, we could create a named list of data.frames on the fly as input to `over` and then produce a similar outcome. Dedicated functions like `fold` and `fold_over` would still be helpful, I guess, since we wouldn't need one or several `select` calls as input to `over`.
# instead of fold_over we could do:
dat %>%
  summarise(over(list(cat = select(., starts_with("cat")),
                      dog = select(., starts_with("dog"))),
                 list(sum  = rowSums,
                      mean = rowMeans)))
#> # A tibble: 10 x 4
#> cat_sum cat_mean dog_sum dog_mean
#> <dbl> <dbl> <dbl> <dbl>
#> 1 12 4 12 6
#> 2 11 3.67 3 1.5
#> 3 19 6.33 4 2
#> 4 6 2 9 4.5
#> 5 9 3 14 7
#> 6 4 1.33 7 3.5
#> 7 7 2.33 10 5
#> 8 8 2.67 3 1.5
#> 9 9 3 9 4.5
#> 10 10 3.33 7 3.5
Created on 2021-08-19 by the reprex package (v0.3.0)
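The manual `select` calls in the workaround above could also be generated from the column names. Below is a rough sketch of that idea using only dplyr and base R. `fold_over_sketch()` and its arguments are hypothetical and not part of dplyover; the sketch assumes that each group of columns shares a name stem followed by a numeric suffix.

```r
library(dplyr)

# Hypothetical helper, NOT part of dplyover:
# strips `pattern` from the column names to find the name stems, selects the
# columns belonging to each stem, and applies `fn` to each group.
fold_over_sketch <- function(data, pattern, fn) {
  stems <- unique(sub(pattern, "", names(data)))
  folded <- lapply(stems, function(stem) {
    # note: starts_with() would also match a longer name such as "category_1"
    fn(select(data, starts_with(stem)))
  })
  as_tibble(setNames(folded, stems))
}

# small fixed example data (hypothetical)
dat_small <- tibble(
  cat_1 = c(1, 2),
  cat_2 = c(3, 4),
  dog_1 = c(5, 6),
  dog_2 = c(7, 8)
)

res <- fold_over_sketch(dat_small, "_[0-9]+$", rowSums)
res
```

This returns one column per stem (`cat` and `dog` here). Unlike the proposed `fold_over`, it takes the whole data frame as an argument and is not meant to be called inside `transmute` or `summarise`.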