purrr
purrr copied to clipboard
a tidy equivalent to `ave()` ?
ave()
is the base functional I use the most if I attach tidyverse, but it has several issues :
- Its name, and default, are awkward, I was a bit intimidated of this function as a beginner
- The doc is misleading, and only numeric input AND output are documented, so technically we "shouldn't" use it on characters and to return character for instance
- The
FUN
argument must always be spelt explicitly as it comes after the dots - We can't use purrr lambdas
- We can't pass additional arguments to dots, so many times we have to type a soul crushing (ok, maybe dramatizing a bit!)
FUN = function(x) mean(x, na.rm = TRUE)
, when we'd really rather typemean, na.rm = TRUE
-
ave()
can't output lists - Type stability (especially as
ave()
sometimes returns a "wrong type") - It could play well with vctrs
- peformance could be improved it seems, given that Hadley's remarks from 2010 have not been considered as far as I can see
It might have been considered but since I found no relevant issues here it is.
Here's rough implementation (not type stable and not very efficient), to get my point across.
map_along <- function(.x, .along, .f, ...) {
if (is.list(.along))
.along <- interaction(.along) # as used by base::split.default
.f <- rlang::as_function(.f)
body <- as.call(c(quote(.f), quote(.x), eval(substitute(alist(...)))))
fun <- rlang::new_function(alist(.x=), body)
ave(.x, .along, FUN = fun)
}
map_along(c(1:4,NA),c(1,1,2,2,2), mean)
#> [1] 1.5 1.5 NA NA NA
map_along(c(1:4,NA),c(1,1,2,2,2), mean, na.rm = TRUE)
#> [1] 1.5 1.5 3.5 3.5 3.5
map_along(c(1:4,NA),c(1,1,2,2,2), ~mean(., na.rm = TRUE))
#> [1] 1.5 1.5 3.5 3.5 3.5
Created on 2020-01-13 by the reprex package (v0.3.0)
A neat additional feature would be to have a magical function or pronoun to access the relevant instance of .along
in the loop, ave
doesn't support this and I missed it a few times (helps with debugging too).
probably something like this pattern (from experience with slide)
library(purrr)
library(vctrs)
attach(warpbreaks)
grouped_map_vec_impl <- function(.x, .g, .f, ..., .ptype, .constrain) {
.f <- as_mapper(.f, ...)
x_size <- vec_size(.x)
if (x_size != vec_size(.g)) {
stop("The size of `.x` and `.g` must be the same")
}
out <- vec_init(.ptype, n = x_size)
info <- vec_group_loc(.g)
locs <- info$loc
slices <- vec_chop(.x, locs)
n_slices <- vec_size(slices)
for (i in seq_len(n_slices)) {
loc <- locs[[i]]
slice <- slices[[i]]
res <- .f(slice, ...)
if (.constrain) {
if (vec_size(res) != 1L) {
stop("All elements must be size 1")
}
} else {
res <- list(res)
}
vec_slice(out, loc) <- res
}
out
}
# like map()
grouped_map <- function(.x, .g, .f, ...) {
grouped_map_vec_impl(.x, .g, .f, ..., .ptype = list(), .constrain = FALSE)
}
grouped_map_vec_simplify <- function(.x, .g, .f, ...) {
out <- grouped_map(.x, .g, .f, ...)
# check all size 1
if (!all(map_int(out, vec_size) == 1L)) {
stop("All elements must be size 1")
}
vec_c(!!! out)
}
# like map_vec()
# - base for map_dbl() and friends
grouped_map_vec <- function(.x, .g, .f, ..., .ptype = NULL) {
if (is.null(.ptype)) {
out <- grouped_map_vec_simplify(.x, .g, .f, ...)
return(out)
}
grouped_map_vec_impl(.x, .g, .f, ..., .ptype = .ptype, .constrain = TRUE)
}
grouped_map_dbl <- function(.x, .g, .f, ...) {
grouped_map_vec(.x, .g, .f, ..., .ptype = double())
}
# like ave()
identical(
grouped_map_dbl(breaks, wool, ~mean(.x)),
ave(breaks, wool, FUN = mean)
)
#> [1] TRUE
# use a data frame for multiple groups
# this frees up the `...` for use with `.f`
identical(
grouped_map_dbl(breaks, data.frame(wool = wool, tension = tension), mean, na.rm = TRUE),
ave(breaks, wool, tension, FUN = mean)
)
#> [1] TRUE
# safe auto simplify with .ptype = NULL
grouped_map_vec(breaks, wool, mean)
#> [1] 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704
#> [9] 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704
#> [17] 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704
#> [25] 31.03704 31.03704 31.03704 25.25926 25.25926 25.25926 25.25926 25.25926
#> [33] 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926
#> [41] 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926
#> [49] 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926
grouped_map_vec(breaks, wool, class)
#> [1] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [8] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [15] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [22] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [29] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [36] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [43] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [50] "numeric" "numeric" "numeric" "numeric" "numeric"
# allows non-size 1 results for each call to `.f`
grouped_map(breaks, wool, ~c(mean(.x), median(.x)))[25:30]
#> [[1]]
#> [1] 31.03704 26.00000
#>
#> [[2]]
#> [1] 31.03704 26.00000
#>
#> [[3]]
#> [1] 31.03704 26.00000
#>
#> [[4]]
#> [1] 25.25926 24.00000
#>
#> [[5]]
#> [1] 25.25926 24.00000
#>
#> [[6]]
#> [1] 25.25926 24.00000
# does not allow non-size 1 results
grouped_map_vec(breaks, wool, ~c(mean(.x), median(.x)), .ptype = double())
#> Error in grouped_map_vec_impl(.x, .g, .f, ..., .ptype = .ptype, .constrain = TRUE): All elements must be size 1
grouped_map_vec(breaks, wool, ~c(mean(.x), median(.x)), .ptype = NULL)
#> Error in grouped_map_vec_simplify(.x, .g, .f, ...): All elements must be size 1
grouped_map(1, 1:2, mean)
#> Error in grouped_map_vec_impl(.x, .g, .f, ..., .ptype = list(), .constrain = FALSE): The size of `.x` and `.g` must be the same
Different from https://github.com/r-lib/funs/issues/42#issue-530151360 because the output size here is vec_size(.x)
and the output size there is vec_size(.g)
Fantastic! It would go well with a tapply equivalent, which might be closer to hadley's function you linked above.
@moodymudskipper FYI this is an incorrect pattern:
body <- as.call(c(quote(.f), quote(.x), eval(substitute(alist(...)))))
- It evaluates dots in the wrong environment.
- It evaluates dots repeatedly on repeated invokations.
In general there is no correct way to quote dots expressions than with enquos()
. But I would only make quosures when data masking is needed, and the quoting causes issue (2) to happen.
Here is how to solve these issues:
.f <- rlang::as_function(.f)
fn <- function(x) .f(x, ...)
The ...
are looked up lexically using normal rules of evaluation and thus will only be evaluated once (results are stored in the promises).
Thanks Lionel! I greatly appreciate you taking time to educate me on this. I never realised ...
were looked up lexically like other variables.
This just doesn't feel like a purrr function to me — it feels more something that belongs in vctrs or funs?