purrr a tidy equivalent to `ave()` ?

ave() is the base functional I use the most if I attach tidyverse, but it has several issues :

Its name, and default, are awkward, I was a bit intimidated of this function as a beginner
The doc is misleading, and only numeric input AND output are documented, so technically we "shouldn't" use it on characters and to return character for instance
The FUN argument must always be spelt explicitly as it comes after the dots
We can't use purrr lambdas
We can't pass additional arguments to dots, so many times we have to type a soul crushing (ok, maybe dramatizing a bit!) FUN = function(x) mean(x, na.rm = TRUE), when we'd really rather type mean, na.rm = TRUE
ave() can't output lists
Type stability (especially as ave() sometimes returns a "wrong type")
It could play well with vctrs
peformance could be improved it seems, given that Hadley's remarks from 2010 have not been considered as far as I can see

It might have been considered but since I found no relevant issues here it is.

Here's rough implementation (not type stable and not very efficient), to get my point across.

map_along <- function(.x, .along, .f, ...) {
  if (is.list(.along))
    .along <- interaction(.along) # as used by base::split.default
  .f <- rlang::as_function(.f)
  body <- as.call(c(quote(.f), quote(.x), eval(substitute(alist(...)))))
  fun <- rlang::new_function(alist(.x=), body)
  ave(.x, .along, FUN = fun)
}
map_along(c(1:4,NA),c(1,1,2,2,2), mean)
#> [1] 1.5 1.5  NA  NA  NA
map_along(c(1:4,NA),c(1,1,2,2,2), mean, na.rm = TRUE)
#> [1] 1.5 1.5 3.5 3.5 3.5
map_along(c(1:4,NA),c(1,1,2,2,2), ~mean(., na.rm = TRUE))
#> [1] 1.5 1.5 3.5 3.5 3.5

^{Created on 2020-01-13 by the reprex package (v0.3.0)}

A neat additional feature would be to have a magical function or pronoun to access the relevant instance of .along in the loop, ave doesn't support this and I missed it a few times (helps with debugging too).

Jan 13 '20 11:01 moodymudskipper

probably something like this pattern (from experience with slide)

library(purrr)
library(vctrs)

attach(warpbreaks)

grouped_map_vec_impl <- function(.x, .g, .f, ..., .ptype, .constrain) {
  .f <- as_mapper(.f, ...)
  
  x_size <- vec_size(.x)
  
  if (x_size != vec_size(.g)) {
    stop("The size of `.x` and `.g` must be the same")
  }
  
  out <- vec_init(.ptype, n = x_size)
  
  info <- vec_group_loc(.g)
  locs <- info$loc
  
  slices <- vec_chop(.x, locs)
  n_slices <- vec_size(slices)
  
  for (i in seq_len(n_slices)) {
    loc <- locs[[i]]
    slice <- slices[[i]]
    
    res <- .f(slice, ...)
    
    if (.constrain) {
      if (vec_size(res) != 1L) {
        stop("All elements must be size 1")
      }
    } else {
      res <- list(res)
    }
    
    vec_slice(out, loc) <- res
  }
  
  out
}

# like map()
grouped_map <- function(.x, .g, .f, ...) {
  grouped_map_vec_impl(.x, .g, .f, ..., .ptype = list(), .constrain = FALSE)
}

grouped_map_vec_simplify <- function(.x, .g, .f, ...) {
  out <- grouped_map(.x, .g, .f, ...)
  
  # check all size 1
  if (!all(map_int(out, vec_size) == 1L)) {
    stop("All elements must be size 1")
  }
  
  vec_c(!!! out)
}

# like map_vec()
# - base for map_dbl() and friends
grouped_map_vec <- function(.x, .g, .f, ..., .ptype = NULL) {
  if (is.null(.ptype)) {
    out <- grouped_map_vec_simplify(.x, .g, .f, ...)
    return(out)
  }
  
  grouped_map_vec_impl(.x, .g, .f, ..., .ptype = .ptype, .constrain = TRUE)
}

grouped_map_dbl <- function(.x, .g, .f, ...) {
  grouped_map_vec(.x, .g, .f, ..., .ptype = double())
}

# like ave()
identical(
  grouped_map_dbl(breaks, wool, ~mean(.x)),
  ave(breaks, wool, FUN = mean)
)
#> [1] TRUE

# use a data frame for multiple groups
# this frees up the `...` for use with `.f`
identical(
  grouped_map_dbl(breaks, data.frame(wool = wool, tension = tension), mean, na.rm = TRUE),
  ave(breaks, wool, tension, FUN = mean)
)
#> [1] TRUE


# safe auto simplify with .ptype = NULL
grouped_map_vec(breaks, wool, mean)
#>  [1] 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704
#>  [9] 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704
#> [17] 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704 31.03704
#> [25] 31.03704 31.03704 31.03704 25.25926 25.25926 25.25926 25.25926 25.25926
#> [33] 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926
#> [41] 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926
#> [49] 25.25926 25.25926 25.25926 25.25926 25.25926 25.25926

grouped_map_vec(breaks, wool, class)
#>  [1] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [8] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [15] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [22] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [29] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [36] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [43] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [50] "numeric" "numeric" "numeric" "numeric" "numeric"


# allows non-size 1 results for each call to `.f`
grouped_map(breaks, wool, ~c(mean(.x), median(.x)))[25:30]
#> [[1]]
#> [1] 31.03704 26.00000
#> 
#> [[2]]
#> [1] 31.03704 26.00000
#> 
#> [[3]]
#> [1] 31.03704 26.00000
#> 
#> [[4]]
#> [1] 25.25926 24.00000
#> 
#> [[5]]
#> [1] 25.25926 24.00000
#> 
#> [[6]]
#> [1] 25.25926 24.00000

# does not allow non-size 1 results
grouped_map_vec(breaks, wool, ~c(mean(.x), median(.x)), .ptype = double())
#> Error in grouped_map_vec_impl(.x, .g, .f, ..., .ptype = .ptype, .constrain = TRUE): All elements must be size 1

grouped_map_vec(breaks, wool, ~c(mean(.x), median(.x)), .ptype = NULL)
#> Error in grouped_map_vec_simplify(.x, .g, .f, ...): All elements must be size 1

grouped_map(1, 1:2, mean)
#> Error in grouped_map_vec_impl(.x, .g, .f, ..., .ptype = list(), .constrain = FALSE): The size of `.x` and `.g` must be the same

Jan 20 '20 17:01 DavisVaughan

Different from https://github.com/r-lib/funs/issues/42#issue-530151360 because the output size here is vec_size(.x) and the output size there is vec_size(.g)

Jan 20 '20 17:01 DavisVaughan

Fantastic! It would go well with a tapply equivalent, which might be closer to hadley's function you linked above.

Jan 20 '20 18:01 moodymudskipper

@moodymudskipper FYI this is an incorrect pattern:

body <- as.call(c(quote(.f), quote(.x), eval(substitute(alist(...)))))

It evaluates dots in the wrong environment.
It evaluates dots repeatedly on repeated invokations.

In general there is no correct way to quote dots expressions than with enquos(). But I would only make quosures when data masking is needed, and the quoting causes issue (2) to happen.

Here is how to solve these issues:

.f <- rlang::as_function(.f)
fn <- function(x) .f(x, ...)

The ... are looked up lexically using normal rules of evaluation and thus will only be evaluated once (results are stored in the promises).

Aug 05 '20 11:08 lionel-

Thanks Lionel! I greatly appreciate you taking time to educate me on this. I never realised ... were looked up lexically like other variables.

Aug 05 '20 12:08 moodymudskipper

This just doesn't feel like a purrr function to me — it feels more something that belongs in vctrs or funs?

Aug 24 '22 14:08 hadley

purrr purrr copied to clipboard

a tidy equivalent to `ave()` ?

purrr
purrr copied to clipboard