Implement vec_arg_min and vec_arg_max

Open hadley opened this issue 7 years ago • 6 comments

And use in min() and max() methods

hadley, Sep 19 '18 07:09

I now think it would be useful to have:

# Error if size 0; size > 0 is required to return a single location
vec_arg_min(x)
vec_arg_max(x)

# Returns a size 1 value that represents the maximum/minimum value for a particular ptype
# - generic
# - default would cast Inf/-Inf to the type of x
# - clock would override this to return the max/min possible date-time values
# - data frames would map(x, vec_ptype_maximum) over the columns to get a 1 row result
vec_ptype_maximum(x)
vec_ptype_minimum(x)

vec_min(x)
vec_max(x) 

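# the above two are implemented as (vec_max() mirrors this with
# vec_arg_max() and vec_ptype_minimum()):
vec_min <- function(x) {
  if (vec_size(x) > 0L) {
    vec_slice(x, vec_arg_min(x))
  } else {
    # The minimum of an empty vector is the largest possible value,
    # matching base R, where min(numeric()) is Inf
    vec_ptype_maximum(x)
  }
}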

I'm currently at a place where I'm using min() and max() in a "generic" way and need them to be generic even over data frames, so vec_min() and vec_max() would be useful.
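None of the functions above exist in vctrs at this point. As a rough sketch only, the location helpers could be approximated with the existing vec_order(), assuming it is acceptable to sort missing values last rather than propagate them. The names arg_min_sketch() and arg_max_sketch() are hypothetical and are not exported by vctrs:

```r
library(vctrs)

# Hypothetical stand-ins for the proposed vec_arg_min() / vec_arg_max().
# Assumption: missing values are ordered last instead of being propagated.
arg_min_sketch <- function(x) {
  if (vec_size(x) == 0L) {
    stop("`x` must have size > 0 to return a location.", call. = FALSE)
  }
  # First position of the ascending ordering, missing values sorted last
  vec_order(x, direction = "asc", na_value = "largest")[[1L]]
}

arg_max_sketch <- function(x) {
  if (vec_size(x) == 0L) {
    stop("`x` must have size > 0 to return a location.", call. = FALSE)
  }
  # First position of the descending ordering, missing values sorted last
  vec_order(x, direction = "desc", na_value = "smallest")[[1L]]
}

# Data frames are ordered by their columns from left to right,
# so the "minimum" is a whole row.
df <- data.frame(a = c(2, 1, 1), b = c("x", "z", "y"))
vec_slice(df, arg_min_sketch(df)) # selects the row with a = 1, b = "y"
```

Because the result is a location, the same helper works for any type that vec_order() supports, which is what makes a data frame aware vec_min() possible.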

I'm not sure what the vec_ptype_minimum() of a character vector is. min(character()) returns NA_character_ and is documented to do so, but even the docs admit that that is strange. Maybe that should be an error in our vec_ptype_minimum.character method.

DavisVaughan, Jan 22 '22 14:01

I also like that Python's min() function has a default argument https://docs.python.org/3/library/functions.html#min

So maybe it's something like:

vec_min <- function(x, ..., na_rm = FALSE, empty = NULL) {
  check_dots_empty0(...)
  
  if (na_rm) {
    x <- vec_slice(x, !vec_equal_na(x))
  }
  
  if (is.null(empty)) {
    empty <- vec_ptype_maximum(x)
  } else {
    empty <- vec_cast(empty, x)
    vec_assert(empty, size = 1L)
  }
  
  if (vec_is_empty(x)) {
    empty
  } else {
    vec_slice(x, vec_arg_min(x))
  }
}

I think the empty argument would really come in handy with group_by() + summarize(). My wife just found herself in a situation where she wanted to compute the min date per group, but many of her dates were missing. She wanted to remove the missing dates and still compute the min on any data that was left, but she wanted to retain an NA if the group was entirely composed of NAs (rather than getting an Inf date).

suppressPackageStartupMessages({
  library(dplyr)
  library(vctrs)
})
#> Warning: package 'dplyr' was built under R version 4.1.2

df <- tibble(g = c(1, 1, 2), date = new_date(c(0, NA, NA)))

df_min <- df %>%
  group_by(g) %>%
  summarise(date = min(date, na.rm = TRUE))
#> Warning in min.default(structure(NA_real_, class = "Date"), na.rm = TRUE): no
#> non-missing arguments to min; returning Inf

# the print method lies!
df_min
#> # A tibble: 2 × 2
#>       g date      
#>   <dbl> <date>    
#> 1     1 1970-01-01
#> 2     2 NA

# EW!
unclass(df_min$date)
#> [1]   0 Inf
is.na(df_min$date)
#> [1] FALSE FALSE

# this was actually what she wanted:
df_min <- df %>%
  group_by(g) %>%
  summarise(date = vec_min(date, na_rm = TRUE, empty = NA))
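Since vec_min() is only a proposal here, the call above does not run yet. A minimal base R workaround, assuming that a single missing value of the same class is the desired result for an all-missing group, might look like this. The helper name min_or_na() is hypothetical and is not part of vctrs or dplyr:

```r
# Hypothetical helper: minimum of the non-missing values, or a single
# missing value of the same class when nothing is left after removing NAs.
min_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) {
    # Indexing with NA returns one missing value and keeps the class of x
    return(x[NA_integer_])
  }
  min(x)
}

some_missing <- as.Date(c("1970-01-01", NA))
all_missing <- as.Date(c(NA, NA))

min_or_na(some_missing)
is.na(min_or_na(all_missing))
```

Inside the pipeline above, this would be used as summarise(date = min_or_na(date)), which avoids both the warning and the hidden Inf.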

DavisVaughan, Feb 15 '22 14:02

vec_min(x, empty = 1L) would probably be useful for https://github.com/tidyverse/dplyr/issues/6167#issuecomment-1024533145

DavisVaughan, Jul 19 '22 21:07

Should vec_arg_min() be vec_min_loc()?

lionel-, Sep 12 '22 11:09

Probably vec_locate_min(), but yeah, I don't like the Python-style argmin naming scheme.

DavisVaughan, Sep 12 '22 13:09

Another example where this would be helpful: https://github.com/tidyverse/dplyr/issues/6703

DavisVaughan, Feb 09 '23 21:02