`anyDuplicated.vctrs_vctr` difference undocumented

Open jmbarbone opened this issue 3 years ago • 1 comments

This could be related to https://github.com/r-lib/vctrs/issues/180 and how anyDuplicated.vctrs_vctr was originally conceived.

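vctrs:::anyDuplicated.vctrs_vctr uses vec_duplicate_any(), which answers the same question as anyDuplicated() but returns a single TRUE or FALSE instead of an integer position. That is documented for vec_duplicate_any() itself, but not for the exported S3 method, so the difference may not be obvious to anyone calling anyDuplicated() on a vctrs_vctr.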

I ran into this error using haven::labelled but traced it to the use of the vctrs_vctr class:

x <- c(3, 1, 2)
y <- c(x, 3)
anyDuplicated(x)
#> [1] 0
anyDuplicated(y)
#> [1] 4
x <- haven::labelled(c(3, 2, 1))
y <- c(x, 3)
anyDuplicated(x)
#> [1] FALSE
anyDuplicated(y)
#> [1] TRUE
# quick fix
anyDuplicated(unclass(x))
#> [1] 0
anyDuplicated(unclass(y))
#> [1] 4

Created on 2021-09-20 by the reprex package (v2.0.1)
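
To make the practical impact of the return type concrete, here is a minimal base R sketch. The variable `flag` is only a stand-in for the logical value the vctrs_vctr method returned in the reprex above; it is not produced by calling vctrs here.

```r
y <- c(3, 1, 2, 3)

# Base R returns an integer position (0L when there are no duplicates)
pos <- anyDuplicated(y)
dup_value <- y[pos]          # the element at the first duplicated position

# Stand-in for the logical returned by the <vctrs_vctr> method in the reprex
flag <- TRUE
recycled <- y[flag]          # a logical index is recycled, selecting every element
coerced <- as.integer(flag)  # 1L, which could be misread as "duplicate at position 1"

# A check that behaves the same for either return type
has_dups <- anyDuplicated(y) > 0
```

Code that only tests `anyDuplicated(x) > 0` or `if (anyDuplicated(x))` is unaffected, while code that uses the result as a position silently changes meaning.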

jmbarbone, Sep 20 '21 17:09

We should probably be more compatible with what anyDuplicated() is supposed to return. So something like this:

library(vctrs)
library(rlang)

any_duplicated_vctr <- function(x, 
                                incomparables = FALSE, 
                                fromLast = FALSE, 
                                ...) {
  if (!is_false(incomparables)) {
    warn("The <vctrs_vctr> method for `anyDuplicated()` does not respect `incomparables`.")
  }
  if (!is_bool(fromLast)) {
    abort("`fromLast` must be a single `TRUE` or `FALSE`.")
  }
  
  duplicates <- vec_duplicate_detect(x)
  duplicates <- which(duplicates)
  
  if (length(duplicates) == 0L) {
    return(0L)
  }
  
  # `vec_duplicate_detect()` returns first and all subsequent repeats,
  # but `anyDuplicated()` only returns the 2nd repeat
  if (fromLast) {
    i <- length(duplicates) - 1L
  } else {
    i <- 2L
  }
  
  duplicates[[i]]
}

x <- c(1, 1, 2, 2)

anyDuplicated(x)
#> [1] 2
any_duplicated_vctr(x)
#> [1] 2

anyDuplicated(x, fromLast = TRUE)
#> [1] 3
any_duplicated_vctr(x, fromLast = TRUE)
#> [1] 3

Created on 2021-09-20 by the reprex package (v2.0.0.9000)

Update: But this isn't quite right when the duplicates are interleaved:

> anyDuplicated(c(1, 2, 1, 2))
[1] 3
> anyDuplicated.vctrs_vctr(c(1, 2, 1, 2))
[1] 2
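
One possible way to handle the interleaved case is sketched below. It is not the fix adopted by vctrs, only an illustration. It assumes vec_duplicate_id(), which gives, for each element, the location of the first occurrence of its value, so an element is a repeat exactly when that location differs from its own position. The helper name any_duplicated_sketch is hypothetical, and `incomparables` is left out entirely.

```r
library(vctrs)

# Hypothetical helper, not part of vctrs: returns the position of the first
# repeated element as an integer, or 0L when there are no duplicates.
any_duplicated_sketch <- function(x, fromLast = FALSE) {
  n <- vec_size(x)

  # For `fromLast = TRUE`, scan the reversed vector and map the result back
  if (fromLast) {
    x <- vec_slice(x, rev(seq_len(n)))
  }

  # An element is a repeat when its first-occurrence location is not itself
  id <- vec_duplicate_id(x)
  loc <- which(id != seq_len(n))

  if (length(loc) == 0L) {
    return(0L)
  }

  first <- loc[[1L]]
  if (fromLast) n + 1L - first else first
}

any_duplicated_sketch(c(1, 2, 1, 2))
any_duplicated_sketch(c(1, 1, 2, 2), fromLast = TRUE)
```

Unlike indexing into the output of vec_duplicate_detect(), this does not depend on the first occurrence and its repeat being adjacent.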

DavisVaughan, Sep 20 '21 20:09