
Control over `NA` equality in `base::rle()`

Open teunbrand opened this issue 6 months ago • 0 comments

I'd like an option to treat consecutive NAs as equal, so that they are counted as a single run. Currently, rle() treats each consecutive NA as a separate run of length 1.

x <- c(1, NA, NA, 3, 3, 3)
rle(x)
#> Run Length Encoding
#>   lengths: int [1:4] 1 1 1 3
#>   values : num [1:4] 1 NA NA 3
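
To make the cause concrete, here is a minimal sketch of the neighbour comparison that rle() relies on, applied to the same x. The variable names y and breaks are only illustrative. Any comparison involving NA yields NA, and NA results are counted as run boundaries, so every NA ends its own run.

```r
x <- c(1, NA, NA, 3, 3, 3)
n <- length(x)

# Compare each element with its predecessor; NA on either side gives NA
y <- x[-1L] != x[-n]
y
# NA NA NA FALSE FALSE

# NA comparisons are treated as "different", so positions 1, 2 and 3 end a run
breaks <- which(y | is.na(y))
breaks
# 1 2 3

# Run lengths follow from the distances between run ends
diff(c(0L, breaks, n))
# 1 1 1 3
```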

While I think this behaviour is technically correct, in practice you'd typically use run-length encoding for compression or segmentation, in which case splitting consecutive NAs into individual runs is useless. I'd like something like the following, where one can choose to ignore that NAs are not equal:

rle2 <- function(x, na.equal = FALSE) {
  if (!is.vector(x) && !is.list(x)) 
    stop("'x' must be a vector of an atomic type")
  n <- length(x)
  if (n == 0L) 
    return(structure(list(lengths = integer(), values = x), class = "rle"))
  if (isTRUE(na.equal)) { # changed
    ux <- unique(x)       #
    x <- match(x, ux)     #
  }                       #
  y <- x[-1L] != x[-n]
  i <- c(which(y | is.na(y)), n)
  values <- x[i]
  if (isTRUE(na.equal)) { # changed
    values <- ux[values]  #
  }                       #
  structure(list(lengths = diff(c(0L, i)), values = values), class = "rle")
}
rle2(x, na.equal = TRUE)
#> Run Length Encoding
#>   lengths: int [1:3] 1 2 3
#>   values : num [1:3] 1 NA 3
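
The reason the unique()/match() detour works can be seen in isolation. This is only a sketch of that one step, with codes as an illustrative name: unique() keeps a single NA, and match() matches NA to NA, so the integer codes contain no missing values and compare cleanly.

```r
x <- c(1, NA, NA, 3, 3, 3)

ux <- unique(x)        # distinct values, with a single NA kept
ux
# 1 NA 3

codes <- match(x, ux)  # position of each element in ux; NA matches NA
codes
# 1 2 2 3 3 3

# The codes contain no NA, so neighbour comparison never returns NA
codes[-1L] != codes[-length(codes)]
# TRUE FALSE TRUE FALSE FALSE
```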

The vctrs::vec_unrep() function also treats NAs this way:

vctrs::vec_unrep(x)
#>   key times
#> 1   1     1
#> 2  NA     2
#> 3   3     3
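
Until something like this exists in base R, one possible workaround without extra dependencies is to merge adjacent NA runs after calling rle(). The helper below, merge_na_runs(), is a hypothetical name and only a sketch; it assumes the encoded vector was atomic. Note that is.na() is also TRUE for NaN, so adjacent NA and NaN runs would be merged under the first value.

```r
# Hypothetical helper: collapse adjacent NA runs of an existing "rle" object
merge_na_runs <- function(r) {
  stopifnot(inherits(r, "rle"))
  n <- length(r$values)
  if (n == 0L) return(r)
  is_na <- is.na(r$values)
  # A run opens a new group unless it and the previous run are both NA
  new_group <- c(TRUE, !(is_na[-1L] & is_na[-n]))
  group <- cumsum(new_group)
  structure(
    list(
      lengths = as.integer(tapply(r$lengths, group, sum)),  # total length per group
      values  = r$values[new_group]                         # first value of each group
    ),
    class = "rle"
  )
}

x <- c(1, NA, NA, 3, 3, 3)
r <- merge_na_runs(rle(x))
r$lengths
# 1 2 3
r$values
# 1 NA 3
```

Because the result is still a regular "rle" object, inverse.rle(r) should reconstruct the original x.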

teunbrand, Dec 13 '23 16:12