Wishlist-for-R
Control over `NA` equality in `base::rle()`
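An option to treat `NA`s as equal to one another, so that a stretch of consecutive `NA`s is counted as a single run. Currently, `rle()` puts each of the consecutive `NA`s in its own run of length 1. This happens because `NA != NA` evaluates to `NA`, and `rle()` treats an `NA` comparison as a run boundary.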
```r
x <- c(1, NA, NA, 3, 3, 3)
rle(x)
#> Run Length Encoding
#>   lengths: int [1:4] 1 1 1 3
#>   values : num [1:4] 1 NA NA 3
```
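The split comes from the boundary test inside `rle()`. The snippet below is a minimal reproduction of that test, assuming the body of `base::rle()` as shipped in current R versions: each element is compared with its predecessor, and a run ends wherever that comparison is `TRUE` or `NA`.

```r
x <- c(1, NA, NA, 3, 3, 3)
n <- length(x)

# Compare each element with the one before it; any comparison
# involving NA yields NA rather than TRUE or FALSE.
y <- x[-1L] != x[-n]
# y is c(NA, NA, NA, FALSE, FALSE)

# rle() ends a run wherever y is TRUE *or* NA, so every position
# next to an NA becomes a run boundary.
i <- c(which(y | is.na(y)), n)
# i is c(1L, 2L, 3L, 6L), so the run lengths diff(c(0L, i)) are 1 1 1 3
```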
While I think this behaviour is technically correct, in practice you'd usually use run-length encoding for compression or segmentation, and in either case splitting a stretch of `NA`s into separate length-1 runs is useless.
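To illustrate the segmentation case, one common way to turn an encoding into segments is to expand it back into one segment id per element. This is a sketch of that idiom, not code from any particular package. With `base::rle()`, the two adjacent `NA`s end up in different segments:

```r
x <- c(1, NA, NA, 3, 3, 3)
r <- rle(x)

# Give every element the index of the run it belongs to.
seg <- rep(seq_along(r$lengths), r$lengths)
# seg is c(1L, 2L, 3L, 4L, 4L, 4L): elements 2 and 3 (both NA)
# fall into segments 2 and 3 instead of sharing one segment.
```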
I'd like something like the following, where one can choose to treat `NA`s as equal to each other:
```r
# Same as base::rle(), plus an `na.equal` argument.
rle2 <- function(x, na.equal = FALSE) {
  if (!is.vector(x) && !is.list(x))
    stop("'x' must be a vector of an atomic type")
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(), values = x), class = "rle"))
  if (isTRUE(na.equal)) {  # changed
    # Replace each value by its position in unique(x). match() treats NA
    # as matching NA, so the codes contain no NA and adjacent NAs compare equal.
    ux <- unique(x)
    x <- match(x, ux)
  }
  y <- x[-1L] != x[-n]
  i <- c(which(y | is.na(y)), n)
  values <- x[i]
  if (isTRUE(na.equal)) {  # changed
    # Map the integer codes back to the original values.
    values <- ux[values]
  }
  structure(list(lengths = diff(c(0L, i)), values = values), class = "rle")
}

rle2(x, na.equal = TRUE)
#> Run Length Encoding
#>   lengths: int [1:3] 1 2 3
#>   values : num [1:3] 1 NA 3
```
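For comparison, here is a hedged alternative sketch that skips the `unique()`/`match()` round trip and instead makes the adjacent-element comparison itself `NA`-aware. `rle_na()` is a hypothetical name. Unlike `rle2()`, it also treats `NaN` as equal to `NA` (because `is.na(NaN)` is `TRUE`), and it only accepts atomic vectors:

```r
# Hypothetical helper: run-length encoding in which adjacent NAs
# (and NaNs, since is.na(NaN) is TRUE) are merged into one run.
rle_na <- function(x) {
  if (!is.atomic(x))
    stop("'x' must be an atomic vector")
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(), values = x), class = "rle"))
  a <- x[-1L]
  b <- x[-n]
  # A run boundary is where exactly one side is missing, or where both
  # sides are present and differ. FALSE & NA is FALSE, so y has no NAs.
  y <- (is.na(a) != is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
  i <- c(which(y), n)
  structure(list(lengths = diff(c(0L, i)), values = x[i]), class = "rle")
}

r <- rle_na(c(1, NA, NA, 3, 3, 3))
r$lengths  # 1 2 3
r$values   # 1 NA 3
```

The trade-off is that `rle2()` keeps `NA` and `NaN` as distinct values (as `unique()` and `match()` do), while this version needs no hashing of the input.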
The `vctrs::vec_unrep()` function also treats `NA`s this way:
```r
vctrs::vec_unrep(x)
#>   key times
#> 1   1     1
#> 2  NA     2
#> 3   3     3
```
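Because `vctrs::vec_unrep()` returns a data frame with `key` and `times` columns rather than an `rle` object, one way to get its `NA` handling while still using base tools such as `inverse.rle()` is to repackage the result. This is a minimal sketch, assuming vctrs is installed; `unrep_to_rle()` is a hypothetical helper:

```r
# Hypothetical helper: convert vctrs::vec_unrep() output into an "rle" object.
unrep_to_rle <- function(x) {
  u <- vctrs::vec_unrep(x)
  structure(list(lengths = u$times, values = u$key), class = "rle")
}

if (requireNamespace("vctrs", quietly = TRUE)) {
  x <- c(1, NA, NA, 3, 3, 3)
  r <- unrep_to_rle(x)
  # The round trip through inverse.rle() recovers the original vector.
  stopifnot(identical(inverse.rle(r), x))
}
```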