`list_ungroup()` to invert `vec_group_loc()`?

Open • DavisVaughan opened this issue 1 year ago • 1 comment

It is currently slightly awkward to "undo" a `vec_group_loc()` or `vec_split()` call. You can do it with `list_unchop()`, but that requires chopping the key with `vec_chop()` first.

```r
library(vctrs)
set.seed(123)

x <- sample(1:4, size = 20, replace = TRUE)
x
#>  [1] 3 3 3 2 3 2 2 2 3 1 4 2 2 1 2 3 4 1 3 3

# Or `vec_split()` potentially, if we are splitting by something else
locs <- vec_group_loc(x)
locs
#>   key                       loc
#> 1   3 1, 2, 3, 5, 9, 16, 19, 20
#> 2   2    4, 6, 7, 8, 12, 13, 15
#> 3   1                10, 14, 18
#> 4   4                    11, 17

# Chopping here is awkward
list_unchop(vec_chop(locs$key), indices = locs$loc)
#>  [1] 3 3 3 2 3 2 2 2 3 1 4 2 2 1 2 3 4 1 3 3
```

We could reintroduce `vec_unchop(<vector>, <list-of-indices>)` to do this, but I think the "missing piece" is really a way to flatten the `loc` column: from a list of location vectors that point into the original `x`, to a single location vector that points into the new `key`.

```r
# Should be fairly fast to build this at the C level?
# Probably some checks on `x` to make sure every element is an integer vector
# and that no element exceeds `sum(list_sizes(x))`. May also want to remove
# `0` values ahead of time?
list_ungroup <- function(x) {
  out <- vec_init(integer(), n = sum(list_sizes(x)))

  for (i in seq_along(x)) {
    out <- vec_assign(out, x[[i]], i)
  }

  out
}

list_ungroup(locs$loc)
#>  [1] 1 1 1 2 1 2 2 2 1 3 4 2 2 3 2 1 4 3 1 1

vec_slice(locs$key, list_ungroup(locs$loc))
#>  [1] 3 3 3 2 3 2 2 2 3 1 4 2 2 1 2 3 4 1 3 3
```
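
The prototype above calls `vec_assign()` once per group. As a rough sketch of the same flattening without an explicit loop, the version below uses only base R so it runs without vctrs. `list_ungroup_sketch()` is a hypothetical name, and the validation is just one assumed reading of the checks mentioned in the comments (it errors on zeros rather than removing them); it is not a proposed vctrs implementation.

```r
# Hypothetical helper, base R only; not part of vctrs.
list_ungroup_sketch <- function(x) {
  stopifnot(is.list(x), all(vapply(x, is.numeric, logical(1))))

  sizes <- lengths(x)
  n <- sum(sizes)
  loc <- as.integer(unlist(x, use.names = FALSE))

  # Assumed check: the locations must be exactly 1..n, each used once.
  # This rejects zeros, missing values, duplicates, and out-of-range values.
  if (!identical(sort(loc), seq_len(n))) {
    stop("`x` must contain each location from 1 to ", n, " exactly once.")
  }

  # Element `i` of `x` lists the positions that belong to group `i`,
  # so write `i` into each of those positions in one vectorised assignment.
  out <- integer(n)
  out[loc] <- rep(seq_along(x), times = sizes)
  out
}

# Hand-written stand-in for the `key` and `loc` columns that
# `vec_group_loc(c("b", "a", "b", "c", "a"))` would produce
key <- c("b", "a", "c")
loc <- list(c(1L, 3L), c(2L, 5L), 4L)

ids <- list_ungroup_sketch(loc)
ids
#> [1] 1 2 1 3 2

key[ids]
#> [1] "b" "a" "b" "c" "a"
```

The single `out[loc] <- rep(...)` assignment does the same work as the loop, which is roughly what a C-level implementation would do in one pass over the list.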

DavisVaughan • Jun 23 '23 13:06

@DavisVaughan list_ungroup seems very specifically about reversing vec_group_loc. ~~What if instead of trying to reverse vec_group_loc with a new function, a third column was built into the result of vec_group_loc which could be flattened directly (e.g. id to match the terminology of vec_group_id)? (This would also resolve what I was looking for in #1857).~~

I don't think there's a sensible way to include vec_group_id in the vec_group_loc data frame since the structure is inherently different.
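
To make the structural difference concrete, here is a small sketch on a made-up input (not taken from the thread). As far as I understand the two functions, `vec_group_id()` returns one identifier per element of `x`, while `vec_group_loc()` returns one row per unique key, so their sizes only agree when every value is unique.

```r
library(vctrs)

x <- c("b", "a", "b", "c", "a")  # illustrative input only

ids <- vec_group_id(x)    # one id per element of `x`
locs <- vec_group_loc(x)  # one row per unique key

vec_size(ids)
#> [1] 5
vec_size(locs)
#> [1] 3

# Indexing the key by the ids reproduces `x`
identical(locs$key[as.integer(ids)], x)
#> [1] TRUE
```

Both functions number groups in order of first appearance, which is why indexing `locs$key` by the ids recovers the original vector here.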

orgadish • Oct 08 '23 20:10