vctrs icon indicating copy to clipboard operation
vctrs copied to clipboard

Peculiar interaction between `sapply()` and a (named) vctr

Open jennybc opened this issue 2 years ago • 4 comments

I recently re-implemented an existing S3 class using vctrs and have noticed that sapply() / lapply() don't necessarily transfer the names of the input to the output. (I was not actually relying on this in any meaningful way, mind you, but I noticed that a piece of documentation started to render differently.)

library(vctrs)

x <- new_vctr(1:3, class = "foofy")
names(x) <- letters[1:3]
x
#> <foofy[3]>
#> a b c 
#> 1 2 3

z <- 1:3
names(z) <- letters[1:3]
z
#> a b c 
#> 1 2 3

f <- function(x) "blah"

sapply(x, f)
#> [1] "blah" "blah" "blah"
sapply(z, f)
#>      a      b      c 
#> "blah" "blah" "blah"

Why do the names of x not transfer, while those for z do? What's weird is that it seems to vary by the function being applied. Here both have names. (Well, it's slightly less surprising now that I see the inner/outer name thing I show below.)

g <- function(x) x

sapply(x, g)
#> a b c 
#> 1 2 3
sapply(z, g)
#> a b c 
#> 1 2 3

Looking at the source of sapply(), these seem to be important differences. The vctrs-made vector returns FALSE for is.vector(), which means it gets sent through as.list() and as.list() also does something quite different for x vs. z, in terms of inner vs outer names.

is.vector(x)
#> [1] FALSE
is.vector(z)
#> [1] TRUE

as.list(x)
#> [[1]]
#> <foofy[1]>
#> a 
#> 1 
#> 
#> [[2]]
#> <foofy[1]>
#> b 
#> 2 
#> 
#> [[3]]
#> <foofy[1]>
#> c 
#> 3
as.list(z)
#> $a
#> [1] 1
#> 
#> $b
#> [1] 2
#> 
#> $c
#> [1] 3

Created on 2021-07-19 by the reprex package (v2.0.0.9000)

jennybc avatar Jul 19 '21 17:07 jennybc

Interestingly (to me, at least), purrr::map_chr() does the right and expected thing (and that will be what I do):

library(vctrs)

x <- new_vctr(1:3, class = "foofy")
names(x) <- letters[1:3]
x
#> <foofy[3]>
#> a b c 
#> 1 2 3

f <- function(x) "blah"

sapply(x, f)
#> [1] "blah" "blah" "blah"
purrr::map_chr(x, f)
#>      a      b      c 
#> "blah" "blah" "blah"

Created on 2021-07-19 by the reprex package (v2.0.0.9000)

jennybc avatar Jul 19 '21 19:07 jennybc

The as.list() names handling behavior of the vctrs_vctr class is really inherited from vec_chop(). Because vec_chop() allows you to specify indices which can result in >1 observations per element of the resulting list, it doesn't make sense to always "promote" the inner names to outer names.

library(vctrs)

vctrs:::as.list.vctrs_vctr
#> function (x, ...) 
#> {
#>     out <- vec_chop(x)
#>     if (vec_is_list(x)) {
#>         out <- lapply(out, `[[`, 1)
#>     }
#>     out
#> }
#> <bytecode: 0x7f84270abe08>
#> <environment: namespace:vctrs>

x <- new_vctr(1:3, class = "foofy")
names(x) <- letters[1:3]
x
#> <foofy[3]>
#> a b c 
#> 1 2 3

# This is what `as.list()` calls
vec_chop(x)
#> [[1]]
#> <foofy[1]>
#> a 
#> 1 
#> 
#> [[2]]
#> <foofy[1]>
#> b 
#> 2 
#> 
#> [[3]]
#> <foofy[1]>
#> c 
#> 3

# But we could also specify indices, where it makes more sense to keep the
# inner names rather than "promote" them to outer names, which would not be defined
vec_chop(x, indices = list(1:2, 2:3, 3L))
#> [[1]]
#> <foofy[2]>
#> a b 
#> 1 2 
#> 
#> [[2]]
#> <foofy[2]>
#> b c 
#> 2 3 
#> 
#> [[3]]
#> <foofy[1]>
#> c 
#> 3

But maybe we should consider special casing the as.list.vctrs_vctr() method for compatibility with base as.list() behavior. Like:

as_list_vctrs_vctr <- function(x) {
  names <- vec_names(x)
  has_names <- !is.null(names)
  
  if (has_names) {
    x <- vec_set_names(x, NULL)
  }
  
  out <- vec_chop(x)
  
  if (has_names) {
    out <- vec_set_names(out, names)
  } 
  
  if (vec_is_list(x)) {
    out <- lapply(out, `[[`, 1)
  }
  
  out
}

as_list_vctrs_vctr(x)
#> $a
#> <foofy[1]>
#> [1] 1
#> 
#> $b
#> <foofy[1]>
#> [1] 2
#> 
#> $c
#> <foofy[1]>
#> [1] 3

DavisVaughan avatar Jul 20 '21 13:07 DavisVaughan

But maybe we should consider special casing the as.list.vctrs_vctr() method for compatibility with base as.list() behavior.

This feels a bit morally similar to #1186 about rbind(), in terms of making vctrs-made objects work / work better with base functions. In that #1186, I'm not sure if rbind() can be made correct and, if not, whether it's better to error.

Just based on what I see here, though, this seems like an easier call? It is surprising when sapply() doesn't work the way it usually does re: names, so if there's no downside, it seem like adjusting as.list.vctrs_vctr() would be nice.

jennybc avatar Jul 20 '21 14:07 jennybc

We plan to use vec_chop2() from #1226 once it is merged. This will fix the preservation of names.

lionel- avatar Jul 20 '21 21:07 lionel-