vctrs icon indicating copy to clipboard operation
vctrs copied to clipboard

Add coercion methods between `list()` and `list_of()`

Open lionel- opened this issue 3 years ago • 7 comments

Closes #1161

What is the richer type between list and list-of? Following the factor logic, the unconstrained type character covers more values than factor and is thus richer:

vec_c(factor("foo"), "bar")
#> [1] "foo" "bar"

However, factor has been implemented with lax coercibility:

vec_c(factor("foo"), factor("bar"))
#> [1] foo bar
#> Levels: foo bar

This means you get consistent results when a character is thrown into the mix:

vec_c(factor("foo"), "bar", factor("baz"))
#> [1] "foo" "bar" "baz"
vec_c("foo", factor("bar"), factor("baz"))
#> [1] "foo" "bar" "baz"
vec_c(factor("bar"), factor("bar"), "baz")
#> [1] "bar" "bar" "baz"

list_of is different, it has strict coercibility:

vec_c(list_of(1), list_of(""))
#> Error: Can't combine `..1` <double> and `..2` <character>.

so we can't consider list to be the richer type. That would mean list-of could easily lose its restrictions and also combining incompatible list_of values would no longer result in an error when a bare list is thrown into the mix:

# If we coerce list-of to list:
vec_c(list_of(1), list(TRUE), list_of(""))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] TRUE
#>
#> [[3]]
#> [1] ""

For this reason, the richer type is list-of because its richer behaviour (strictness) is the important factor. By coercing lists to list-ofs, we get the expected error:

vec_c(list_of(1), list(TRUE), list_of(""))
#> Error: Can't combine `..1` <double> and `..3` <character>.

Note that if we generalise the base type fallback as planned in #1135, this means we'll lose this property of list-of with any foreign class that explicitly inherits from "list". This is my main concern with this fallback. Maybe we could introduce a generic vec_disable_fallback() that could be optionally implemented to disable it.

lionel- avatar Aug 27 '20 10:08 lionel-

My main concern is that we're leaning too strict here — is there some compelling reason that list_of() should be fundamentally stricter than factor()? I think your arguments are reasonable from a theoretical perspective, but I worry that pragmatically this is going to cause too much pain for R users, especially since the chances that they've heard of list_of is slim. I wonder if our principle should be that you'll keep the "list of"-ness if you're careful, but if you don't care about it, you'll just get a list.

Maybe we need some option so that we can choose to be strict while working on packages (so list_of is propagated everywhere we believe it should be), but when doing data analysis, you'll end up with simple lists if you don't take extra care?

hadley avatar Aug 27 '20 13:08 hadley

I would be fine with lax lists, as long as they are consistently lax. For instance vec_c(list_of(1), list_of("")) should then return a list. But what is the purpose for using a list_of() e.g. in tidyr? Is it to keep information about the common type as long as the list is homogeneous?

I like the idea of having strict and lax variants. Similarly, a strict factor subclass could also make sense.

lionel- avatar Aug 27 '20 13:08 lionel-

I think it's used in tidyr because it allows for some performance optimisations; so I think lax by default is probably the better bet.

hadley avatar Aug 27 '20 18:08 hadley

One reason for list-of in tidyr is consistent chop/unchop roundtripping https://github.com/tidyverse/tidyr/blob/a170e55890cbebe257acf83103003e9f4f123dff/R/chop.R#L20-L25

DavisVaughan avatar Aug 31 '20 15:08 DavisVaughan

Here is a dev tidyr example where the strictness currently comes into play. I think this current behavior has nice theoretical properties, but it might be too strict. I think I wouldn't mind if all 3 of these "worked" and gave the result we see from the first example (i.e. make list-of + list-of coercion lax).

library(tidyr)
library(vctrs)

# Mixed types while unnesting wider.
# This is fine because the lists aren't typed.
col1 <- list(
  list(a = 1:2),
  list(a = "x")
)
col2 <- list(
  list(b = c("x", "y")),
  list(b = 1)
)
df <- tibble(col1 = col1, col2 = col2)
df
#> # A tibble: 2 × 2
#>   col1             col2            
#>   <list>           <list>          
#> 1 <named list [1]> <named list [1]>
#> 2 <named list [1]> <named list [1]>

unnest_wider(df, c(col1, col2))
#> # A tibble: 2 × 2
#>   a         b        
#>   <list>    <list>   
#> 1 <int [2]> <chr [2]>
#> 2 <chr [1]> <dbl [1]>

# Mixed list-of types. 
# For `"a"`, can't combine list_of<integer> and list_of<character>
col1 <- list(
  list_of(a = 1:2),
  list_of(a = "x")
)
col2 <- list(
  list_of(b = c("x", "y")),
  list_of(b = 1)
)
df <- tibble(col1 = col1, col2 = col2)
df
#> # A tibble: 2 × 2
#>   col1            col2           
#>   <list>          <list>         
#> 1 <list<int> [1]> <list<chr> [1]>
#> 2 <list<chr> [1]> <list<dbl> [1]>

unnest_wider(df, c(col1, col2))
#> Error: Can't combine <integer> and <character>.

# The above is essentially what happens here:
col <- list(
  tibble(a = 1:2, b = c("x", "y")),
  tibble(a = "x", b = 1)
)
df <- tibble(col = col)
df
#> # A tibble: 2 × 1
#>   col             
#>   <list>          
#> 1 <tibble [2 × 2]>
#> 2 <tibble [1 × 2]>

unnest_wider(df, col)
#> Error: Can't combine <integer> and <character>.

Created on 2021-11-07 by the reprex package (v2.0.1)

DavisVaughan avatar Nov 07 '21 14:11 DavisVaughan

This also showed up in a revdep https://github.com/ffverse/ffscrapr/pull/344

The problem essentially boiled down to the fact that we can't currently combine list + list_of<integer> even if the list has compatible elements.

library(tidyr)
library(vctrs)

col <- list(
  list(a = 1:2),
  list_of(a = 1L)
)
df <- tibble(col = col)
df
#> # A tibble: 2 × 1
#>   col             
#>   <list>          
#> 1 <named list [1]>
#> 2 <list<int> [1]>

unnest_wider(df, col)
#> Error: Can't combine `..1$a` <list> and `..3$a` <list_of<integer>>.

This should "work", but, as Lionel mentioned, the direction depends on how strict the coercion ends up being:

  • If strict list_of<int> + list_of<chr> = error coercion, then list + list_of<int> = list_of<int> if possible, otherwise error
  • If lax list_of<int> + list_of<chr> = list coercion, then list + list_of<int> = list

DavisVaughan avatar Nov 07 '21 14:11 DavisVaughan

If strict, list + list_of = list_of if possible, otherwise error

As I've just realised with @mgirlich, this would mean that the common type is not the richer type, and then vec_cast() might fail when vec_ptype2() succeeds.

lionel- avatar Sep 01 '22 07:09 lionel-

and then vec_cast() might fail when vec_ptype2() succeeds.

oooh I don't like that if we can avoid it

DavisVaughan avatar Sep 02 '22 16:09 DavisVaughan

Looking at this again, I think I'd be fine with this kind of lax list:

list_of<int> + list_of<int> = list_of<int> # same inner type
list_of<int> + list_of<dbl> = list_of<dbl> # inner type coercion is possible
list_of<int> + list_of<chr> = list / possibly error  # inner type coercion is not possible - i.e. using vec_is_coercible
list_of<int> + list = list                 # use richer type of list        

I can't decide if I want list_of<int> + list_of<chr> to be an error or not. There is something nice about saying that this coercion is just delegated to whether or not the inner types have a common type or not. But, if it is an error, then you get non-symmetric results with things like this (as you mentioned at the beginning of this PR):

vec_c(list_of<int>, list, list_of<chr>) = list
vec_c(list_of<int>, list_of<chr>, list) = error

And that's pretty bad, so I don't think we can do that

DavisVaughan avatar Sep 02 '22 16:09 DavisVaughan

yup lax list-ofs don't have this problem because they coerce to the richer type. I'll finish this PR before release too.

lionel- avatar Sep 02 '22 16:09 lionel-

Highly related to changing tidyr nest() to return list-ofs. The current coercion rules are a little too strict for us to be able to do this, but it would be nice because it would allow roundtriping nest/unnest on empty tibbles https://github.com/tidyverse/tidyr/issues/1393

DavisVaughan avatar Sep 12 '22 15:09 DavisVaughan

Sounds good I'll get to it this week.

lionel- avatar Sep 12 '22 15:09 lionel-