tibble
tibble copied to clipboard
Provide rbind() and cbind() methods
for R >= 4.0.0.
Closes #34.
After #890, do we still want to use vec_rbind() here? Should rbind() (and also dplyr::bind_rows() for that matter) call dplyr_row_slice() to support data frame classes with variable-length attributes?
Tried that, doesn't work: vec_rbind() does much more, e.g. adjustment of factor levels.
I still wonder what the desired semantics for dplyr_reconstruct() are if nrow(x) != nrow(to) .
Docs say "you need to carefully check and think about it", I guess we're fine here.
Just started the revdeps with
library(revdepcheck)
pkgs <- unique(c(cran_revdeps("tidyr"), cran_revdeps("tibble"), cran_revdeps("dplyr")))
cloud_check(revdep_packages = pkgs)
Thanks @hadley!
This shows that:
- The following packages are using
rbind()orcbind()in a way that is incompatible withvec_rbind()andvec_cbind(): - dat - designr - ExpertChoice - ggiraph - ggiraphExtra - groupr - heemod - neonstore - prettyglm - RKorAPClient - VarBundle - visR - vlda - wpa - All other failing packages might be incompatible with
vec_rbind()andvec_cbind(), but they also have failing tests. - We don't know anything yet about the packages we failed to check.
Do we have an easy way to send a message to all affected maintainers?
Before we notify the maintainers, I think it would be worth digging into 2-5 of the failures in more detail to see if there's a common underlying cause.
We can find the underlying cause by replacing rbind() with vctrs::vec_rbind() one by one in the packages, same with cbind().
egor rbind() failure has the following underlying reason:
library(tibble)
rbind(tibble(.b = integer()), tibble(a = 1, .b = 2))
#> # A tibble: 1 × 2
#> a .b
#> <dbl> <dbl>
#> 1 1 2
vctrs::vec_rbind(tibble(.b = integer()), tibble(a = 1, .b = 2))
#> # A tibble: 1 × 2
#> .b a
#> <dbl> <dbl>
#> 1 2 1
Created on 2021-10-20 by the reprex package (v2.0.1)
ypr checks for row names in its tests:
library(tibble)
row.names(rbind(tibble(a = 1), data.frame(a = c(x = 2))))
#> [1] "1" "x"
row.names(vctrs::vec_rbind(tibble(a = 1), data.frame(a = c(x = 2))))
#> [1] "...1" "x"
Created on 2021-10-20 by the reprex package (v2.0.1)
twoxtwo binds a vector and a tibble:
library(tibble)
base::rbind.data.frame(c("a", "b"), tibble(a = 1, b = 2))
#> # A tibble: 2 × 2
#> a b
#> <chr> <chr>
#> 1 a b
#> 2 1 2
vctrs::vec_rbind(c("a", "b"), tibble(a = 1, b = 2))
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> # A tibble: 2 × 4
#> ...1 ...2 a b
#> <chr> <chr> <dbl> <dbl>
#> 1 a b NA NA
#> 2 <NA> <NA> 1 2
Created on 2021-10-20 by the reprex package (v2.0.1)
optimall calls cbind() with a vector and relies on auto-naming:
library(tibble)
x <- 2
base:::cbind.data.frame(tibble(a = 1), x)
#> a x
#> 1 1 2
vctrs::vec_cbind(tibble(a = 1), x)
#> New names:
#> * `` -> ...2
#> # A tibble: 1 × 2
#> a ...2
#> <dbl> <dbl>
#> 1 1 2
Created on 2021-10-20 by the reprex package (v2.0.1)
Especially the last issue looks like a serious incompatibility:
library(tibble)
x <- 2
base:::cbind.data.frame(tibble(a = 1), x)
#> a x
#> 1 1 2
vctrs::vec_cbind(tibble(a = 1), x)
#> New names:
#> * `` -> ...2
#> # A tibble: 1 × 2
#> a ...2
#> <dbl> <dbl>
#> 1 1 2
Created on 2021-10-20 by the reprex package (v2.0.1)
I suggest the following:
- separate the changes to
rbind()from the changes tocbind() - support fine-grained enabling and disabling of the new functionality
cbind()callstbl_cbind()which is provided by us and exported, if and only ifgetOption("tibble.cbind_base")is unset orFALSE- otherwise
cbind()forwards toNextMethod() - in
tbl_cbind()we check another option,"tibble.cbind_trace", which prints diagnostics on the input and output of these methods - similarly for
rbind()
- depending on the new revdepcheck results, we might need to take action and tweak
tbl_rbind()andtbl_cbind() - we schedule the change for tibble 4.0.0 and warn the maintainers well ahead of the release
This allows us and the maintainers to easier locate the source of a problem: we can gradually replace cbind() by tbl_cbind() and set option(tibble.cbind_base = TRUE), and/or set options(Tibble.cbind_trace = TRUE) to understand problems.
tbl_rbind(), tbl_cbind() and the options are implemented in this PR.
Do we still want to pursue this path?
The story with rbind() and cbind() is growing so complex, I'd almost prefer to just leave them as they are, and advise people to use vec_rbind() and vec_cbind() directly instead. Two reasons I feel this is reasonable:
- I imagine most data analysis focused users will already be using the tidyverse and
bind_rows/cols()directly, so they wouldn't need this - I imagine that package developers who might not want to depend on dplyr would be fine depending on vctrs, since tibble already Imports it.
I think the important point is the second one. We now at least have a good answer for people who want to be able to combine tibbles without relying on dplyr (i.e. use vctrs).
Seems like you could link to vec_rbind() in the tibble::add_row() documentation. i.e. "If you would like to row bind multiple data frames together, see..."
Similar for vec_cbind() and tibble::add_column().
Agreed, it's a lot of work for limited gain. Depends on how badly we want to support rbind() and cbind(), i.e. if we gain other advantages by overriding these methods for tibbles.