more functions should be generic
while bringing some functions from tidyr to dbplyr I noticed that a couple of tidyr functions should probably be generic:
- [x]
uncount()
the pivot helpers would make sense as the dbplyr implemention of pivot_*() uses them anyway
- [ ]
build_longer_spec()/build_wider_spec() - [ ]
pivot_longer_spec()/pivot_wider_spec()
data frame column helpers would make sense as nested tables in SQL are becoming more common (e.g. see https://github.com/tidyverse/dbplyr/issues/515)
- [ ]
pack()/unpack() - [ ]
unnest_longer()/unnest_wider()
list column helpers: not sure if they are needed but for completeness I also listed them here:
- [ ]
chop()/unchop() - [ ]
hoist()
Any work on this so far?
@abalter if there was it would be noted in this issue.
As we've refined these new-ish verbs in the 1.2.0 release (tweaking defaults, adding arguments), it has been extremely helpful that they aren't generic so we don't have to deal with S3 methods in other packages needing changes (unlike some issues in the generic pivot_longer() and pivot_wider() https://github.com/tidyverse/tidyr/issues/1291). I would encourage holding out as long as we can on this to really stabilize all of these verbs.
@DavisVaughan I see your point. Though, for lazy tables it would make sense to have pivot_wider_spec() be generic. Alternatively, they might have to export something like dbplyr_pivot_wider_spec() temporarily.
The issue with pivot_wider() is that the result has to be collected in order to know which columns to generate. Therefore, pivot_wider() cannot be lazy. In pivot_wider_spec() this is not an issue because we already know the columns. There are some issues in dbplyr: https://github.com/tidyverse/dbplyr/issues/703 and https://github.com/tidyverse/dbplyr/issues/598
Is there any chance nesting and pivoting will be available for lazy tables? Particularly database connections?
@hadley -- Your comment sounded to me as though nesting is currently implemented. However, I can't get it to work. Am I missing something?
library(RPostgreSQL)
#> Loading required package: DBI
library(RMySQL)
#>
#> Attaching package: 'RMySQL'
#> The following object is masked from 'package:RPostgreSQL':
#>
#> dbApply
library(dbplyr)
library(tidyverse)
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
t = tibble(
A = sample(letters[1:5], 20, replace=T),
B=sample(1:5, 20, replace=T),
C = rnorm(20)
)
lf = lazy_frame(
A = sample(letters[1:5], 20, replace=T),
B=sample(1:5, 20, replace=T),
C = rnorm(20)
)
copy_to(con, t, "test", overwrite = T)
dbListTables(con)
#> [1] "sqlite_stat1" "sqlite_stat4" "test"
t %>%
nest(data = -A) %>%
slice_sample(n=2) %>%
unnest(cols = data)
#> # A tibble: 6 x 3
#> A B C
#> <chr> <int> <dbl>
#> 1 c 5 0.233
#> 2 c 1 1.62
#> 3 c 1 -0.373
#> 4 c 4 0.275
#> 5 c 3 -1.47
#> 6 b 5 -0.958
lf %>%
nest(data = -A) %>%
slice_sample(n=2) %>%
unnest(cols = data)
#> Error in UseMethod("nest"): no applicable method for 'nest' applied to an object of class "c('tbl_TestConnection', 'tbl_lazy', 'tbl')"
tdb = tbl(con, "test")
tdb %>%
nest(data = -A)
#> Error in UseMethod("nest"): no applicable method for 'nest' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
Created on 2023-03-12 with reprex v2.0.2