tidyr icon indicating copy to clipboard operation
tidyr copied to clipboard

more functions should be generic

Open mgirlich opened this issue 5 years ago • 7 comments

while bringing some functions from tidyr to dbplyr I noticed that a couple of tidyr functions should probably be generic:

  • [x] uncount()

the pivot helpers would make sense as the dbplyr implemention of pivot_*() uses them anyway

  • [ ] build_longer_spec()/build_wider_spec()
  • [ ] pivot_longer_spec()/pivot_wider_spec()

data frame column helpers would make sense as nested tables in SQL are becoming more common (e.g. see https://github.com/tidyverse/dbplyr/issues/515)

  • [ ] pack()/unpack()
  • [ ] unnest_longer()/unnest_wider()

list column helpers: not sure if they are needed but for completeness I also listed them here:

  • [ ] chop()/unchop()
  • [ ] hoist()

mgirlich avatar Dec 07 '20 09:12 mgirlich

Any work on this so far?

abalter avatar Feb 03 '21 17:02 abalter

@abalter if there was it would be noted in this issue.

hadley avatar Feb 17 '21 22:02 hadley

As we've refined these new-ish verbs in the 1.2.0 release (tweaking defaults, adding arguments), it has been extremely helpful that they aren't generic so we don't have to deal with S3 methods in other packages needing changes (unlike some issues in the generic pivot_longer() and pivot_wider() https://github.com/tidyverse/tidyr/issues/1291). I would encourage holding out as long as we can on this to really stabilize all of these verbs.

DavisVaughan avatar Dec 21 '21 18:12 DavisVaughan

@DavisVaughan I see your point. Though, for lazy tables it would make sense to have pivot_wider_spec() be generic. Alternatively, they might have to export something like dbplyr_pivot_wider_spec() temporarily.

The issue with pivot_wider() is that the result has to be collected in order to know which columns to generate. Therefore, pivot_wider() cannot be lazy. In pivot_wider_spec() this is not an issue because we already know the columns. There are some issues in dbplyr: https://github.com/tidyverse/dbplyr/issues/703 and https://github.com/tidyverse/dbplyr/issues/598

mgirlich avatar Jan 04 '22 10:01 mgirlich

Is there any chance nesting and pivoting will be available for lazy tables? Particularly database connections?

abalter avatar Mar 12 '23 07:03 abalter

@hadley -- Your comment sounded to me as though nesting is currently implemented. However, I can't get it to work. Am I missing something?

library(RPostgreSQL)
#> Loading required package: DBI
library(RMySQL)
#> 
#> Attaching package: 'RMySQL'
#> The following object is masked from 'package:RPostgreSQL':
#> 
#>     dbApply
library(dbplyr)
library(tidyverse)

con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")

t = tibble(
  A = sample(letters[1:5], 20, replace=T),
  B=sample(1:5, 20, replace=T),
  C = rnorm(20)
)

lf = lazy_frame(
  A = sample(letters[1:5], 20, replace=T),
  B=sample(1:5, 20, replace=T),
  C = rnorm(20)
)

copy_to(con, t, "test", overwrite = T)

dbListTables(con)
#> [1] "sqlite_stat1" "sqlite_stat4" "test"

t %>%
  nest(data = -A) %>%
  slice_sample(n=2) %>%
  unnest(cols = data)
#> # A tibble: 6 x 3
#>   A         B      C
#>   <chr> <int>  <dbl>
#> 1 c         5  0.233
#> 2 c         1  1.62 
#> 3 c         1 -0.373
#> 4 c         4  0.275
#> 5 c         3 -1.47 
#> 6 b         5 -0.958

lf %>%
  nest(data = -A) %>%
  slice_sample(n=2) %>%
  unnest(cols = data)
#> Error in UseMethod("nest"): no applicable method for 'nest' applied to an object of class "c('tbl_TestConnection', 'tbl_lazy', 'tbl')"

tdb = tbl(con, "test")
tdb %>%
  nest(data = -A)
#> Error in UseMethod("nest"): no applicable method for 'nest' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

Created on 2023-03-12 with reprex v2.0.2

abalter avatar Mar 12 '23 09:03 abalter