tidyr
Restore data frames generically
See existing work in #812. Below is a list of the functions we need to consider, along with some thoughts on what form of genericity each one needs. The goal is to make sure that data frame extensions return reasonable results in the absence of specific methods, and that all the necessary functions are generic so that they can be extended when needed.
chop, unchop
pack, unpack
nest, unnest
separate, extract = append_df
hoist = append_df
complete = full_join + replace_na
drop_na = dplyr_row_slice
separate_rows = str_split + unchop
uncount = dplyr_row_slice + optional column removal
replace_na = dplyr_col_modify
expand = dplyr_reconstruct
pivot_longer = dplyr_reconstruct
pivot_wider = dplyr_reconstruct
# don't need to update superseded functions
gather, spread
nest_legacy, unnest_legacy
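As a rough illustration of two rows in the list above (drop_na = dplyr_row_slice and replace_na = dplyr_col_modify), the sketch below routes the work through dplyr's extension generics so that the class of the input is restored. The helper names drop_na_sketch() and replace_na_sketch() are hypothetical, and this is not tidyr's actual implementation. It assumes dplyr >= 1.0.0 is installed.

```r
library(dplyr)

# Hypothetical helper: keep only rows without missing values, and let
# dplyr_row_slice() rebuild the original class (e.g. grouped_df).
drop_na_sketch <- function(data) {
  keep <- stats::complete.cases(data)
  dplyr_row_slice(data, keep)
}

# Hypothetical helper: replace NAs column by column, and let
# dplyr_col_modify() rebuild the original class.
# `replace` is a named list of replacement values, one per column.
replace_na_sketch <- function(data, replace) {
  cols <- intersect(names(replace), names(data))
  new_cols <- lapply(cols, function(nm) {
    col <- data[[nm]]
    col[is.na(col)] <- replace[[nm]]
    col
  })
  names(new_cols) <- cols
  dplyr_col_modify(data, new_cols)
}

df <- group_by(tibble(g = c("a", "a", "b"), x = c(1, NA, 3)), g)

dropped <- drop_na_sketch(df)
filled <- replace_na_sketch(df, list(x = 0))

# Both results should still be grouped data frames
inherits(dropped, "grouped_df")
inherits(filled, "grouped_df")
```

Neither helper knows anything about grouped_df. The class handling lives entirely in the dplyr generics, which is the kind of genericity the list is aiming for.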
We need to consider the sticky column case, as in panelr.
Ideally we'd be like dplyr and simply assume that [ with a single argument i returns a data frame of length length(i), that is, with exactly length(i) columns.
I have a feeling that we are going to have to say: if you have sticky columns and a sticky [ method, you'll need to implement a package-specific S3 method for this generic. Otherwise it should just work.
That would break packages like this one (with sticky columns) until they add methods for these operations, but it isn't as though they worked correctly to begin with.
library(tidyr)
library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
wages
#> # Panel data: 4,165 × 14
#> # entities: id [595]
#> # wave variable: t [1, 2, 3, ... (7 waves)]
#> id t exp wks occ ind south smsa ms fem union ed blk
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 3 32 0 0 1 0 1 0 0 9 0
#> 2 1 2 4 43 0 0 1 0 1 0 0 9 0
#> 3 1 3 5 40 0 0 1 0 1 0 0 9 0
#> 4 1 4 6 39 0 0 1 0 1 0 0 9 0
#> 5 1 5 7 42 0 1 1 0 1 0 0 9 0
#> 6 1 6 8 35 0 1 1 0 1 0 0 9 0
#> 7 1 7 9 32 0 1 1 0 1 0 0 9 0
#> 8 2 1 30 34 1 0 0 0 1 0 0 11 0
#> 9 2 2 31 27 1 0 0 0 1 0 0 11 0
#> 10 2 3 32 33 1 1 0 0 1 0 1 11 0
#> # … with 4,155 more rows, and 1 more variable: lwage <dbl>
# Sticky cols
wages <- wages["exp"]
wages
#> # Panel data: 4,165 × 3
#> # entities: id [595]
#> # wave variable: t [1, 2, 3, ... (7 waves)]
#> id t exp
#> <fct> <dbl> <dbl>
#> 1 1 1 3
#> 2 1 2 4
#> 3 1 3 5
#> 4 1 4 6
#> 5 1 5 7
#> 6 1 6 8
#> 7 1 7 9
#> 8 2 1 30
#> 9 2 2 31
#> 10 2 3 32
#> # … with 4,155 more rows
# Meaning they come along for the ride here
chop(wages, exp)
#> New names:
#> * id -> id...1
#> * t -> t...2
#> * id -> id...3
#> * t -> t...4
#> # A tibble: 4,165 × 5
#> id...1 t...2 id...3 t...4 exp
#> <fct> <dbl> <list<fct>> <list<dbl>> <list<dbl>>
#> 1 1 1 [1] [1] [1]
#> 2 1 2 [1] [1] [1]
#> 3 1 3 [1] [1] [1]
#> 4 1 4 [1] [1] [1]
#> 5 1 5 [1] [1] [1]
#> 6 1 6 [1] [1] [1]
#> 7 1 7 [1] [1] [1]
#> 8 2 1 [1] [1] [1]
#> 9 2 2 [1] [1] [1]
#> 10 2 3 [1] [1] [1]
#> # … with 4,155 more rows
# Genericity doesn't really work right
# In theory this should be a panel data frame, but reconstruct_tibble()
# took over since it inherits from grouped_df
tidyr::pack(wages, data = exp)
#> # A tibble: 4,165 × 3
#> # Groups: id [595]
#> id t data$id $t $exp
#> <fct> <dbl> <fct> <dbl> <dbl>
#> 1 1 1 1 1 3
#> 2 1 2 1 2 4
#> 3 1 3 1 3 5
#> 4 1 4 1 4 6
#> 5 1 5 1 5 7
#> 6 1 6 1 6 8
#> 7 1 7 1 7 9
#> 8 2 1 2 1 30
#> 9 2 2 2 2 31
#> 10 2 3 2 3 32
#> # … with 4,155 more rows
Created on 2021-11-12 by the reprex package (v2.0.1)
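The sticky column problem in the reprex above can be mimicked in base R without panelr. The sketch below defines a hypothetical sticky_df class whose [ method always keeps the id column, which violates the assumption that x[i] has length(i) columns. It then shows one way a package could opt out through an S3 method. The generic col_subset() and the class sticky_df are made up for illustration. They are not part of tidyr, dplyr, or panelr, and the [ method only handles character column names.

```r
# Hypothetical constructor for a data frame subclass with a sticky "id" column
new_sticky <- function(df) {
  class(df) <- c("sticky_df", "data.frame")
  df
}

# Sticky `[`: "id" always comes along, so the result can have more
# columns than were asked for. Assumes `i` is a character vector.
`[.sticky_df` <- function(x, i, ...) {
  cols <- union("id", i)
  out <- as.data.frame(x)[cols]  # select on the bare data frame
  class(out) <- class(x)         # then restore the subclass
  out
}

# Hypothetical generic that a tidyr-like package might call internally
col_subset <- function(x, i) UseMethod("col_subset")

# Default: relies on the assumption that x[i] has exactly length(i) columns
col_subset.default <- function(x, i) {
  out <- x[i]
  stopifnot(length(out) == length(i))
  out
}

# Package-specific method: drop to a bare data frame before selecting,
# so that the sticky column is not silently added back
col_subset.sticky_df <- function(x, i) {
  as.data.frame(x)[i]
}

panel <- new_sticky(
  data.frame(id = c(1, 1, 2), exp = c(3, 4, 30), wks = c(32, 43, 34))
)

names(panel["exp"])             # includes the sticky "id" column
names(col_subset(panel, "exp")) # only the requested column
```

Without the sticky_df method, col_subset.default() would fail its length check on panel. That is the breakage described above: sticky classes need their own method, and everything else just works.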
Let's kick this down the road again.