Provide Wilkinson-Rogers formula interface to complete

Open hadley opened this issue 7 years ago • 2 comments

  • ~ x / y -> nesting(x, y)
  • ~ x * y -> crossing(x, y)
  • ~ x * (y / z) -> crossing(x, nesting(y, z))
  • ~ x / (y * z) -> nesting(x, crossing(y, z))
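
The mapping above is mechanical enough to sketch as a small translator. This is a hypothetical helper, not part of tidyr: the names `formula_to_expand()` and `translate()` are invented for illustration, and the code only rewrites the expression without evaluating it.

```r
# Hypothetical sketch (not tidyr API): turn a one-sided formula into the
# equivalent nesting()/crossing() call, using base R only.
translate <- function(x) {
  # Symbols and constants are left untouched
  if (!is.call(x)) {
    return(x)
  }
  if (!is.name(x[[1]])) {
    stop("Unsupported expression: ", deparse(x))
  }
  op <- as.character(x[[1]])
  # Parentheses only group; translate what is inside them
  if (identical(op, "(")) {
    return(translate(x[[2]]))
  }
  fn <- switch(op,
    "/" = "nesting",
    "*" = "crossing",
    stop("Unsupported operator: ", op)
  )
  as.call(list(as.name(fn), translate(x[[2]]), translate(x[[3]])))
}

formula_to_expand <- function(f) {
  # Only one-sided formulas such as ~ x / y are accepted
  stopifnot(inherits(f, "formula"), length(f) == 2)
  translate(f[[2]])
}

formula_to_expand(~ x * (y / z))
#> crossing(x, nesting(y, z))
```

The result is an unevaluated call, so a real interface would still have to decide how to evaluate it against the data.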

hadley, Apr 25 '18 12:04

Imagine you have a set of schools where, within each school, every student has to take the same subjects, but different schools have different students and different subjects. Nested within school, you want every combination of student and subject. This would correspond to school / (student * subject), but that isn't currently possible to implement with nesting() and crossing() alone; the example below falls back on group_by():

library(dplyr, warn.conflicts = FALSE)
library(tidyr)
df <- tribble(
  ~school, ~student_id, ~subject, ~grade,
  "A",     1,           "Maths",  "A",
  "A",     2,           "Science", "B",
  "A",     3,           "Maths", "C",
  "B",     1,           "Statistics", "A",
  "B",     2,           "Science", "A"
)
df %>% 
  group_by(school) %>% 
  expand(crossing(student_id, subject))
#> # A tibble: 10 x 3
#> # Groups:   school [2]
#>    school student_id subject   
#>    <chr>       <dbl> <chr>     
#>  1 A               1 Maths     
#>  2 A               1 Science   
#>  3 A               2 Maths     
#>  4 A               2 Science   
#>  5 A               3 Maths     
#>  6 A               3 Science   
#>  7 B               1 Science   
#>  8 B               1 Statistics
#>  9 B               2 Science   
#> 10 B               2 Statistics

Created on 2019-12-12 by the reprex package (v0.3.0)

That's because:

  • nesting() only ever selects rows that exist. That means it's a filtering verb with selection semantics.
  • crossing() is a wrapper around expand(), and it makes sense for it to have action semantics since you want to create rows that might not already exist in the data.
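
The difference between the two bullets shows up on the ungrouped data. The following is a small sketch that re-creates the `df` from the reprex above; the names `observed` and `all_pairs` are introduced here for illustration only.

```r
library(tidyr)

df <- tribble(
  ~school, ~student_id, ~subject, ~grade,
  "A",     1,           "Maths",      "A",
  "A",     2,           "Science",    "B",
  "A",     3,           "Maths",      "C",
  "B",     1,           "Statistics", "A",
  "B",     2,           "Science",    "A"
)

# nesting(): only the student/subject pairs that already occur in the data
observed <- expand(df, nesting(student_id, subject))

# crossing(): every student paired with every subject, observed or not
all_pairs <- expand(df, crossing(student_id, subject))

nrow(observed)   # 4 distinct observed pairs
nrow(all_pairs)  # 3 students x 3 subjects = 9
```

Neither call knows about `school`, which is why the per-school combinations in the reprex need `group_by()`.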

hadley, Dec 12 '19 15:12

I'm convinced that using group_by() to complete or expand "within" a group is a useful part of this API that can't be represented with nesting() and crossing() alone.
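
As a sketch of what completing "within" a group looks like, the following re-creates the `df` from the earlier reprex and calls `complete()` on the grouped data. It assumes `complete()` respects the grouping in the same way `expand()` does in that reprex; the name `completed` is introduced here for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tribble(
  ~school, ~student_id, ~subject, ~grade,
  "A",     1,           "Maths",      "A",
  "A",     2,           "Science",    "B",
  "A",     3,           "Maths",      "C",
  "B",     1,           "Statistics", "A",
  "B",     2,           "Science",    "A"
)

# Within each school, add the student/subject combinations that are missing;
# the grade for each added row is NA
completed <- df %>%
  group_by(school) %>%
  complete(student_id, subject)

nrow(completed)              # 6 rows for school A + 4 for school B
sum(is.na(completed$grade))  # rows that did not exist in df
```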

DavisVaughan, Jan 11 '22 19:01