dplyr icon indicating copy to clipboard operation
dplyr copied to clipboard

Preserve custom factor metadata in group_by(..., .drop=FALSE)

Open gergness opened this issue 3 years ago • 0 comments

Before this PR, the metadata on a factor variable beyond the standard base constructs (the levels attribute, and the "factor" and possibly "ordered" classes) is lost.

I'm not sure how much this is worth going into, but I was hoping to extend the behavior of factors, especially the .drop behavior in https://github.com/gergness/srvyr/issues/134. This is admittedly niche, but it feels in the spirit of recent work in vctrs that allows extending base vector classes. Because of the recent advances in preserving metadata, it was surprising to me that this case doesn't.

Before the PR:

suppressPackageStartupMessages(library(dplyr))

df <- tibble(
  x = structure(1L, levels = "x", extra_info = "xyz", class = c("custom", "factor"))
)
nodrop <- df %>% group_by(x, .drop = FALSE) %>% group_data()

class(nodrop$x)
#> [1] "factor"
attributes(nodrop$x)
#> $levels
#> [1] "x"
#> 
#> $class
#> [1] "factor"

After this PR:

suppressPackageStartupMessages(library(dplyr))

df <- tibble(
  x = structure(1L, levels = "x", extra_info = "xyz", class = c("custom", "factor"))
)
nodrop <- df %>% group_by(x, .drop = FALSE) %>% group_data()

class(nodrop$x)
#> [1] "custom" "factor"
attributes(nodrop$x)
#> $levels
#> [1] "x"
#> 
#> $extra_info
#> [1] "xyz"
#> 
#> $class
#> [1] "custom" "factor"

gergness avatar Jan 25 '22 19:01 gergness

@gergness I appreciate what you are trying to do, but we are a bit uncomfortable committing to supporting this feature:

  • tidyr::complete() and tidyr::expand() also don't support factor subclasses

  • Just reattaching the attributes back onto the class after it has been "sliced" won't work in all cases. In some cases the attributes might need to be "updated" based on the new data that is there (i.e. if you have a vectorized attribute, similar to names) and there is no correct way to do that here

  • We aren't sure that base R factors/ordered factors are even something that should be subclassed at all. Typically a class needs to be designed with subclassing in mind from the start, and we aren't convinced base R has done that.

Again, we really appreciate the PR and hope you will continue to contribute in the future!

DavisVaughan avatar Aug 11 '22 19:08 DavisVaughan