dplyr Feature request/Question: do not drop extra classes (and attributes) with functions group_by, summarise and so on

We are building some packages on top of the dplyr + dbplyr infrastructure (very grateful for that), and we define classes such as 'generated_cohort_set', 'cdm_reference', 'cdm_table', 'codelist' and so on.

One problem we are facing is that some functions (group_by, summarise, ...) drop the classes (see reprex). I guess this is on purpose, but I am wondering why, and whether preserving them is something that could be considered in the future.

Here are the packages if you are curious: https://cran.r-project.org/web/packages/CDMConnector/index.html, https://cran.r-project.org/web/packages/DrugUtilisation/index.html, https://cran.r-project.org/web/packages/PatientProfiles/index.html, https://cran.r-project.org/web/packages/IncidencePrevalence/index.html ...

x <- dplyr::tibble(a = 1)
class(x) <- c("my_class", class(x))
class(x)
#> [1] "my_class"   "tbl_df"     "tbl"        "data.frame"
x |> dplyr::mutate(b = 1) |> class()
#> [1] "my_class"   "tbl_df"     "tbl"        "data.frame"
x |> dplyr::group_by(a) |> class()
#> [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

Created on 2023-12-05 with reprex v2.0.2

FYI @edward-burn @ablack3

Dec 05 '23 17:12 catalamarti

@catalamarti I faced the same issue before. There is some documentation on how to extend tibbles here: https://dplyr.tidyverse.org/reference/dplyr_extending.html. It also states that, for example, dplyr::group_by and dplyr::ungroup do drop attributes and classes.

Unfortunately, custom attributes are dropped even if they don't depend on the rows or columns, contrary to what is documented in the vignette.

I am currently writing a short post on how I ended up solving it and will link it here once I am done.

Jan 16 '24 10:01 MSHelm

Just wondering if there is any potential resolution to this, @hadley. The page linked above, https://dplyr.tidyverse.org/reference/dplyr_extending.html, also says "These functions are a stop-gap measure", so I'm not sure whether to rely on them in packages that depend on dplyr, or whether the better approach (at least in the short term) is to create a method for every dplyr verb to handle the above situations.

Apr 18 '24 17:04 edward-burn

group_by() creates a fundamentally different type of data structure, and we have no way of knowing whether it is compatible with your class, so we have to drop it. If you want to support a grouped data frame structure, you can write an S3 method for group_by(). It is typically easier to use something like mutate(.by =), which preserves your class and still lets you do the grouped operation; the grouped_df class never exists in that workflow, so you don't have to worry about it at all.

summarise() similarly builds off the data from group_data(), which is always a bare tibble or bare data frame. In the same vein as group_by(), we don't know whether the summarised table (which has a very different structure than the original one) is still compatible with your class, so we drop it. You'd also need an S3 method for this.
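
As a minimal sketch of the mutate(.by =) route, reusing the "my_class" name from the reprex above: the column names g, v and total are made up for illustration, and the .by argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# A tibble carrying an extra class, as in the original reprex
x <- tibble(g = c("a", "a", "b"), v = c(1, 2, 10))
class(x) <- c("my_class", class(x))

# Per-group computation without ever creating a grouped_df:
# the grouping only applies inside this one mutate() call
out <- x |> mutate(total = sum(v), .by = g)

class(out)   # "my_class" should still be the first class here
out$total    # per-group sums, repeated for each row of the group
```

Because no grouped_df is ever created, there is nothing to ungroup afterwards and no group_by() method is needed for this workflow.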

This is documented here https://dplyr.tidyverse.org/reference/dplyr_extending.html and here https://dplyr.tidyverse.org/reference/summarise.html#value

tsibble is an example of a tibble subclass that has support for custom grouped data frames and a custom summarise method, if you want to look at that. They are also a good example of how dplyr can't know if the result of summarise() is valid for your class or not. In some cases the result is still a tsibble, in other cases they return a bare tibble. https://github.com/tidyverts/tsibble
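
For the S3-method route, a rough sketch could look like the following. It is not how tsibble does it; it simply re-attaches the hypothetical "my_class" from the reprex after calling the next method, on the assumption that the class stays valid for grouped and summarised data. Only the class author can decide whether that assumption holds, and other verbs applied to the grouped object may still drop the class.

```r
library(dplyr)

# Hypothetical methods for the illustrative class "my_class".
# In a package these would be registered as S3 methods in NAMESPACE.
group_by.my_class <- function(.data, ...) {
  out <- NextMethod()                          # dplyr's own grouping
  class(out) <- union("my_class", class(out))  # assumption: still a valid my_class
  out
}

summarise.my_class <- function(.data, ...) {
  out <- NextMethod()                          # dplyr's own summary
  class(out) <- union("my_class", class(out))  # assumption: still a valid my_class
  out
}

x <- tibble(a = c(1, 1, 2), v = c(10, 20, 30))
class(x) <- c("my_class", class(x))

grouped <- x |> group_by(a)
class(grouped)

summed <- grouped |> summarise(total = sum(v))
class(summed)
```

A real implementation would usually validate the result before restoring the class, and return a bare tibble when the result no longer satisfies the class invariants, which is the behaviour described for tsibble above.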

Apr 19 '24 15:04 DavisVaughan

@catalamarti It took some time to write my article because a lot was going on, but in case it still helps, here it is: https://www.bio-ai.org/blog/extending-tibbles/

May 06 '24 10:05 MSHelm
