
Request: Add native pipeline/verb logging option similar to `tidylog` package.

Open orgadish opened this issue 1 year ago • 0 comments

The tidylog package is an amazing supplement to the dplyr and tidyr packages. It allows immediate tracking and debugging of a pipeline by printing high-level information about the result of each step (see below). It's especially useful for tracking joins, which I know were an important focus of the error and warning handling added to the join functions in dplyr 1.1.0+.

The silly example below shows what messages tidylog provides about the process.

I always use tidylog by default, and I would love to teach it to my students as well. But I don't, because it works by masking the regular tidyverse verbs with logging versions, which breaks auto-complete for those verbs. To restore auto-complete, the maintainer of tidylog would have to constantly keep up with tidyverse updates, even though the two projects are completely separate (elbersb/tidylog#56).
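To make the masking concrete, here is a minimal sketch of the general pattern. It is not tidylog's actual code: the wrapper body and message format are assumptions for illustration only. A function with the same name as a dplyr verb calls the original and reports what changed. Since the wrapper only exposes `(.data, ...)`, an editor has no argument names to offer for completion, which is presumably the problem described above.

```r
library(dplyr)

# Hypothetical wrapper (an illustration, not tidylog's real implementation).
# Defining `filter` here masks dplyr::filter for unqualified calls.
filter <- function(.data, ...) {
  # Forward the filtering conditions untouched to the real verb.
  out <- dplyr::filter(.data, ...)
  removed <- nrow(.data) - nrow(out)
  # Report the change as a message so it can be suppressed if needed.
  message(sprintf(
    "filter: removed %d rows (%.0f%%), %d rows remaining",
    removed, 100 * removed / nrow(.data), nrow(out)
  ))
  out
}

# Any unqualified call now goes through the logging wrapper.
tall <- starwars |> filter(height > 90)
```

Every masked verb needs a wrapper like this, so each new argument or verb in dplyr has to be mirrored by hand in the masking package.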

It would be great if this kind of functionality were implemented natively and maintained by the relevant teams so that updates to APIs (e.g. new arguments) would also involve updating the log structure.

suppressMessages({
  library(tidyverse)
  library(tidylog)
})

tbl1 <- dplyr::starwars |> 
  filter(height > 90) |> 
  select(name, height, mass, hair_color) |> 
  summarize(
    across(c(height, mass), mean),
    name_list = list(name),
    .by = hair_color
  )
#> filter: removed 9 rows (10%), 78 rows remaining
#> select: dropped 10 variables (skin_color, eye_color, birth_year, sex, gender, …)
#> summarize: now 12 rows and 4 columns, ungrouped
  
tbl2 <- dplyr::starwars |>
  select(hair_color, films) |> 
  unnest_longer(films) |> 
  summarize(
    films_list = list(films),
    .by = hair_color
  )
#> select: dropped 12 variables (name, height, mass, skin_color, eye_color, …)
#> summarize: now 12 rows and 2 columns, ungrouped

joined <- tbl1 |> 
  full_join(
    tbl2,
    by = join_by(hair_color)
  )
#> full_join: added one column (films_list)
#>            > rows only in tbl1   0
#>            > rows only in tbl2   0
#>            > matched rows       12
#>            >                   ====
#>            > rows total         12

Created on 2024-02-13 with reprex v2.0.2
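For reference, the counts in the join summary above can be reproduced with existing dplyr verbs. This is only a rough sketch of one way to compute them, not how tidylog or dplyr does it internally; the toy tables `x` and `y` and the message layout are made up for illustration.

```r
library(dplyr)

# Toy inputs (hypothetical data, chosen so every count is non-zero).
x <- tibble(key = c("a", "b", "c"), vx = 1:3)
y <- tibble(key = c("b", "c", "d"), vy = 4:6)

by <- join_by(key)

# Rows without a partner on the other side.
only_x <- nrow(anti_join(x, y, by = by))
only_y <- nrow(anti_join(y, x, by = by))
# Rows produced by matching keys.
matched <- nrow(inner_join(x, y, by = by))

joined <- full_join(x, y, by = by)

message(sprintf(
  "full_join: rows only in x %d, rows only in y %d, matched rows %d, rows total %d",
  only_x, only_y, matched, nrow(joined)
))
```

Computing the counts this way repeats the join work several times, which is one reason a native implementation inside the join functions would be preferable.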

orgadish, Feb 13 '24 23:02