Aggregation of column summary feature requests

Open jthomasmock opened this issue 3 years ago • 0 comments

Prework

Duplicates or related issues

  • #382 - add summary_column() to match summary_rows()
  • #632 - specify subset of columns to summarize ACROSS column in a rowwise() fashion
  • #690 - when calculating a summary of a specific column, reference another column
  • #952 - reference multiple columns in a summary - potentially generate a new column

Proposal

Provide an equivalent of gt::summary_rows() for columns, i.e. a dplyr::rowwise() operation that summarizes across columns within each row. This is possible today, but not at as low a level as it could be: it relies on a somewhat clunky call to gtExtras::gt_duplicate_column() to create a duplicate column and modify it in place, rather than on a gt-native approach.
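For reference, the row-wise operation in question can be sketched on a plain tibble, outside of gt. This is a minimal illustration only; the data and the column names x, y, z and row_sum are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data; column names are illustrative only
df <- tibble(x = c(1, 2, 3), y = c(10, 20, 30), z = c(0.5, 0.5, 0.5))

row_totals <- df %>%
  rowwise() %>%                               # treat each row as its own group
  mutate(row_sum = sum(c_across(x:z))) %>%    # combine x..z within the row, then summarize
  ungroup()

row_totals$row_sum
```

The proposal amounts to exposing this kind of computation on the data stored inside a gt object.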

Components working:

  • rowwise() operations across columns, for both grouped and ungrouped data
  • Basic support for any summarizing function, passed either as a function (e.g. sum) or by name as a string (e.g. "sum")
  • Control over the placement of the output column
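The function-or-string support relies on do.call(), which accepts either form. A small base R sketch, where apply_summary() is a hypothetical helper and not part of gt or gtExtras:

```r
# Hypothetical helper, not part of gt or gtExtras
apply_summary <- function(values, fn = sum) {
  # do.call() accepts a function object or the name of a function as a string
  do.call(fn, list(values))
}

by_function <- apply_summary(c(1, 5, 0.1), fn = max)
by_name <- apply_summary(c(1, 5, 0.1), fn = "max")

identical(by_function, by_name)
```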

Components that will still need to be hashed out:

  • Optionally affect summary row outputs (currently ignored)
  • summary_rows() to optionally reference column summaries
  • Convert to proper gt internals rather than the gtExtras::gt_duplicate_column() hack
  • Should the new column be added as the last column of the table, with visual separation like the double line used by summary_rows()?
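On the visual separation question, one possible way to approximate a double rule today is with existing gt styling functions. This is only a sketch under assumptions: the table and the sum_col column are hypothetical, and a native implementation might well look different.

```r
library(gt)

# Hypothetical table with a precomputed row-wise total in `sum_col`
tbl <- data.frame(a = 1:3, b = 3:1, sum_col = c(4, 4, 4)) %>%
  gt() %>%
  tab_style(
    # A double left border on the summary column, loosely mirroring
    # the double line that separates summary rows
    style = cell_borders(sides = "left", style = "double", weight = px(3)),
    locations = list(
      cells_body(columns = sum_col),
      cells_column_labels(columns = sum_col)
    )
  )
```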

gt_sum_column() function definition

library(gt)
library(gtExtras)
library(dplyr, warn.conflicts = FALSE)

# Add a row-wise summary column to a gt table.
# `fn` can be a function or the name of a function as a string.
gt_sum_column <- function(gt_object, columns, fn = sum, name = "sum_col",
                          after = dplyr::last_col()) {

  # Resolve the tidyselect expression to column names (uses a gt internal)
  res_col_names <- gt:::resolve_cols_c(
    expr = {{ columns }},
    data = gt_object
  )

  # Duplicate the first selected column as a placeholder so that gt
  # registers the new column at the requested position
  gt_object <- gtExtras::gt_duplicate_column(
    gt_object,
    column = res_col_names[1],
    after = {{ after }},
    dupe_name = name
  )

  # Overwrite the placeholder with the row-wise summary of the selected columns
  gt_object[["_data"]] <-
    gt_object[["_data"]] %>%
    dplyr::rowwise() %>%
    dplyr::mutate({{ name }} := do.call(fn, list(dplyr::c_across({{ columns }})))) %>%
    dplyr::ungroup()

  gt_object
}

# Example table, grouped by the `group` column
base_gt <- dplyr::tibble(
  group = c(rep("A", 3), rep("B", 2)),
  a = 1:5,
  b = 5:1,
  c = seq(0.1, 0.5, length.out = 5)
) %>%
  gt(groupname_col = "group")

# Summary function passed as a function
base_gt %>%
  gt_sum_column(c(a:c), fn = max, after = c)

base_gt %>%
  gt_sum_column(c(a:c), fn = min, after = c)

# Summary function passed by name as a string
base_gt %>%
  gt_sum_column(c(a:c), fn = "mean", after = c)

base_gt %>%
  gt_sum_column(c(a:c), fn = "sum", after = c)
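
The values expected in the new column can be cross-checked without gt. The following base R sketch rebuilds the three numeric columns of the example data and applies each summary per row; it is a sanity check under that assumption, not part of the proposed function.

```r
# Same numeric columns as the example table (grouping column omitted)
m <- cbind(
  a = 1:5,
  b = 5:1,
  c = seq(0.1, 0.5, length.out = 5)
)

# MARGIN = 1 applies the function to each row of the matrix
row_max  <- apply(m, 1, max)
row_min  <- apply(m, 1, min)
row_mean <- apply(m, 1, mean)
row_sum  <- apply(m, 1, sum)
```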

Created on 2022-06-17 by the reprex package (v2.0.1)


jthomasmock, Jun 17 '22 17:06