REDCapTidieR icon indicating copy to clipboard operation
REDCapTidieR copied to clipboard

[FEATURE] extract_tibble should allow users to join specified tables

Open rsh52 opened this issue 2 years ago • 3 comments

Feature Request Description

In addition to extracting selected tibbles, extract_tibbles should allow users the option to join them as a singular tibble output instead of the list. As found in recent projects, the next logical step often times when using extract_tibbles is joining.

Proposed Solution

Prototyped logic is available in our internal Prodigy Reporter. The new argument (suggest: join_tibbles = TRUE/FALSE) should kick off join operations. Since we abstract some column names, i.e. form_status_complete, we need to account for duplicated colnames in the tibbles themselves.

# Load Libraries ===============================================================
library(REDCapTidieR)
library(tidyverse)
library(tidyselect)
library(rlang)

# tibble List Selection Function ===============================================
tibble_list_select <- function(supertibble, tbls) {
  tbls <- eval_select(data = supertibble, expr = enquo(tbls))
  supertibble[tbls]
}

# Join Operation ===============================================================
join_tibbles <- function(extracted_tibbles, record_id) {
  # First: compile all names related to tibbles
  # Second: Identify names that exist in multiple tibbles (not record_id)
  # Third: Append identified names with name of the tibble they belong to
  
  duplicate_colnames <- extracted_tibbles %>%
    map(names) %>%
    unlist() %>%
    tibble(name = .) %>%
    count(name) %>%
    # don't append table name to pk: infseq_id
    filter(n > 1 & name != record_id) %>% # <-- Need to functionally call out record_id in case of name change -->
    pull(name)
  
  extracted_tibbles <- map2(
    extracted_tibbles,
    names(extracted_tibbles),
    .f = function(df, df_name) {
      # [duplicate_col] -> [duplicate_col].[table_name]
      rename_with(
        df,
        .cols = any_of(duplicate_colnames),
        .fn = function(col) paste0(col, ".", df_name)
      )
    }
  )
  
  # Multi-left_join using reduce, filter for inputs resulting in include == TRUE
  out <- reduce(
    extracted_tibbles,
    dplyr::left_join,
    by = record_id # <-- Need to functionally update this -->
  )
  
  out
}

Here's how I envision this being implemented, but imagine the external functions as internal to extract_tibbles instead:

# Example ======================================================================
redcap_uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")

supertibble <- read_redcap(redcap_uri, token)

extracted_tibbles <- supertibble |>
  extract_tibbles() 

extracted_tibbles |> 
  tibble_list_select(tbls = c(contains("nonrepeat"), repeated)) |>
  join_tibbles(record_id = "record_id")

You should be able to copy and paste all of this into a script and use REDCapTidieR 0.2.0 to view the proposed output. Open to suggestions on naming conventions for identified duplicate columns (currently [duplicate_col].[table_name]).

Checklist

  • [x] The issue is atomic
  • [x] The issue description is documented
  • [x] The issue title describes the problem succinctly
  • [x] Developers are assigned to the issue
  • [x] Labels are assigned to the issue

rsh52 avatar Dec 12 '22 19:12 rsh52