REDCapTidieR
REDCapTidieR copied to clipboard
[FEATURE] extract_tibble should allow users to join specified tables
Feature Request Description
In addition to extracting selected tibble
s, extract_tibbles
should allow users the option to join them as a singular tibble
output instead of the list. As found in recent projects, the next logical step often times when using extract_tibbles
is joining.
Proposed Solution
Prototyped logic is available in our internal Prodigy Reporter. The new argument (suggest: join_tibbles = TRUE/FALSE
) should kick off join operations. Since we abstract some column names, i.e. form_status_complete
, we need to account for duplicated colnames in the tibble
s themselves.
# Load Libraries ===============================================================
library(REDCapTidieR)
library(tidyverse)
library(tidyselect)
library(rlang)
# tibble List Selection Function ===============================================
tibble_list_select <- function(supertibble, tbls) {
tbls <- eval_select(data = supertibble, expr = enquo(tbls))
supertibble[tbls]
}
# Join Operation ===============================================================
join_tibbles <- function(extracted_tibbles, record_id) {
# First: compile all names related to tibbles
# Second: Identify names that exist in multiple tibbles (not record_id)
# Third: Append identified names with name of the tibble they belong to
duplicate_colnames <- extracted_tibbles %>%
map(names) %>%
unlist() %>%
tibble(name = .) %>%
count(name) %>%
# don't append table name to pk: infseq_id
filter(n > 1 & name != record_id) %>% # <-- Need to functionally call out record_id in case of name change -->
pull(name)
extracted_tibbles <- map2(
extracted_tibbles,
names(extracted_tibbles),
.f = function(df, df_name) {
# [duplicate_col] -> [duplicate_col].[table_name]
rename_with(
df,
.cols = any_of(duplicate_colnames),
.fn = function(col) paste0(col, ".", df_name)
)
}
)
# Multi-left_join using reduce, filter for inputs resulting in include == TRUE
out <- reduce(
extracted_tibbles,
dplyr::left_join,
by = record_id # <-- Need to functionally update this -->
)
out
}
Here's how I envision this being implemented, but imagine the external functions as internal to extract_tibbles
instead:
# Example ======================================================================
redcap_uri <- Sys.getenv("REDCAP_URI")
token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API")
supertibble <- read_redcap(redcap_uri, token)
extracted_tibbles <- supertibble |>
extract_tibbles()
extracted_tibbles |>
tibble_list_select(tbls = c(contains("nonrepeat"), repeated)) |>
join_tibbles(record_id = "record_id")
You should be able to copy and paste all of this into a script and use REDCapTidieR
0.2.0 to view the proposed output. Open to suggestions on naming conventions for identified duplicate columns (currently [duplicate_col].[table_name]
).
Checklist
- [x] The issue is atomic
- [x] The issue description is documented
- [x] The issue title describes the problem succinctly
- [x] Developers are assigned to the issue
- [x] Labels are assigned to the issue