REDCapTidieR
REDCapTidieR copied to clipboard
[FEATURE] Function to join data tibbles
Feature Request Description
Function to join together data tibbles that picks smart default keys to join on based on the events each tibble is mapped to.
Proposed Solution
Some possible APIs:
supertbl |>
join_data_tibble(
x = "tbl1",
y = "tbl2",
by = c("record_id", "redcap_event"), # optional! Inferred by default
type = "left"
)
# Return a tibble
supertbl |>
join_data_tibble(
x = "tbl1",
y = "tbl2",
by = c("record_id", "redcap_event"), # optional! Inferred by default
type = "left",
name = "my_new_tibble"
)
# Return a supertibble with a new tibble added
Additional Context
Data tibble types
API issues aside, I think the key here is that we'd need a way to determine for any two data tibbles which keys to use in a join by default.
Each instrument-event pair can have one of 3 types: Non Repeating, Repeating Separately, or Repeating Together. When I talk about the "type" of an instrument, I mean the aggregation of all the types of instrument-event pairs the instrument is part of. There are 7 options which I mocked up in an example REDCap.
library(REDCapTidieR)
library(tidyverse)
data <- read_redcap(Sys.getenv("REDCAP_URI"), "77C2C3A96E00246436A9E49974BD5674", allow_mixed_structure = TRUE)
extract_keys <- function(data) {
data |>
colnames() |>
intersect(c("record_id", "redcap_event", "redcap_form_instance", "redcap_event_instance")) |>
paste(collapse = ", ")
}
data |>
mutate(pks = map_chr(redcap_data, extract_keys)) |>
select(redcap_form_label, pks, structure)
#> A REDCapTidieR Supertibble with 7 instruments
#> redcap_form_label pks structure
#> <chr> <chr> <chr>
#>1 NR record_id, redcap_event nonrepeating
#>2 RS record_id, redcap_event, redcap_form_instance repeating
#>3 RT record_id, redcap_event, redcap_event_instance nonrepeating
#>4 NR, RS record_id, redcap_event, redcap_form_instance mixed
#>5 RS, RT record_id, redcap_event, redcap_form_instance mixed
#>6 NR, RT record_id, redcap_event, redcap_event_instance nonrepeating
#>7 NR, RS, RT record_id, redcap_event, redcap_form_instance mixed
"Type" here is slightly more specific than "structure". I think for joining any two non-mixed instruments the default should be to join on all common keys. The one complication I see is the NA
redcap_event_instance
in the "NR, RT" case.
data |>
extract_tibble("nr_rt")
#># A tibble: 2 × 5
#> record_id redcap_event redcap_event_instance nr_rt form_status_complete
#> <dbl> <chr> <dbl> <dbl> <fct>
#>1 1 non_repeating NA 1 Incomplete
#>2 1 repeating_together 1 1 Incomplete
Presumably if you join "NR, RT" to "NR, RT" you want the NA
s to join in the non repeating event which is a little non-standard.
Things get a little more complicated when considering the mixed structure instruments. For example, suppose you're joining "RS, RT" to "RS, RT". Presumably you'd want to join on all the keys for records in the RT event but on only record_id
and redcap_event
for the RS records.
If we wanted to cover all our bases we could try to enumerate the keys we'd expect to join on for all 28 possible combinations.
Missing keys
The above assumes that all instrument-event pairs have data. I noticed that when this isn't true the keys that come back for each data tibble aren't consistent. For example, in the "NR, RS" instrument redcap_form_instance
is only included if data is filled out for that instrument in the RS event. Not sure how much this matters for the joins but it might be an additional consideration.
Checklist
- [x] The issue is atomic
- [x] The issue description is documented
- [ ] The issue title describes the problem succinctly
- [ ] Developers are assigned to the issue
- [x] Labels are assigned to the issue