
[BUG] Misleading Mixed Data Structure Outputs

Open rsh52 opened this issue 5 months ago • 3 comments

Expected Behavior

When joining mixed data structures read with read_redcap(..., allow_mixed_structure = TRUE), we expect the resulting tables to have meaningful, unique primary keys.

Current Behavior

During development of #199, we found that tables containing both "repeat together" data (i.e. repeating as events, RT) and "repeat separate" data (i.e. repeating as instruments, RS) don't carry primary keys that distinguish RS rows during joins.

How to Reproduce the Bug:

Using a REDCap project with a single record, set up as shown below: [image]

Currently read_redcap() gives us the following output:

> sprtbl$redcap_data
[[1]]
# A tibble: 4 × 5
  record_id redcap_event       redcap_form_instance mixed_structure_1 form_status_complete
      <dbl> <chr>                             <dbl> <chr>             <fct>               
1         1 repeating_together                    1 RT1               Complete            
2         1 repeating_together                    2 RT2               Complete            
3         1 repeating_separate                    1 RS1               Complete            
4         1 repeating_separate                    2 RS2               Complete            

[[2]]
# A tibble: 3 × 5
  record_id redcap_event       redcap_form_instance mixed_structure_2 form_status_complete
      <dbl> <chr>                             <dbl> <chr>             <fct>               
1         1 repeating_together                    1 RT1               Complete            
2         1 repeating_together                    2 RT2               Complete            
3         1 repeating_separate                    1 RS1               Complete   

Using join_data_tibbles() with a "full join" we get this:

join_data_tibbles(sprtbl, x = "mixed_structure_1", y = "mixed_structure_2", type = "full")
# A tibble: 4 × 8
  record_id redcap_event       redcap_form_instance mixed_structure_1 redcap_event_instance mixed_structure_2 form_status_complete.x form_status_complete.y
      <dbl> <chr>                             <dbl> <chr>             <lgl>                 <chr>             <fct>                  <fct>                 
1         1 repeating_together                    1 RT1               NA                    RT1               Complete               Complete              
2         1 repeating_together                    2 RT2               NA                    RT2               Complete               Complete              
3         1 repeating_separate                    1 RS1               NA                    RS1               Complete               Complete              
4         1 repeating_separate                    2 RS2               NA                    NA                Complete               NA        

The issue here is that in row 3, the data for mixed_structure_1 and mixed_structure_2 should be on separate rows because they are RS instances. As read_redcap() is currently set up, it is impossible to separate them because the primary keys for both are identical (record_id, redcap_event, redcap_form_instance). This is a byproduct of how we decided to mix the meaning of redcap_form_instance between repeating events and repeating instruments.
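The key collision can be checked directly. The following is a minimal sketch that rebuilds the two data tibbles by hand (they are not read from REDCap here) and uses dplyr to list the key combinations present in both tables; the names ms1, ms2, shared_keys, and rs_collisions are illustrative only.

```r
library(dplyr)
library(tibble)

# Hand-built copies of the two data tibbles shown above (status column omitted)
ms1 <- tribble(
  ~record_id, ~redcap_event,        ~redcap_form_instance, ~mixed_structure_1,
  1,          "repeating_together", 1,                     "RT1",
  1,          "repeating_together", 2,                     "RT2",
  1,          "repeating_separate", 1,                     "RS1",
  1,          "repeating_separate", 2,                     "RS2"
)
ms2 <- tribble(
  ~record_id, ~redcap_event,        ~redcap_form_instance, ~mixed_structure_2,
  1,          "repeating_together", 1,                     "RT1",
  1,          "repeating_together", 2,                     "RT2",
  1,          "repeating_separate", 1,                     "RS1"
)

# The primary key columns both tables currently share
keys <- c("record_id", "redcap_event", "redcap_form_instance")

# Key combinations that appear in both tables; a join on `keys` merges these rows
shared_keys <- semi_join(ms1[keys], ms2[keys], by = keys)

# Shared keys that belong to the repeat-separate event: these rows get merged
# even though they describe independent instrument instances
rs_collisions <- filter(shared_keys, redcap_event == "repeating_separate")
print(rs_collisions)
```

Any row in rs_collisions is an RS instance that a join on the current keys cannot keep apart.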

Solution Proposal

To fix this we will need to do the following:

  • Add the repeating event type to the redcap_events column of the supertibble. This needs to identify whether an event is RS or RT so that other functions, such as join_data_tibbles(), can reference it.
  • Revise how redcap_form_instance is used for RS rows in these examples. This may involve revising how redcap_form_instance/redcap_event_instance is defined, but it boils down to needing an additional primary key to identify separately repeating instances. @ezraporter, after making an example I think redcap_form_instance/redcap_event_instance still might not fully address this, since both forms share the same event information. Instead, I think we need to identify RS data and give it an additional column holding the name of the form it came from. This could possibly be shifted back to the join function's responsibilities. For example:
> tibble::tribble(
+   ~"record_id", ~"redcap_event", ~"redcap_form_instance", ~"redcap_event_instance", ~"extra_rs_key", ~"mixed_structure_1", ~"mixed_structure_2", ~"form_status_complete.x", ~"form_status_complete.y",
+   1, "repeat_together", 1, NA, NA, "RT1", "RT1", "Complete", "Complete",
+   1, "repeat_together", 2, NA, NA, "RT2", "RT2", "Complete", "Complete",
+   1, "repeat_separate", 1, NA, "mixed_structure_1", "RS1", NA, "Complete", NA,
+   1, "repeat_separate", 1, NA, "mixed_structure_2", NA, "RS1", NA, "Complete",
+   1, "repeat_separate", 2, NA, "mixed_structure_1", "RS2", NA, "Complete", NA
+   )
# A tibble: 5 × 9
  record_id redcap_event    redcap_form_instance redcap_event_instance extra_rs_key      mixed_structure_1 mixed_structure_2 form_status_complete.x form_status_complete.y
      <dbl> <chr>                          <dbl> <lgl>                 <chr>             <chr>             <chr>             <chr>                  <chr>                 
1         1 repeat_together                    1 NA                    NA                RT1               RT1               Complete               Complete              
2         1 repeat_together                    2 NA                    NA                RT2               RT2               Complete               Complete              
3         1 repeat_separate                    1 NA                    mixed_structure_1 RS1               NA                Complete               NA                    
4         1 repeat_separate                    1 NA                    mixed_structure_2 NA                RS1               NA                     Complete              
5         1 repeat_separate                    2 NA                    mixed_structure_1 RS2               NA                Complete               NA      
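One way the extra key could be produced at join time is sketched below with plain dplyr. This is only an illustration under assumptions, not the package's implementation: add_rs_key() is a hypothetical helper, rs_events is assumed to be known already (the first bullet proposes recording it in the supertibble), and the input tibbles are hand-built from the output above.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of REDCapTidieR): tag rows from repeat-separate
# events with the name of the form they came from; other rows get NA.
add_rs_key <- function(data, form_name, rs_events) {
  mutate(
    data,
    extra_rs_key = if_else(redcap_event %in% rs_events, form_name, NA_character_)
  )
}

# Hand-built copies of the two data tibbles (status column omitted)
ms1 <- tribble(
  ~record_id, ~redcap_event,        ~redcap_form_instance, ~mixed_structure_1,
  1,          "repeating_together", 1,                     "RT1",
  1,          "repeating_together", 2,                     "RT2",
  1,          "repeating_separate", 1,                     "RS1",
  1,          "repeating_separate", 2,                     "RS2"
)
ms2 <- tribble(
  ~record_id, ~redcap_event,        ~redcap_form_instance, ~mixed_structure_2,
  1,          "repeating_together", 1,                     "RT1",
  1,          "repeating_together", 2,                     "RT2",
  1,          "repeating_separate", 1,                     "RS1"
)

# Assumption: the set of RS events is already known
rs_events <- "repeating_separate"
keys <- c("record_id", "redcap_event", "redcap_form_instance", "extra_rs_key")

# RT rows carry NA in extra_rs_key on both sides; na_matches = "na" (the dplyr
# default) lets them match, while RS rows stay apart because the form names differ.
joined <- full_join(
  add_rs_key(ms1, "mixed_structure_1", rs_events),
  add_rs_key(ms2, "mixed_structure_2", rs_events),
  by = keys,
  na_matches = "na"
)
print(joined)
```

With the extra column in the join key, RT rows still line up across forms, and each RS instance keeps its own row with a unique key.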

Checklist

Before submitting this issue, please verify that the submission meets the criteria below:

  • [x] The issue is atomic
  • [x] The issue description is documented
  • [x] The issue title describes the problem succinctly
  • [x] Developers are assigned to the issue
  • [x] Labels are assigned to the issue

rsh52, Sep 17 '24 19:09