join_data_tibbles, Enable Repeat Event Type Supertibble Definition

Open rsh52 opened this issue 5 months ago • 0 comments

Description

This PR adds a new join_data_tibbles() function. Supporting it for complex mixed data structures required some changes to how we process longitudinal databases.

This also addresses two bugs: one minor (documented below) and one major, where we are misreporting mixed data structure outputs.

I will be posting comments on the code itself below to explain some of the changes.

The Missing Data Issue

Because we weren't making use of redcap_event_instance correctly, we were not accurately reporting all of the data associated with each REDCap event. See below from the Mixed Structure REDCap for reference:

[image: Mixed Structure REDCap]

And here is what currently comes out of the REDCapTidieR data tibbles from main:

Current Output with Missing Data
> sprtbl_mixed
# A REDCapTidieR Supertibble with 3 instruments
  redcap_form_name     redcap_form_label    redcap_data redcap_metadata   redcap_events structure data_rows data_cols
  <chr>                <chr>                <list>      <list>            <list>        <chr>         <int>     <int>
1 nonrepeat_form       Nonrepeat Form       <tibble>    <tibble [2 × 17]> <tibble>      nonrepea…         4         5
2 repeat_form          Repeat Form          <tibble>    <tibble [2 × 17]> <tibble>      mixed             2         5
3 mixed_structure_form Mixed Structure Form <tibble>    <tibble [2 × 17]> <tibble>      mixed             3         5
# ℹ 3 more variables: data_size <lbstr_by>, data_na_pct <formttbl>, form_complete_pct <formttbl>
> sprtbl_mixed$redcap_data
[[1]]
# A tibble: 4 × 5
  record_id redcap_event       redcap_event_instance nonrepeat_1 form_status_complete
      <dbl> <chr>                              <dbl> <chr>       <fct>               
1         1 non_repeating                         NA Nonrepeat 1 Incomplete          
2         1 repeating_separate                    NA Nonrepeat 2 Incomplete          
3         1 repeating_together                     1 A           Incomplete          
4         1 repeating_together                     2 B           Incomplete          

[[2]]
# A tibble: 2 × 5
  record_id redcap_event  redcap_form_instance repeat_1 form_status_complete
      <dbl> <chr>                        <dbl> <chr>    <fct>               
1         1 non_repeating                    1 Repeat 1 Incomplete          
2         1 non_repeating                    2 Repeat 2 Incomplete          

[[3]]
# A tibble: 3 × 5
  record_id redcap_event       redcap_form_instance mixed_structure_1 form_status_complete
      <dbl> <chr>                             <dbl> <chr>             <fct>               
1         1 non_repeating                         1 Mixed Nonrepeat 1 Incomplete          
2         1 repeating_separate                    1 Mixed Repeat 1    Incomplete          
3         1 repeating_separate                    2 Mixed Repeat 2    Incomplete     

In the second data tibble (the repeat form), we should expect to see 4 rows: 2 for the non-repeating event (one per repeating form instance) and 2 for the repeating-together event. The 2 repeating-together rows are missing.

In the third data tibble (the mixed structure form), we should expect to see 5 rows, and the 2 repeating-together event rows are again missing.
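
The missing rows follow from how repeats are encoded in a raw REDCap export. Below is a minimal base R sketch of the idea, assuming the usual raw export layout in which a repeating-together event has an empty redcap_repeat_instrument but a populated redcap_repeat_instance. The toy data frame and the helper name shift_event_instances() are hypothetical, and this is not the package's implementation:

```r
# Toy stand-in for the raw export rows of the repeat form (hypothetical data).
raw <- data.frame(
  record_id = c(1, 1, 1, 1),
  redcap_event = c("non_repeating", "non_repeating",
                   "repeating_together", "repeating_together"),
  redcap_repeat_instrument = c("repeat_form", "repeat_form", NA, NA),
  redcap_repeat_instance = c(1, 2, 1, 2),
  repeat_1 = c("Repeat 1", "Repeat 2", "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: an instance number with no repeat instrument belongs to
# the event, so it moves to redcap_event_instance instead of
# redcap_form_instance.
shift_event_instances <- function(df) {
  is_event_repeat <- is.na(df$redcap_repeat_instrument) &
    !is.na(df$redcap_repeat_instance)
  df$redcap_form_instance <-
    ifelse(is_event_repeat, NA_real_, df$redcap_repeat_instance)
  df$redcap_event_instance <-
    ifelse(is_event_repeat, df$redcap_repeat_instance, NA_real_)
  df[, c("record_id", "redcap_event", "redcap_form_instance",
         "redcap_event_instance", "repeat_1")]
}

tidy <- shift_event_instances(raw)
print(tidy)
```

All 4 rows survive in this sketch, with the repeating-together rows carrying their instance number in redcap_event_instance and NA in redcap_form_instance.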

Here is the output with the proposed code changes:

Proposed Output with Missing Data Added
> sprtbl_mixed
# A REDCapTidieR Supertibble with 3 instruments
  redcap_form_name   redcap_form_label redcap_data redcap_metadata redcap_events structure data_rows data_cols data_size data_na_pct form_complete_pct
  <chr>              <chr>             <list>      <list>          <list>        <chr>         <int>     <int> <lbstr_b> <formttbl>  <formttbl>       
1 nonrepeat_form     Nonrepeat Form    <tibble>    <tibble>        <tibble>      nonrepea…         4         5 2.59 kB   0%          0%               
2 repeat_form        Repeat Form       <tibble>    <tibble>        <tibble>      mixed             4         6 2.67 kB   0%          0%               
3 mixed_structure_f… Mixed Structure … <tibble>    <tibble>        <tibble>      mixed             5         6 2.94 kB   0%          0%               
> sprtbl_mixed$redcap_data
[[1]]
# A tibble: 4 × 5
  record_id redcap_event       redcap_event_instance nonrepeat_1 form_status_complete
      <dbl> <chr>                              <dbl> <chr>       <fct>               
1         1 non_repeating                         NA Nonrepeat 1 Incomplete          
2         1 repeating_separate                    NA Nonrepeat 2 Incomplete          
3         1 repeating_together                     1 A           Incomplete          
4         1 repeating_together                     2 B           Incomplete          

[[2]]
# A tibble: 4 × 6
  record_id redcap_event       redcap_form_instance redcap_event_instance repeat_1 form_status_complete
      <dbl> <chr>                             <dbl>                 <dbl> <chr>    <fct>               
1         1 non_repeating                         1                    NA Repeat 1 Incomplete          
2         1 non_repeating                         2                    NA Repeat 2 Incomplete          
3         1 repeating_together                   NA                     1 A        Incomplete          
4         1 repeating_together                   NA                     2 B        Incomplete          

[[3]]
# A tibble: 5 × 6
  record_id redcap_event       redcap_form_instance redcap_event_instance mixed_structure_1 form_status_complete
      <dbl> <chr>                             <dbl>                 <dbl> <chr>             <fct>               
1         1 non_repeating                         1                    NA Mixed Nonrepeat 1 Incomplete          
2         1 repeating_separate                    1                    NA Mixed Repeat 1    Incomplete          
3         1 repeating_separate                    2                    NA Mixed Repeat 2    Incomplete          
4         1 repeating_together                   NA                     1 A                 Incomplete          
5         1 repeating_together                   NA                     2 B                 Incomplete        

Proposed Changes

List changes below in bullet format:

  • Update add_event_mapping() with output from new function get_repeat_event_types() and make repeat_type available in the redcap_events column of the supertibble
  • Update clean_redcap_()
    • Update convert_mixed_instrument() to detect repeating-together (RT) data and shift its instances over to redcap_event_instance
  • Update add_partial_keys() to handle regex for arms that don't end in an integer (See #206)
  • Update test suite accordingly
  • Add new join_data_tibbles() function and tests
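
To illustrate the add_partial_keys() item above, here is a rough base R sketch of splitting an event name into event and arm when the arm suffix is not purely numeric. The event names and the pattern are assumptions made for illustration (see #206 for the actual case), and this is not the regex used in the package:

```r
# Hypothetical unique event names; the trailing "b" is the non-integer case.
event_names <- c("baseline_arm_1", "follow_up_arm_2", "baseline_arm_1b")

# Accept letters and digits after "_arm_" rather than digits only.
pattern <- "^(.*)_arm_([[:alnum:]]+)$"

# Everything before the final "_arm_" is the event, the rest is the arm.
redcap_event <- sub(pattern, "\\1", event_names)
redcap_arm <- sub(pattern, "\\2", event_names)

print(data.frame(event_names, redcap_event, redcap_arm))
```

The sketch keeps the arm as a character value so that a suffix such as "1b" is not lost in a conversion to integer.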

Remaining TODOs

  • Update NEWS.md
  • Update pkgdown vignette
  • Finish updating join_data_tibbles() to handle the by argument appropriately for RT/RS (repeating-together/repeating-separate) data
    • See comment in #204 about concerns
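
The concern about join keys in the TODO above can be illustrated with base R merge() on toy copies of the nonrepeat and mixed structure tibbles from the proposed output. This is only a sketch of why the choice of keys matters. It does not call join_data_tibbles(), and the key choices are assumptions for illustration:

```r
# Toy copies of two data tibbles (form_status_complete omitted for brevity).
nonrepeat <- data.frame(
  record_id = 1,
  redcap_event = c("non_repeating", "repeating_separate",
                   "repeating_together", "repeating_together"),
  redcap_event_instance = c(NA, NA, 1, 2),
  nonrepeat_1 = c("Nonrepeat 1", "Nonrepeat 2", "A", "B"),
  stringsAsFactors = FALSE
)
mixed <- data.frame(
  record_id = 1,
  redcap_event = c("non_repeating", "repeating_separate",
                   "repeating_separate", "repeating_together",
                   "repeating_together"),
  redcap_form_instance = c(1, 1, 2, NA, NA),
  redcap_event_instance = c(NA, NA, NA, 1, 2),
  mixed_structure_1 = c("Mixed Nonrepeat 1", "Mixed Repeat 1",
                        "Mixed Repeat 2", "A", "B"),
  stringsAsFactors = FALSE
)

# Including redcap_event_instance in the keys pairs each repeating-together
# row with its own instance (merge() treats NA keys as matching each other).
with_instance <- merge(
  nonrepeat, mixed,
  by = c("record_id", "redcap_event", "redcap_event_instance"),
  all.x = TRUE
)

# Leaving it out cross-joins the repeating-together instances (2 x 2 rows).
without_instance <- merge(
  nonrepeat, mixed,
  by = c("record_id", "redcap_event"),
  all.x = TRUE
)

print(nrow(with_instance))
print(nrow(without_instance))
```

With the event instance in the keys, the left join yields 5 rows. Without it, the repeating-together rows multiply and the result has 7 rows.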

Issue Addressed

Closes #206 Closes #204 Closes #199

PR Checklist

Before submitting this PR, please verify that the submission meets the criteria below:

  • [ ] New/revised functions have associated tests
  • [ ] New/revised functions that update downstream outputs have associated static testing files (.RDS) updated under inst/testdata/create_test_data.R
  • [ ] New/revised functions use appropriate naming conventions
  • [ ] New/revised functions don't repeat code
  • [ ] Code changes are less than 250 lines total
  • [ ] Issues linked to the PR using GitHub's list of keywords
  • [ ] The appropriate reviewer is assigned to the PR
  • [ ] The appropriate developers are assigned to the PR
  • [ ] Pre-release package version incremented using usethis::use_version()

Code Review

This section is to be used by the reviewer and developers during code review, after PR submission.

Code Review Checklist

  • [ ] I checked that new files follow naming conventions and are in the right place
  • [ ] I checked that documentation is complete, clear, and without typos
  • [ ] I added/edited comments to explain "why" not "how"
  • [ ] I checked that all new variable and function names follow naming conventions
  • [ ] I checked that new tests have been written for key business logic and/or bugs that this PR fixes
  • [ ] I checked that new tests address important edge cases

  • To see the specific tasks where the Asana app for GitHub is being used, see below:
    • https://app.asana.com/0/0/1207922156958425

rsh52, Sep 20 '24 18:09