REDCapTidieR
join_data_tibbles, Enable Repeat Event Type Supertibble Definition
Description
This PR seeks to provide a new `join_data_tibbles()` function. To support this for complex mixed data structures, some changes needed to be made to how we process longitudinal databases.
This also fixes two bugs: a minor one (documented below) and a major one where mixed data structure outputs are reported incorrectly.
I will be posting comments on the code itself below to explain some of the changes.
The Missing Data Issue
Because we weren't making use of `redcap_event_instance` correctly, we were not accurately reporting all of the data associated with each REDCap event. See below from the Mixed Structure REDCap for reference:
And here is what currently comes out of the REDCapTidieR data tibbles from `main`:
Current Output with Missing Data
> sprtbl_mixed
# A REDCapTidieR Supertibble with 3 instruments
redcap_form_name redcap_form_label redcap_data redcap_metadata redcap_events structure data_rows data_cols
<chr> <chr> <list> <list> <list> <chr> <int> <int>
1 nonrepeat_form Nonrepeat Form <tibble> <tibble [2 × 17]> <tibble> nonrepea… 4 5
2 repeat_form Repeat Form <tibble> <tibble [2 × 17]> <tibble> mixed 2 5
3 mixed_structure_form Mixed Structure Form <tibble> <tibble [2 × 17]> <tibble> mixed 3 5
# ℹ 3 more variables: data_size <lbstr_by>, data_na_pct <formttbl>, form_complete_pct <formttbl>
> sprtbl_mixed$redcap_data
[[1]]
# A tibble: 4 × 5
record_id redcap_event redcap_event_instance nonrepeat_1 form_status_complete
<dbl> <chr> <dbl> <chr> <fct>
1 1 non_repeating NA Nonrepeat 1 Incomplete
2 1 repeating_separate NA Nonrepeat 2 Incomplete
3 1 repeating_together 1 A Incomplete
4 1 repeating_together 2 B Incomplete
[[2]]
# A tibble: 2 × 5
record_id redcap_event redcap_form_instance repeat_1 form_status_complete
<dbl> <chr> <dbl> <chr> <fct>
1 1 non_repeating 1 Repeat 1 Incomplete
2 1 non_repeating 2 Repeat 2 Incomplete
[[3]]
# A tibble: 3 × 5
record_id redcap_event redcap_form_instance mixed_structure_1 form_status_complete
<dbl> <chr> <dbl> <chr> <fct>
1 1 non_repeating 1 Mixed Nonrepeat 1 Incomplete
2 1 repeating_separate 1 Mixed Repeat 1 Incomplete
3 1 repeating_separate 2 Mixed Repeat 2 Incomplete
In the second data tibble (the repeat form), we should expect to see 4 entries: 2 for the non-repeating event (one per repeating form instance) and 2 for the repeating together event. The 2 repeating together entries are missing.
In the third data tibble (the mixed structure form), we should expect to see 5 entries, and the 2 repeating together event entries are again missing.
Here is the output with the proposed code changes:
Proposed Output with Missing Data Added
> sprtbl_mixed
# A REDCapTidieR Supertibble with 3 instruments
redcap_form_name redcap_form_label redcap_data redcap_metadata redcap_events structure data_rows data_cols data_size data_na_pct form_complete_pct
<chr> <chr> <list> <list> <list> <chr> <int> <int> <lbstr_b> <formttbl> <formttbl>
1 nonrepeat_form Nonrepeat Form <tibble> <tibble> <tibble> nonrepea… 4 5 2.59 kB 0% 0%
2 repeat_form Repeat Form <tibble> <tibble> <tibble> mixed 4 6 2.67 kB 0% 0%
3 mixed_structure_f… Mixed Structure … <tibble> <tibble> <tibble> mixed 5 6 2.94 kB 0% 0%
> sprtbl_mixed$redcap_data
[[1]]
# A tibble: 4 × 5
record_id redcap_event redcap_event_instance nonrepeat_1 form_status_complete
<dbl> <chr> <dbl> <chr> <fct>
1 1 non_repeating NA Nonrepeat 1 Incomplete
2 1 repeating_separate NA Nonrepeat 2 Incomplete
3 1 repeating_together 1 A Incomplete
4 1 repeating_together 2 B Incomplete
[[2]]
# A tibble: 4 × 6
record_id redcap_event redcap_form_instance redcap_event_instance repeat_1 form_status_complete
<dbl> <chr> <dbl> <dbl> <chr> <fct>
1 1 non_repeating 1 NA Repeat 1 Incomplete
2 1 non_repeating 2 NA Repeat 2 Incomplete
3 1 repeating_together NA 1 A Incomplete
4 1 repeating_together NA 2 B Incomplete
[[3]]
# A tibble: 5 × 6
record_id redcap_event redcap_form_instance redcap_event_instance mixed_structure_1 form_status_complete
<dbl> <chr> <dbl> <dbl> <chr> <fct>
1 1 non_repeating 1 NA Mixed Nonrepeat 1 Incomplete
2 1 repeating_separate 1 NA Mixed Repeat 1 Incomplete
3 1 repeating_separate 2 NA Mixed Repeat 2 Incomplete
4 1 repeating_together NA 1 A Incomplete
5 1 repeating_together NA 2 B Incomplete
Proposed Changes
List changes below in bullet format:
- Update `add_event_mapping()` with output from new function `get_repeat_event_types()` and make `repeat_type` available in the `redcap_events` column of the supertibble
- Update `clean_redcap_()`
- Update `convert_mixed_instrument()` to detect RT data and shift instances over to `redcap_event_instance`
- Update
- Update `add_partial_keys()` to handle regex for arms that don't end in an integer (See #206)
- Update test suite accordingly
- Add new `join_data_tibbles()` function and tests
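
The instance shift described for `convert_mixed_instrument()` can be illustrated with a minimal base R sketch. This is not the PR's implementation: `shift_rt_instances()`, the `event_types` lookup table, and the `repeat_type` labels used here are hypothetical names chosen for illustration, and the input only mimics the repeat form rows from the example project.

```r
# Hypothetical helper (not part of REDCapTidieR): for events that repeat
# together (RT), move the instance number out of redcap_form_instance and
# into redcap_event_instance.
shift_rt_instances <- function(data, event_types) {
  # Look up each row's repeat type by event name
  repeat_type <- event_types$repeat_type[
    match(data$redcap_event, event_types$redcap_event)
  ]
  is_rt <- !is.na(repeat_type) & repeat_type == "repeat_together"

  # RT rows keep their instance as an event instance; all others get NA
  data$redcap_event_instance <- ifelse(is_rt, data$redcap_form_instance, NA)
  data$redcap_form_instance <- ifelse(is_rt, NA, data$redcap_form_instance)
  data
}

# Assumed lookup of repeat types per event (labels are illustrative)
event_types <- data.frame(
  redcap_event = c("non_repeating", "repeating_separate", "repeating_together"),
  repeat_type = c("nonrepeating", "repeat_separate", "repeat_together"),
  stringsAsFactors = FALSE
)

# Assumed intermediate data where every instance sits in one column
raw <- data.frame(
  record_id = 1,
  redcap_event = c("non_repeating", "non_repeating",
                   "repeating_together", "repeating_together"),
  redcap_form_instance = c(1, 2, 1, 2),
  repeat_1 = c("Repeat 1", "Repeat 2", "A", "B"),
  stringsAsFactors = FALSE
)

shifted <- shift_rt_instances(raw, event_types)
print(shifted)
```

With this input, the `repeating_together` rows carry their instance in `redcap_event_instance` and `NA` in `redcap_form_instance`, which matches the shape of the proposed output for the repeat form.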
Remaining TODOs
- Update NEWS.md
- Update pkgdown vignette
- Finish updating `join_data_tibbles()` to handle `by` appropriately for RT/RS data
  - See comment in #204 about concerns
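
To make the `by` question concrete, here is a hedged sketch of a left join between two data tibbles using base R `merge()`. It is an illustration only, not how `join_data_tibbles()` is implemented; the choice of key columns is an assumption, and the data frames are trimmed copies of the nonrepeat and repeat form outputs shown earlier.

```r
# Trimmed copy of the nonrepeat form data tibble
nonrepeat <- data.frame(
  record_id = 1,
  redcap_event = c("non_repeating", "repeating_separate",
                   "repeating_together", "repeating_together"),
  redcap_event_instance = c(NA, NA, 1, 2),
  nonrepeat_1 = c("Nonrepeat 1", "Nonrepeat 2", "A", "B"),
  stringsAsFactors = FALSE
)

# Trimmed copy of the repeat form data tibble (proposed output)
repeat_form <- data.frame(
  record_id = 1,
  redcap_event = c("non_repeating", "non_repeating",
                   "repeating_together", "repeating_together"),
  redcap_form_instance = c(1, 2, NA, NA),
  redcap_event_instance = c(NA, NA, 1, 2),
  repeat_1 = c("Repeat 1", "Repeat 2", "A", "B"),
  stringsAsFactors = FALSE
)

# Assumed keys: the identifier columns the two tibbles share
keys <- c("record_id", "redcap_event", "redcap_event_instance")

# all.x = TRUE keeps every nonrepeat row, i.e. a left join
joined <- merge(nonrepeat, repeat_form, by = keys, all.x = TRUE)
print(joined)
```

Here the single `non_repeating` row fans out to two rows, one per repeating form instance, and the `repeating_separate` row has no match, so its `repeat_1` is `NA`. Base `merge()` matches `NA` keys to each other when joining on several columns, which is worth keeping in mind when choosing keys for RT/RS data.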
Issue Addressed
Closes #206 Closes #204 Closes #199
PR Checklist
Before submitting this PR, please verify that the submission meets the criteria below:
- [ ] New/revised functions have associated tests
- [ ] New/revised functions that update downstream outputs have associated static testing files (`.RDS`) updated under `inst/testdata/create_test_data.R`
- [ ] New/revised functions use appropriate naming conventions
- [ ] New/revised functions don't repeat code
- [ ] Code changes are less than 250 lines total
- [ ] Issues linked to the PR using GitHub's list of keywords
- [ ] The appropriate reviewer is assigned to the PR
- [ ] The appropriate developers are assigned to the PR
- [ ] Pre-release package version incremented using `usethis::use_version()`
Code Review
This section to be used by the reviewer and developers during Code Review after PR submission
Code Review Checklist
- [ ] I checked that new files follow naming conventions and are in the right place
- [ ] I checked that documentation is complete, clear, and without typos
- [ ] I added/edited comments to explain "why" not "how"
- [ ] I checked that all new variable and function names follow naming conventions
- [ ] I checked that new tests have been written for key business logic and/or bugs that this PR fixes
- [ ] I checked that new tests address important edge cases
- To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1207922156958425