fhir-data-pipes
ViewDefinition-to-Parquet Implementation
Description of what I changed
Implemented ViewDefinition-to-Parquet as planned in Issue #1118.
The current implementation lets users apply ViewDefinitions of their choosing to a list of resources; the output Parquet files represent the resulting materialized views. The materialized views can be written to Parquet alongside other pipeline features, such as creating views in a sink database.
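To illustrate what "applying a ViewDefinition to a list of resources" means, here is a minimal sketch in plain Java. It is not the code in this PR: the class `ViewRowSketch`, the methods `resolve` and `applyView`, and the use of nested maps with dotted paths are all assumptions made for illustration. Real ViewDefinitions use FHIRPath expressions and real FHIR elements such as `name` are lists, both of which this sketch deliberately ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the fhir-data-pipes API.
class ViewRowSketch {

  // Walks a dotted path such as "name.family" through nested maps.
  // Returns null if any step along the path is missing.
  static Object resolve(Map<String, Object> resource, String path) {
    Object current = resource;
    for (String part : path.split("\\.")) {
      if (!(current instanceof Map)) {
        return null;
      }
      current = ((Map<?, ?>) current).get(part);
    }
    return current;
  }

  // Produces one flat row per resource: column name -> value found at that column's path.
  static List<Map<String, Object>> applyView(
      Map<String, String> columns, List<Map<String, Object>> resources) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Map<String, Object> resource : resources) {
      Map<String, Object> row = new LinkedHashMap<>();
      for (Map.Entry<String, String> column : columns.entrySet()) {
        row.put(column.getKey(), resolve(resource, column.getValue()));
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    // A simplified Patient-like resource (assumed shape, not real FHIR JSON).
    Map<String, Object> name = new LinkedHashMap<>();
    name.put("family", "Doe");
    Map<String, Object> patient = new LinkedHashMap<>();
    patient.put("id", "p1");
    patient.put("name", name);

    // Column name -> path; "birthDate" is absent in the sample, so that cell is null.
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("patient_id", "id");
    columns.put("family_name", "name.family");
    columns.put("birth_date", "birthDate");

    System.out.println(applyView(columns, List.of(patient)));
  }
}
```

Each resulting row is flat, which is what makes it straightforward to write as a Parquet record or as a row in a sink database table.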
The first commit implements an experimental version of this feature that required the creation of a new DoFn in the class FetchRecords similar to FetchSearchPageFn.
The more recent commit refactors the pipeline code to use ParquetUtil instead of Apache Beam's ParquetIO to write Parquet files.
E2E test
TESTED:
Added unit tests for FHIR Type to Avro Schema conversion, with multiple resource types. Did a full successful pipeline run and inspected output Parquet files.
Added unit test for ParquetUtil and the new functionality. Tested output naming convention for materialized ViewDefinitions in Parquet.
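For readers unfamiliar with the FHIR type to Avro schema conversion mentioned above, the following is a rough sketch of the idea using only the Java standard library. The class `AvroSchemaSketch`, its methods, and in particular the type mapping (boolean to `boolean`, integer to `int`, decimal to `double`, everything else to `string`) are assumptions for illustration; the mapping implemented in this PR may differ. The sketch also assumes column names are already valid Avro names and does no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; not the conversion code from this PR.
class AvroSchemaSketch {

  // Assumed mapping from FHIR primitive type names to Avro primitive type names.
  static String avroType(String fhirType) {
    switch (fhirType) {
      case "boolean":
        return "boolean";
      case "integer":
        return "int";
      case "decimal":
        return "double";
      default:
        return "string"; // fallback for dates, codes, ids, and anything unrecognized
    }
  }

  // Builds an Avro record schema as JSON text; every field is a nullable union.
  static String recordSchema(String recordName, Map<String, String> columnTypes) {
    StringJoiner fields = new StringJoiner(",");
    for (Map.Entry<String, String> column : columnTypes.entrySet()) {
      fields.add(
          String.format(
              "{\"name\":\"%s\",\"type\":[\"null\",\"%s\"],\"default\":null}",
              column.getKey(), avroType(column.getValue())));
    }
    return String.format(
        "{\"type\":\"record\",\"name\":\"%s\",\"fields\":[%s]}", recordName, fields);
  }

  public static void main(String[] args) {
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("patient_id", "id");
    columns.put("active", "boolean");
    columns.put("birth_date", "date");
    System.out.println(recordSchema("patient_flat", columns));
  }
}
```

Making every field a union with `null` reflects that a view column may have no value for a given resource; a unit test for such a conversion would typically assert on the resulting field names and types for several resource types.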
Checklist: I completed these to help reviewers :)
- [x] I have read and will follow the review process.
- [x] I am familiar with Google Style Guides for the language I have coded in.
  No? Please take some time and review Java and Python style guides.
- [x] My IDE is configured to follow the Google code styles.
  No? Unsure? -> configure your IDE.
- [x] I have added tests to cover my changes. (If you refactored existing code that was well tested you do not have to add tests)
- [x] I ran mvn clean package right before creating this pull request and added all formatting changes to my commit.
- [x] All new and existing tests passed.
- [x] My pull request is based on the latest changes of the master branch.
  No? Unsure? -> execute command
  git pull --rebase upstream master
Codecov Report
Attention: Patch coverage is 68.45638% with 47 lines in your changes missing coverage. Please review.
Project coverage is 52.59%. Comparing base (b03c2c7) to head (6d0ebbb).
Additional details and impacted files
@@ Coverage Diff @@
## master #1130 +/- ##
============================================
+ Coverage 51.76% 52.59% +0.83%
- Complexity 669 706 +37
============================================
Files 95 95
Lines 5612 5751 +139
Branches 731 765 +34
============================================
+ Hits 2905 3025 +120
- Misses 2425 2434 +9
- Partials 282 292 +10