
ViewDefinition-to-Parquet Implementation

Open itsiggs opened this issue 1 year ago • 1 comments

Description of what I changed

Implemented ViewDefinition-to-Parquet as planned in Issue#1118.

The current implementation lets users apply ViewDefinitions of their choosing to a list of resources; the output Parquet files represent the resulting materialized views. The materialized views can be written to Parquet alongside other pipeline features, such as creating views in a sink database.
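To illustrate what "applying a ViewDefinition to a list of resources" means, here is a minimal sketch and not the code in this PR. It assumes resources are already parsed into nested maps and that each view column is a plain dotted path, which is a simplified stand-in for the FHIRPath expressions a real ViewDefinition uses. The names `ViewApplySketch`, `evalPath`, and `applyView` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not a class from fhir-data-pipes. */
final class ViewApplySketch {

  /** Walks a dotted path such as "meta.versionId"; returns null if any step is missing. */
  static Object evalPath(Map<String, Object> resource, String path) {
    Object current = resource;
    for (String part : path.split("\\.")) {
      if (!(current instanceof Map)) {
        return null;
      }
      current = ((Map<?, ?>) current).get(part);
    }
    return current;
  }

  /**
   * Produces one flat row per resource of the given type. The columns map goes from output
   * column name to dotted path (an assumption; real views use FHIRPath).
   */
  static List<Map<String, Object>> applyView(
      String resourceType, Map<String, String> columns, List<Map<String, Object>> resources) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Map<String, Object> resource : resources) {
      if (!resourceType.equals(resource.get("resourceType"))) {
        continue; // A view only applies to its own resource type.
      }
      Map<String, Object> row = new LinkedHashMap<>();
      for (Map.Entry<String, String> column : columns.entrySet()) {
        row.put(column.getKey(), evalPath(resource, column.getValue()));
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    Map<String, Object> patient = new HashMap<>();
    patient.put("resourceType", "Patient");
    patient.put("id", "p1");

    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("id", "id");

    List<Map<String, Object>> resources = new ArrayList<>();
    resources.add(patient);
    System.out.println(applyView("Patient", columns, resources));
  }
}
```

Each resulting row is a flat map of column name to value, which is the shape that would then be converted to a record and written to a Parquet file.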

The first commit implements an experimental version of this feature that required the creation of a new DoFn in the class FetchRecords similar to FetchSearchPageFn.

The more recent commit refactors the pipeline code to use ParquetUtil instead of Apache Beam's ParquetIO to write Parquet files.

E2E test

TESTED:

Added unit tests for FHIR type to Avro schema conversion, covering multiple resource types. Completed a full pipeline run successfully and inspected the output Parquet files.
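For readers unfamiliar with the FHIR type to Avro schema conversion mentioned above, the following is a rough sketch of the idea, not the `ViewSchema` code from this PR. The type mapping is an assumption chosen for illustration (for example, `decimal` mapped to `double` loses precision), every column is made nullable, and the schema JSON is built by hand so the example needs no Avro dependency. `ViewSchemaSketch` and the record name `patient_flat` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch; not the ViewSchema class from this PR. */
final class ViewSchemaSketch {

  /** Assumed mapping from FHIR primitive type names to Avro primitive type names. */
  static String avroTypeFor(String fhirType) {
    switch (fhirType) {
      case "boolean":
        return "boolean";
      case "integer":
      case "positiveInt":
      case "unsignedInt":
        return "int";
      case "integer64":
        return "long";
      case "decimal":
        return "double"; // Assumption: lossy, a real pipeline may prefer another encoding.
      default:
        return "string"; // string, id, date, dateTime, uri, code, and so on.
    }
  }

  /**
   * Builds an Avro record schema as JSON. The columns map goes from column name to FHIR type;
   * each field is a ["null", type] union so that missing values are allowed.
   */
  static String toAvroSchemaJson(String recordName, Map<String, String> columns) {
    StringJoiner fields = new StringJoiner(",");
    for (Map.Entry<String, String> column : columns.entrySet()) {
      fields.add(
          "{\"name\":\""
              + column.getKey()
              + "\",\"type\":[\"null\",\""
              + avroTypeFor(column.getValue())
              + "\"],\"default\":null}");
    }
    return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":[" + fields + "]}";
  }

  public static void main(String[] args) {
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("id", "id");
    columns.put("active", "boolean");
    columns.put("birth_date", "date");
    System.out.println(toAvroSchemaJson("patient_flat", columns));
  }
}
```

A unit test over several resource types would then assert that each view's columns yield the expected field names and Avro types.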

Added a unit test for ParquetUtil and the new functionality. Tested the output naming convention for materialized ViewDefinitions in Parquet.

Checklist: I completed these to help reviewers :)

  • [x] I have read and will follow the review process.

  • [x] I am familiar with Google Style Guides for the language I have coded in.

    No? Please take some time and review Java and Python style guides.

  • [x] My IDE is configured to follow the Google code styles.

    No? Unsure? -> configure your IDE.

  • [x] I have added tests to cover my changes. (If you refactored existing code that was well tested you do not have to add tests)

  • [x] I ran mvn clean package right before creating this pull request and added all formatting changes to my commit.

  • [x] All new and existing tests passed.

  • [x] My pull request is based on the latest changes of the master branch.

    No? Unsure? -> execute command git pull --rebase upstream master

itsiggs · Jul 25 '24 20:07

Codecov Report

Attention: Patch coverage is 68.45638% with 47 lines in your changes missing coverage. Please review.

Project coverage is 52.59%. Comparing base (b03c2c7) to head (6d0ebbb).

Files                                                  Patch %   Lines
...in/java/com/google/fhir/analytics/ParquetUtil.java  69.76%    20 Missing and 6 partials :warning:
...ava/com/google/fhir/analytics/view/ViewSchema.java  70.37%    11 Missing and 5 partials :warning:
...c/main/java/com/google/fhir/analytics/FhirEtl.java   0.00%    2 Missing :warning:
...java/com/google/fhir/analytics/DataProperties.java  33.33%    2 Missing :warning:
...java/com/google/fhir/analytics/FhirSearchUtil.java   0.00%    1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1130      +/-   ##
============================================
+ Coverage     51.76%   52.59%   +0.83%     
- Complexity      669      706      +37     
============================================
  Files            95       95              
  Lines          5612     5751     +139     
  Branches        731      765      +34     
============================================
+ Hits           2905     3025     +120     
- Misses         2425     2434       +9     
- Partials        282      292      +10     


codecov-commenter · Jul 25 '24 20:07