fhir-data-pipes
Update documentation to clarify feature roadmap (e.g. what is ready, what is not yet ready).
For example, I think we want to indicate that we plan to support single-machine deployment using FHIR Search instead of the native HAPI integration, but that this is not yet ready.
My understanding of current features:
Pipeline
Transform data from a FHIR-based data source to Parquet files or a different FHIR store.
- Batch Mode
    - FHIR Search-based source
        - Generic FHIR, HAPI, OpenMRS
    - JDBC-based source
        - HAPI, OpenMRS
- Streaming Mode
    - Source: OpenMRS
- Output options
    - Parquet files
    - FHIR API sink
        - Generic FHIR
    - JDBC sink [deprioritized?]
        - HAPI only
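A FHIR Search-based source works by paging through search results: each response Bundle carries a `link` entry with `relation: next` pointing at the following page, and the client follows those links until none remain. A minimal sketch of that loop, using an in-memory stand-in for the server (`fetch_page` and the page contents are illustrative, not the pipeline's actual API):

```python
# Sketch of FHIR Search paging: follow Bundle.link[relation=next] until exhausted.
# The in-memory "server" below stands in for HTTP GETs against a real FHIR
# endpoint; the paging contract itself comes from the FHIR specification.

PAGES = {
    "/Patient?_count=2": {
        "resourceType": "Bundle",
        "entry": [{"resource": {"resourceType": "Patient", "id": "p1"}},
                  {"resource": {"resourceType": "Patient", "id": "p2"}}],
        "link": [{"relation": "next", "url": "/Patient?_count=2&page=2"}],
    },
    "/Patient?_count=2&page=2": {
        "resourceType": "Bundle",
        "entry": [{"resource": {"resourceType": "Patient", "id": "p3"}}],
        "link": [],
    },
}

def fetch_page(url):
    """Stand-in for an HTTP GET against the FHIR server."""
    return PAGES[url]

def fetch_all(start_url):
    """Collect resources from every page of a FHIR search."""
    resources, url = [], start_url
    while url:
        bundle = fetch_page(url)
        resources += [e["resource"] for e in bundle.get("entry", [])]
        nexts = [l["url"] for l in bundle.get("link", []) if l["relation"] == "next"]
        url = nexts[0] if nexts else None
    return resources
```

In the real pipeline each page fetch is an HTTP request against the configured FHIR server; the JDBC-based sources bypass this and read the underlying database directly.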
Controller
Schedules incremental runs of the Pipeline that write Parquet files. Provides a GUI to run the Pipeline, see its status, and view its settings.
- Input - JDBC-based HAPI FHIR server
- Output - Parquet files
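Incremental runs can be expressed with the standard FHIR `_lastUpdated` search parameter: remember when the last successful run finished and only fetch resources changed since then. A sketch of that bookkeeping (the function name is illustrative, not the Controller's actual implementation):

```python
# Sketch of incremental-run bookkeeping via FHIR's _lastUpdated parameter.
# build_incremental_query is an illustrative name, not the Controller's API.
from datetime import datetime, timezone

def build_incremental_query(resource_type, last_run):
    """Query for resources changed since the last successful run."""
    if last_run is None:
        return resource_type  # first run: full snapshot
    return f"{resource_type}?_lastUpdated=gt{last_run.isoformat()}"

last_run = datetime(2023, 1, 1, tzinfo=timezone.utc)
query = build_incremental_query("Patient", last_run)
```

On each scheduled run the Controller-style loop would persist the new timestamp only after the run succeeds, so a failed run is retried over the same window.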
Single Machine Deployment
A single Docker Compose configuration that runs the Controller and lets you query the Parquet files through a Spark Thrift Server.
- Input - JDBC-based HAPI FHIR server
- Output - Spark Thrift Server, reachable over JDBC at jdbc:hive2://server:10001
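Any HiveServer2-compatible client can connect to that endpoint, e.g. `beeline -u jdbc:hive2://localhost:10001`. A small helper showing how the URL is composed, with a hedged PyHive example in comments (PyHive usage assumes the third-party `pyhive` package and a running server, so it is untested here):

```python
# Compose the JDBC URL for the Spark Thrift Server exposed by the
# single-machine deployment. Host and port are deployment-specific;
# 10001 matches the default in this setup.
def thrift_jdbc_url(host, port=10001):
    return f"jdbc:hive2://{host}:{port}"

# With a running server, a Python client could query it via PyHive
# (untested sketch; requires 'pip install pyhive[hive]'):
#
#   from pyhive import hive
#   conn = hive.connect(host="localhost", port=10001)
#   cur = conn.cursor()
#   cur.execute("SHOW TABLES")
#   print(cur.fetchall())
```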
Test docker images
Docker images for trying out the Pipeline with different sources and sinks.
- Input
    - HAPI FHIR server configured with a Postgres server; no preloaded data
    - OpenMRS server configured with MySQL
- Sink
    - HAPI FHIR server
    - DWH
    - OpenHIM
    - Indicator Calc
Synthea HIV
Generate synthetic HIV patient data using Synthea. Upload the data to HAPI, GCP, or OpenMRS, or use the pre-generated test data in your own development.
- Generator
    - Input: number of patients
    - Output: one FHIR Bundle per generated patient history
- Uploader
    - HAPI
    - GCP
    - OpenMRS
- Sample Data
    - 79 patients, 4,006 Encounters, and 17,279 Observations (~100 MB)
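Each generated patient history is a FHIR Bundle, and uploading is a standard FHIR transaction: an HTTP POST of the Bundle to the target server's base URL. A minimal sketch of the Bundle shape (the resource content is a toy example, not actual Synthea output):

```python
# Toy FHIR transaction Bundle, in the shape Synthea emits one per patient.
# A real Synthea bundle holds the full history (Encounters, Observations,
# etc.); this is just the skeleton.
import json

bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {
            "resource": {"resourceType": "Patient",
                         "name": [{"family": "Example"}]},
            "request": {"method": "POST", "url": "Patient"},
        }
    ],
}

payload = json.dumps(bundle)

# Uploading to a HAPI server is then a standard FHIR transaction, e.g.:
#   curl -X POST -H 'Content-Type: application/fhir+json' \
#        -d @bundle.json http://localhost:8080/fhir
```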
DWH Query Library [deprioritized?]
Simplifies querying FHIR-based data warehouses by providing a unified query API across Spark and BigQuery.
This is not a requirement for the Beta launch, so it is moving to the "post-beta" milestone.
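The idea behind such a library is one query interface with interchangeable engines. A minimal sketch of that shape, using hypothetical class names and an in-memory test double in place of the real Spark and BigQuery clients:

```python
# Sketch of a unified DWH query API: one interface, multiple engines.
# All class and method names here are hypothetical, not the library's API.
from abc import ABC, abstractmethod

class DwhBackend(ABC):
    @abstractmethod
    def run(self, sql):
        """Execute SQL and return rows as a list of tuples."""

class SparkBackend(DwhBackend):
    def run(self, sql):
        # Would submit via a SparkSession or Thrift Server connection.
        raise NotImplementedError

class BigQueryBackend(DwhBackend):
    def run(self, sql):
        # Would submit via the google-cloud-bigquery client.
        raise NotImplementedError

class InMemoryBackend(DwhBackend):
    """Test double so callers can be exercised without a real engine."""
    def __init__(self, canned):
        self.canned = canned
    def run(self, sql):
        return self.canned[sql]

def count_patients(backend):
    """Caller code is engine-agnostic: only the backend object differs."""
    return backend.run("SELECT COUNT(*) FROM patient")[0][0]
```

The point of the abstraction is that analytics code like `count_patients` never changes when the warehouse moves between Spark and BigQuery.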