fhir-data-pipes
Create a realistic test env to show scaling capabilities of the pipeline
The main reason our pipelines are implemented using Apache Beam is to ensure they are horizontally scalable and can process large input FHIR data in a short time. We have demonstrated this scalability with JSON input files (on a distributed file system), but a more realistic scenario is a FHIR server backed by a database with multiple replicas. This issue is to create and test the following two scenarios:
- A HAPI FHIR server with a large amount of data, queried through the search API (see the sketch after this list).
- The same setup as above, but accessed through the direct DB access mode.
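For the first scenario, the pipeline workers would hit the HAPI FHIR server's search API in parallel. The following is a minimal sketch of that access path using the HAPI FHIR client library; the server URL, resource type, and page size are placeholders for illustration, not the pipeline's actual configuration (the direct DB access mode would instead read the same resources straight from the underlying HAPI tables):

```java
import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;
import org.hl7.fhir.r4.model.Patient;

public class SearchApiProbe {
  public static void main(String[] args) {
    FhirContext ctx = FhirContext.forR4();
    // Points at the (replicated) HAPI FHIR server in the test environment; URL is a placeholder.
    IGenericClient client = ctx.newRestfulGenericClient("http://hapi-fhir-server:8080/fhir");
    // Fetch one page of Patient resources; in the real pipeline many such searches
    // run in parallel across Beam workers, which is what stresses the DB replicas.
    Bundle page = client.search()
        .forResource(Patient.class)
        .count(100)
        .returnBundle(Bundle.class)
        .execute();
    System.out.println("Fetched " + page.getEntry().size() + " patients in this page");
  }
}
```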
The data for the above cases can come from the Synthea-HIV module. The test environment should be easy and quick to deploy; i.e., we should save a DB snapshot so that it can be redeployed quickly whenever needed. We will run the pipelines on the Dataflow service of Google Cloud, and the DB should be on Cloud SQL (with enough read replicas enabled). So part of this issue is to create a test environment on GCP with a replicated HAPI server and DB replicas backing it.
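To actually exercise those replicas, the pipeline needs to be launched on Dataflow with a large enough worker pool. Below is a minimal sketch of the runner configuration using standard Beam Dataflow options; the project, region, and worker count are placeholder values, and the FHIR fetch/transform steps are elided:

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DataflowScaleRun {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    // Placeholder project/region; replace with the GCP test project created for this issue.
    options.setProject("my-fhir-test-project");
    options.setRegion("us-central1");
    // Enough workers to generate meaningful load against the Cloud SQL read replicas.
    options.setMaxNumWorkers(20);

    Pipeline pipeline = Pipeline.create(options);
    // ... attach the FHIR fetch and transform steps here ...
    pipeline.run();
  }
}
```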
This can also be used as a test bed for the Bulk Export API once we are done with its implementation (#533 is related).
I will take a look