
Create a realistic test environment to demonstrate the scaling capabilities of the pipeline

Open · bashir2 opened this issue on Feb 26, 2024 · 1 comment

The main reason our pipelines are implemented on Apache Beam is to ensure they are horizontally scalable and can process large FHIR inputs quickly. We have demonstrated this scalability with JSON input files on a distributed file system, but a more realistic scenario is a FHIR server backed by a database with multiple read replicas. This issue is to create and test the following two scenarios:
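As a rough illustration of why Beam gives us horizontal scaling, here is a minimal, self-contained Beam pipeline over sharded NDJSON input. This is not this repository's actual pipeline code; the bucket paths and class name are placeholders for illustration only:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ScalabilitySketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);
    pipeline
        // Each matched input shard becomes an independent unit of work that
        // the runner can distribute across many workers in parallel.
        .apply("ReadNdjsonShards", TextIO.read().from("gs://my-bucket/fhir/*.ndjson"))
        .apply("ParseAndTransform",
            MapElements.into(TypeDescriptors.strings())
                .via((String line) -> line /* parse the FHIR resource, transform, etc. */))
        .apply("WriteOutput", TextIO.write().to("gs://my-bucket/output/part"));
    pipeline.run().waitUntilFinish();
  }
}
```

The key point is that the runner, not the pipeline author, decides how many workers to fan the shards out to, which is what we want to exercise against a realistic FHIR-server backend.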

  • A HAPI FHIR server holding a large amount of data, queried through the FHIR search API.
  • The same setup, but read through the direct database access mode (a rough sketch of both fetch modes follows this list).
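To make the two scenarios concrete, here is a hedged sketch of both fetch modes. This is not the repository's actual code: the server URL, JDBC connection string, credentials, and the choice of the Patient resource type are assumptions, and the exact HAPI table columns to read depend on the HAPI version:

```java
import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;
import org.hl7.fhir.r4.model.Patient;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchModesSketch {

  // Scenario 1: page through resources via the FHIR search API.
  static void searchApiFetch() {
    IGenericClient client = FhirContext.forR4()
        .newRestfulGenericClient("http://hapi-server:8080/fhir"); // assumed URL
    Bundle bundle = client.search()
        .forResource(Patient.class)
        .count(100) // page size; each page is one HTTP round trip to the server
        .returnBundle(Bundle.class)
        .execute();
    // Follow 'next' links until the search result set is exhausted.
    while (bundle != null) {
      bundle.getEntry().forEach(e -> { /* process e.getResource() */ });
      bundle = bundle.getLink(Bundle.LINK_NEXT) != null
          ? client.loadPage().next(bundle).execute()
          : null;
    }
  }

  // Scenario 2: read resource rows straight from the HAPI database,
  // bypassing the FHIR API layer entirely.
  static void directDbFetch() throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://cloudsql-replica:5432/hapi", "user", "pass"); // assumed
         Statement stmt = conn.createStatement();
         // HFJ_RESOURCE is HAPI's resource table; verify the exact columns
         // to read against the deployed HAPI version.
         ResultSet rs = stmt.executeQuery("SELECT res_id FROM hfj_resource")) {
      while (rs.next()) { /* fetch and parse the resource body for res_id */ }
    }
  }
}
```

The contrast matters for scaling: the search-API path puts load on the HAPI server's web tier, while the direct-DB path puts load straight on the database, which is why read replicas are needed in both setups.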

The data for the above cases can come from the Synthea-HIV module. The test environment should be easy and quick to deploy; i.e., we should save a DB snapshot such that it can be redeployed whenever needed. We will run the pipelines on Google Cloud's Dataflow service, and the DB should be on Cloud SQL (with enough read replicas enabled). So part of this issue is to create a test environment on GCP with a replicated HAPI server and DB replicas backing it.
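For the Dataflow side, the wiring is roughly as below. This is a hedged sketch, not the pipelines' real option classes: the project id, region, and worker count are placeholders:

```java
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DataflowRunSketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setProject("my-gcp-project"); // placeholder GCP project id
    options.setRegion("us-central1");     // placeholder region
    options.setMaxNumWorkers(20);         // lets Dataflow autoscale up to this many workers
    Pipeline pipeline = Pipeline.create(options);
    // ... attach the FHIR fetch/transform steps here ...
    pipeline.run();
  }
}
```

With maxNumWorkers raised, Dataflow can scale the pipeline out until the FHIR server or its DB replicas become the bottleneck, which is exactly what this test environment should measure.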

This can also be used as a test bed for the Bulk Export API once we are done with its implementation (#533 is related).

— bashir2, Feb 26 '24 18:02

I will take a look

— jakubadamek, Feb 27 '24 08:02