fhir-data-pipes

As a developer, I am confident that the FHIR data pipes export can scale, and I understand which configurations are appropriate

Open jjtswan opened this issue 1 year ago • 4 comments

Goals:

Must have: benchmark the components with all of them deployed on a single machine

  • [ ] Set performance targets
  • [ ] Run loadtests against those targets
  • [ ] If not sufficient, identify fixes to address performance problems
  • [ ] Update documentation to include benchmark results
  • [ ] Ideally, have benchmarks that future developers can continue to run.

Nice to have: deploy the pipeline to Dataflow and repeat the benchmarks for different data sets

Also see #573

jjtswan avatar Mar 06 '23 20:03 jjtswan

For reference: we have done #266 before and its findings are well documented here. A big part of the performance question for the pipelines is about the throughput of the source FHIR-store/DB. It would help to define the scope of this issue clearly, because exploring various configurations is a very large task.

We can evaluate replications and cloud deployments but I don't think those are necessary for the beta launch.

The main extra piece that is needed for beta launch (IMHO) is the performance of the merger pipeline.

bashir2 avatar Mar 08 '23 05:03 bashir2

Merger pipeline - you mean for the incremental updates for micro-batch?

jjtswan avatar Mar 14 '23 22:03 jjtswan

> Merger pipeline - you mean for the incremental updates for micro-batch?

Yes, for the incremental updates. To be more specific, I mean the merger pipeline, because it reads the old DWH in its entirety and merges it with the new incremental updates. The GroupByKey can be expensive.
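To illustrate why the cost grows with the size of the old DWH rather than with the size of the update, here is a minimal plain-Python sketch of a group-by-key merge. It is not the actual merger pipeline code: the record shape `(resource_id, last_updated, payload)`, the function name `merge_snapshots`, and the "keep the newest version by timestamp" rule are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (resource_id, last_updated, payload).
Record = Tuple[str, str, dict]


def merge_snapshots(old_dwh: Iterable[Record],
                    updates: Iterable[Record]) -> List[Record]:
    """Group all records by resource ID and keep the newest version of each."""
    grouped: Dict[str, List[Record]] = defaultdict(list)
    # Every record of the old DWH is read and grouped, even if only a
    # handful of resources changed. This is the expensive part.
    for record in old_dwh:
        grouped[record[0]].append(record)
    for record in updates:
        grouped[record[0]].append(record)
    # Assumption: timestamps are ISO-8601 strings in one format and timezone,
    # so plain string comparison orders them correctly.
    return [max(versions, key=lambda r: r[1])
            for _, versions in sorted(grouped.items())]


old = [
    ("Patient/1", "2023-01-01T00:00:00Z", {"name": "A"}),
    ("Patient/2", "2023-01-01T00:00:00Z", {"name": "B"}),
]
updates = [
    ("Patient/2", "2023-03-01T00:00:00Z", {"name": "B2"}),
    ("Patient/3", "2023-03-02T00:00:00Z", {"name": "C"}),
]
merged = merge_snapshots(old, updates)
```

In this sketch two update records still force a pass over every old record, which is the behavior a benchmark of the merger would want to measure as the DWH grows.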

bashir2 avatar Mar 16 '23 23:03 bashir2

@chandrashekar-s please merge this issue and #648; these should include a short doc with resource recommendations as a function of data size.

bashir2 avatar May 02 '23 15:05 bashir2