fhir-data-pipes
As a developer, I want to be confident that the FHIR data pipes export can scale, and to understand which configurations are appropriate.
Goals:
Must have: benchmark the components with all of them deployed on a single machine
- [ ] Set performance targets
- [ ] Run loadtests against those targets
- [ ] If not sufficient, identify fixes to address performance problems
- [ ] Update documentation to include benchmark results
- [ ] Ideally, have benchmarks that future developers can continue to run.
Nice to have: deploy the pipeline to Dataflow and repeat the benchmarks for different data sets
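To make the "set targets, run load tests, keep them repeatable" goals concrete, here is a minimal sketch of a throughput check in plain Python. It is not part of fhir-data-pipes; `measure_throughput`, `meets_target`, and the `process` callable are hypothetical names, and the real benchmark would wrap an actual pipeline run instead of an in-memory function.

```python
import time
from typing import Callable, List


def measure_throughput(process: Callable[[List[dict]], int],
                       resources: List[dict]) -> float:
    """Run `process` once and return resources processed per second.

    `process` is a hypothetical stand-in for one pipeline run; it must
    return the number of resources it handled.
    """
    start = time.perf_counter()
    processed = process(resources)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast runs.
    return processed / elapsed if elapsed > 0 else float("inf")


def meets_target(throughput: float, target_per_sec: float) -> bool:
    """Compare a measured throughput against a chosen performance target."""
    return throughput >= target_per_sec
```

A check like this could run against a fixed data set so that later runs are comparable; the target value itself still has to come from the first task in the list.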
Also see #573
For reference: we did #266 before, and its findings are well documented here. A big part of the performance question for the pipelines is the throughput of the source FHIR store/DB, so it would be nice to clearly define the scope of this issue, because exploring various configurations is a very large task.
We can evaluate replications and cloud deployments but I don't think those are necessary for the beta launch.
The main extra piece that is needed for beta launch (IMHO) is the performance of the merger pipeline.
Merger pipeline - you mean for the incremental updates for micro-batch?
Yes, for the incremental updates. To be more specific, I mean the merger pipeline, as it reads the old DWH in its entirety and merges it with the new incremental updates. The GroupByKey can be expensive.
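To illustrate why the merge step is a scaling concern, here is a minimal plain-Python sketch of a group-by-key merge. It is not the project's Beam implementation: `merge_dwh` is a hypothetical name, records are assumed to be dicts with `resourceType`, `id`, and an ISO-8601 UTC `lastUpdated` string, and deletions are not handled. The point it shows is that every old record passes through the grouping step, so the cost follows the size of the whole DWH rather than the size of the increment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_dwh(old_records: Iterable[dict],
              incremental_records: Iterable[dict]) -> List[dict]:
    """Merge old DWH records with incremental updates, keeping the latest version.

    Assumption: `lastUpdated` values are ISO-8601 strings in the same
    timezone and precision, so string comparison matches time order.
    """
    grouped: Dict[Tuple[str, str], List[dict]] = defaultdict(list)

    # Group-by-key step: every record, old or new, is keyed and collected.
    for record in list(old_records) + list(incremental_records):
        grouped[(record["resourceType"], record["id"])].append(record)

    # Per key, keep only the most recently updated version.
    merged = []
    for key in sorted(grouped):
        merged.append(max(grouped[key], key=lambda r: r["lastUpdated"]))
    return merged
```

In a distributed runner the grouping step implies a shuffle of all records, which is where the expense mentioned above would come from.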
@chandrashekar-s please merge this issue and #648; these should include a short doc about resource recommendation as a function of data size.