fhir-data-pipes Compare Spark+Parquet with PostgreSQL/Alloy DB view based approaches

Open bashir2 opened this issue 11 months ago • 2 comments

Now that our support for SQL-on-FHIR-v2 ViewDefinition is complete (#821 and #916), we should do some large-scale comparisons of the Spark+Parquet based approach with relational DB based ones that use materialized views. We can start with PostgreSQL and establish some guidelines on the scale of data at which a single-node PG DB starts to make less sense (compared to a multi-node Spark+Parquet based approach). Then we should repeat the same experiment with AlloyDB to see the impact of columnar storage (while still single-node).

We should do several experiments using multiple realistic workloads (e.g., calculating program or data quality metrics involving joins of multiple resource tables). But we also recognize that these comparisons will always be subjective to some extent, because of the choice of workloads.
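As a rough illustration of how such a comparison might be timed, here is a minimal sketch, not the project's actual benchmark. The function names `time_workload` and `compare_backends` are hypothetical, and the two backend callables are placeholders standing in for "run this workload against a PostgreSQL materialized view" and "run this workload against Spark+Parquet"; real callables would issue the equivalent query on each system.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_workload(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run one workload `repeats` times and return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def compare_backends(
    backends: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, float]:
    """Return the median runtime in seconds for each named backend."""
    return {
        name: statistics.median(time_workload(fn, repeats))
        for name, fn in backends.items()
    }


if __name__ == "__main__":
    # Hypothetical placeholders: each lambda stands in for executing the same
    # workload (e.g., a multi-table join) on one backend.
    backends = {
        "postgres_materialized_view": lambda: sum(range(100_000)),
        "spark_parquet": lambda: sum(range(200_000)),
    }
    for name, median_s in compare_backends(backends).items():
        print(f"{name}: median {median_s:.4f}s")
```

Reporting the median over several runs is one way to dampen cache warm-up and other noise; repeating the measurement at increasing data sizes would show where the single-node option starts to fall behind.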

Feb 26 '24 18:02 bashir2

I might be interested as a sequel to #967

Feb 27 '24 08:02 jakubadamek

> I might be interested as a sequel to #967

That would be great; please feel free to assign this to yourself once you start working on it.

Feb 29 '24 15:02 bashir2
