Benchmarks
Description
Add/refactor benchmarks. We currently have a suite of benchmarks in `bench_runner`, but these might not have the features we need/want to meet our goals (below).
Most benchmarks should be run in client/server mode so that hybrid execution doesn't muddy the numbers, but we should also have something that tells us the overhead of running with hybrid exec.
Goals
- Numbers to showcase performance (and cost) on GlareDB Cloud. TPC-H sf=10 and sf=100 are good starting points; if feasible, I'd like to see if we could also get sf=1000.
- Have a baseline metric to help us understand what distributed execution gets us. Does a 4-node compute engine translate to a 4x perf increase?
- Understand how performance changes over time. E.g. what's the performance difference between two releases.
- It might make sense to have Cloud run a suite of benchmarks against a deployment on every merge to main.
Just adding a few perf observations here, using the https://datasets.clickhouse.com/hits_compatible/hits.parquet dataset.
DataFusion takes only ~74ms while we take ~300ms:
> timeit {datafusion-cli -q -c "select count(*) from './hits.parquet'"}
74ms 955µs 166ns
> timeit {./target/release/glaredb -q "select count(*) from './hits.parquet'"}
296ms 460µs
I can't imagine we have enough overhead over DataFusion to warrant such a drastic performance hit.
The next query never finishes on datafusion-cli but takes ~45s for us locally. Polars takes only 100-500ms (76-400x faster), though Polars is only perceived as faster because of some implicit limits:
> timeit {./target/release/glaredb -q "select * from parquet_scan('./hits.parquet');" }
45sec 133ms 464µs
Here are some of my thoughts:
- We should run benchmarks using a "local" instance (we can use the Python bindings here), and optionally connect the same instance to cloud compute to calculate/analyze the cost of hybrid execution. The reasoning: this quickly eliminates any Postgres and network overhead from the measurements and gives us metrics comparable to, say, DuckDB. A rough sketch follows this list.
- We can run local benchmarks against each commit (on main) in CI itself (using a dedicated machine). For each release, we can run the benchmarks against a cloud instance (hybrid and server modes) for real-world numbers.
- We can vary the number of runtime (Tokio) threads to approximate what distributed execution might get us. It's not an accurate measure of the speedup we'd actually see, but it does give us some insight; for example, we can compare the benchmarks with 1 vs. 2 threads. Still thinking about this, though.
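As a rough sketch of the local-instance idea above: time a handful of queries through the Python bindings, with no pgsrv or network in the path. This assumes the `glaredb` package exposes `glaredb.connect()` and an `execute()` method on the connection; the query list, iteration count, and dataset path are placeholders, not a proposed suite.

```python
# Minimal local benchmark loop using the GlareDB Python bindings (a sketch,
# not the bench_runner implementation). Assumes `pip install glaredb` and a
# local copy of hits.parquet.
import time
import glaredb

# Placeholder queries; a real run would load the TPC-H suite instead.
QUERIES = {
    "count_hits": "SELECT count(*) FROM './hits.parquet';",
    "full_scan": "SELECT * FROM parquet_scan('./hits.parquet');",
}

def run_local(queries: dict[str, str], iterations: int = 3) -> list[dict]:
    con = glaredb.connect()  # in-process instance, no server round trips
    results = []
    for name, sql in queries.items():
        for i in range(iterations):
            start = time.perf_counter()
            # Assumed to run the query to completion; if the bindings return
            # lazily, materialize the result before stopping the timer.
            con.execute(sql)
            elapsed_ms = (time.perf_counter() - start) * 1000
            results.append({"query": name, "iteration": i, "elapsed_ms": elapsed_ms})
    return results

if __name__ == "__main__":
    for row in run_local(QUERIES):
        print(row)
```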
Benchmarking Framework
Goals
- TPC-H benchmarks: We want to run the TPC-H queries (and more) against a given dataset (scale factors, etc).
- Different runners: We want to be able to run the benchmarks against (see the sketch after this list):
  - Local instance
  - Postgres server (`pgsrv`)
  - RPC server (`rpcsrv`)
- Checkpoints: We want to be able to checkpoint results (manually) so we can compare runs and see changes in performance.
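To make the runner goal concrete, here is a sketch of how the harness might dispatch to different targets. The local path assumes the Python bindings as above; the `pgsrv` path assumes any Postgres client works (psycopg2 here), since `pgsrv` speaks the Postgres wire protocol; an `rpcsrv` runner would need its own client and is omitted. Class names, the DSN, and the port are hypothetical.

```python
# Sketch of a runner abstraction; class and method names are hypothetical.
from abc import ABC, abstractmethod

class Runner(ABC):
    @abstractmethod
    def run(self, sql: str) -> None:
        """Execute a query and fully drain the result."""

class LocalRunner(Runner):
    """Runs against an in-process GlareDB instance via the Python bindings."""
    def __init__(self):
        import glaredb
        self.con = glaredb.connect()

    def run(self, sql: str) -> None:
        self.con.execute(sql)

class PgRunner(Runner):
    """Runs against a GlareDB server over the Postgres wire protocol (pgsrv)."""
    def __init__(self, dsn: str):
        import psycopg2
        self.con = psycopg2.connect(dsn)

    def run(self, sql: str) -> None:
        with self.con.cursor() as cur:
            cur.execute(sql)
            cur.fetchall()  # drain so timings include result transfer

# Example usage; the DSN below is a placeholder, not a real deployment.
# runner = PgRunner("host=localhost port=6543 dbname=glaredb user=glaredb")
```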
Plan
- We’ll have a script to download pre-generated datasets (much simpler to keep the required data ready for access in GCS buckets).
- We need different runner implementations.
- The local-instance runner can run on each PR commit.
- On each merge into `main`, we can run the benchmarks against the cloud server.
- We can directly use the runner to save/update benchmark data locally or in the cloud (thanks to `COPY TO`):
  - Need a defined schema for storing results (a rough sketch follows this list).
- We can use GlareDB to analyse/diff benchmark results against previous checkpoints.
- Two checkpoints shall always exist: `previous` and `current`.
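A rough sketch of what the stored schema, the `COPY TO` checkpoint, and a diff between the two checkpoints could look like, driven through the Python bindings. The table and column names, file paths, and the exact `COPY TO` options are assumptions rather than a settled design; the local paths could just as well be GCS URLs.

```python
# Hypothetical results schema and checkpoint diff; names and paths are placeholders.
import glaredb

con = glaredb.connect()

# One possible schema for storing benchmark results.
con.execute("""
    CREATE TEMP TABLE bench_results (
        run_id      TEXT,
        commit_sha  TEXT,
        query_name  TEXT,
        runner      TEXT,             -- 'local', 'pgsrv', 'rpcsrv'
        elapsed_ms  DOUBLE PRECISION,
        recorded_at TIMESTAMP
    );
""")

# ... the runner inserts its measurements here, then checkpoints them.
con.execute("COPY (SELECT * FROM bench_results) TO './current.parquet' FORMAT parquet;")

# Diff the `current` checkpoint against `previous` using GlareDB itself.
con.sql("""
    WITH prev AS (
        SELECT query_name, avg(elapsed_ms) AS elapsed_ms
        FROM parquet_scan('./previous.parquet') GROUP BY query_name
    ), cur AS (
        SELECT query_name, avg(elapsed_ms) AS elapsed_ms
        FROM parquet_scan('./current.parquet') GROUP BY query_name
    )
    SELECT
        cur.query_name,
        prev.elapsed_ms AS previous_ms,
        cur.elapsed_ms  AS current_ms,
        cur.elapsed_ms - prev.elapsed_ms AS delta_ms
    FROM cur JOIN prev USING (query_name)
    ORDER BY delta_ms DESC;
""").show()
```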
Moving from 0.6.0 https://github.com/GlareDB/glaredb/milestone/25
Current state: the benchmarks fail in CI despite passing locally, though they are flaky.
For running at larger scale we'd need a custom runner. @universalmind303, can you take a look this week and see if there's something we can do to increase reliability / decrease the failure rate? [Timebox: make sure there's nothing obvious before increasing the runner size.]