dolt System Table Performance Investigation

Dolt supports a wide variety of system tables that expose version control features through the SQL interface. These tables are queried frequently, but their performance is not well characterized. We need benchmarks and queries that let us track system table performance over time.

Diffs

We can start by measuring performance for the dolt_diff and dolt_commit_diff tables by constructing a benchmarking database with a large number of commits and a large number of updated tables.

The independent variables we want to track in our database are:

The number of commits spanned between queries.
The number of edits per commit. Each commit can be modeled as a series of randomized Delete, Update, or Insert operations. Our benchmarking data can have branches with various edit sizes (i.e., some proportion of the initial row count).
The complexity of the schema. We can stick to the sysbench schema for now.
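
As a rough illustration of the "edits per commit" variable, the sketch below generates one commit's worth of randomized Insert, Update, and Delete operations sized as a fraction of the current row count. The function name, the tuple format, and the uniform choice between operation kinds are all assumptions made for this example; nothing here is taken from an existing Dolt benchmark.

```python
import random


def generate_commit_edits(row_ids, edit_fraction, next_id, rng):
    """Simulate one commit as a series of randomized edits.

    Hypothetical helper: returns (ops, surviving_row_ids, next_id), where
    ops is a list of (kind, row_id) tuples.
    """
    # Edit size is a proportion of the current row count (at least one edit).
    n_edits = max(1, int(len(row_ids) * edit_fraction))
    live = list(row_ids)
    ops = []
    for _ in range(n_edits):
        # Assumption: each operation kind is equally likely.
        kind = rng.choice(["insert", "update", "delete"]) if live else "insert"
        if kind == "insert":
            ops.append(("insert", next_id))
            live.append(next_id)
            next_id += 1
        else:
            idx = rng.randrange(len(live))
            if kind == "update":
                ops.append(("update", live[idx]))
            else:
                ops.append(("delete", live.pop(idx)))
    return ops, live, next_id


# Example: a 5% edit size against a 1000-row table, seeded for repeatability.
rng = random.Random(42)
rows = list(range(1000))
ops, rows_after, next_id = generate_commit_edits(rows, 0.05, 1000, rng)
```

Each tuple would then be translated into the corresponding SQL statement against the sysbench table before committing.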

The dependent variables we want to track are:

The time it takes to compute a SELECT * FROM dolt_diff_TABLENAME end to end and the average compute time per commit.
The time it takes to compute a SELECT * FROM dolt_commit_diff_TABLENAME end to end and the average compute time per commit.
The time it takes to compute a SELECT * FROM dolt_history_TABLENAME end to end and the average compute time per commit.
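
The measurements above could be collected with something like the following sketch. It assumes a hypothetical run_query callable that executes a SQL string and returns an iterable of rows (for instance, a thin wrapper around a MySQL client cursor connected to a running Dolt SQL server); that wrapper, the helper names, and the result dictionary are illustrative only.

```python
import time


def diff_query(table):
    # Full scan of the diff system table for one user table.
    return f"SELECT * FROM dolt_diff_{table}"


def history_query(table):
    # Full scan of the history system table for one user table.
    return f"SELECT * FROM dolt_history_{table}"


def time_query(run_query, sql, commits_spanned):
    """Time one query end to end and derive the average time per commit.

    run_query is a hypothetical callable: it takes a SQL string and
    returns an iterable of result rows.
    """
    if commits_spanned <= 0:
        raise ValueError("commits_spanned must be positive")
    start = time.perf_counter()
    # Drain every row so the timing covers the full result set.
    row_count = sum(1 for _ in run_query(sql))
    elapsed = time.perf_counter() - start
    return {
        "sql": sql,
        "rows": row_count,
        "total_s": elapsed,
        "per_commit_s": elapsed / commits_spanned,
    }


# Example with a stand-in runner that yields ten fake rows.
result = time_query(lambda sql: iter(range(10)), diff_query("sbtest1"), 5)
```

Draining the result set matters here: a query that streams rows lazily would otherwise look faster than it is.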

This issue will get updated over time as we profile more and more system tables.

Aug 02 '22 19:08 VinaiRachakonda

Generally I like the design of these benchmarks. Couple of comments:

Do we think table size is relevant here, or should that be held constant?
Rather than use branches to organize edit sizes, I think it would be conceptually easier to have a single linear history and use multiple tables per commit to test different edit sizes. We could then use branches or tags at fixed points in the history to vary the commit depth of a test.
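
One way to picture the single-linear-history layout is the small planning sketch below: every commit touches several tables, each with its own edit size, and tags mark fixed depths back from the tip. The table naming scheme, the tag names, and the plan structure are all hypothetical and only meant to make the idea concrete.

```python
def plan_linear_history(num_commits, edit_fractions, tag_depths):
    """Plan a linear history where each table carries a different edit size.

    Hypothetical layout: commit 0 is the initial import, commits
    1..num_commits each edit every table, and the tag "depth_d" points
    at the commit that is d commits behind the tip.
    """
    # One table per edit size, e.g. edits_10pct for a 10% edit size.
    tables = {f"edits_{round(f * 100)}pct": f for f in edit_fractions}
    commits = [
        {"commit": i, "edits": dict(tables)}
        for i in range(1, num_commits + 1)
    ]
    tags = {
        f"depth_{d}": num_commits - d
        for d in tag_depths
        if 0 < d <= num_commits
    }
    return commits, tags


# Example: 100 commits, three edit sizes, tags at depths 1, 10, and 100.
commits, tags = plan_linear_history(100, [0.01, 0.1, 0.5], [1, 10, 100])
```

A test would then pick a table to fix the edit size and a tag to fix the number of commits spanned.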

Aug 02 '22 20:08 andy-wm-arthur

I'm closing this. I don't think it makes sense as an issue. We need to run a system table performance project, of which benchmarking is the first step.

Aug 31 '22 17:08 timsehn
