System Table Performance Investigation
Dolt supports a wide variety of system tables that offer version control features via the SQL interface. These tables are queried regularly but may or may not perform well. We need benchmarks and queries that allow us to track system table performance over time.
Diffs
We can start by measuring performance for the dolt_diff and dolt_commit_diff tables by constructing a benchmarking database with a large number of commits and a large number of updated tables.
The independent variables we want to track in our database are:
- The number of commits spanned between queries.
- The number of edits per commit. Each commit can be modeled as a series of randomized Delete, Update, or Insert operations. Our benchmarking data can have branches with various edit sizes (i.e., some proportion of the initial row count).
- The complexity of the schema. We can stick to the sysbench schema for now.
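As a rough illustration of the "edits per commit" variable, here is a minimal sketch that generates one commit's worth of randomized edits as SQL statements. Everything in it is an assumption made for illustration: the function name random_edits, the sysbench-style table name sbtest1 with its id and k columns, and the uniform choice between the three operations. It is not existing benchmark code.

```python
import random

def random_edits(row_count, edit_fraction, table="sbtest1", seed=0):
    """Return SQL statements for one commit's worth of randomized edits.

    Assumes (hypothetically) that rows with ids 1..row_count already exist
    and that columns other than id and k have defaults, as in sysbench.
    """
    rng = random.Random(seed)  # seeded so a benchmark run is reproducible
    n_edits = round(row_count * edit_fraction)
    live_ids = list(range(1, row_count + 1))
    next_id = row_count + 1
    statements = []
    for _ in range(n_edits):
        op = rng.choice(["insert", "update", "delete"])
        if op == "insert" or not live_ids:
            # Fall back to an insert when no rows are left to touch.
            k = rng.randrange(row_count)
            statements.append(
                f"INSERT INTO {table} (id, k) VALUES ({next_id}, {k})"
            )
            live_ids.append(next_id)
            next_id += 1
        elif op == "update":
            row_id = rng.choice(live_ids)
            k = rng.randrange(row_count)
            statements.append(
                f"UPDATE {table} SET k = {k} WHERE id = {row_id}"
            )
        else:
            # Swap-and-pop removes a random live id in O(1).
            idx = rng.randrange(len(live_ids))
            live_ids[idx], live_ids[-1] = live_ids[-1], live_ids[idx]
            row_id = live_ids.pop()
            statements.append(f"DELETE FROM {table} WHERE id = {row_id}")
    return statements
```

For example, random_edits(1000, 0.05) yields 50 statements, i.e. edits covering 5% of the initial row count; each batch would then be applied and committed before generating the next.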
The dependent variables we want to track, one set per query:
- The time it takes to compute a SELECT * FROM dolt_diff_TABLENAME end to end and the average compute time per commit.
- The time it takes to compute a SELECT * FROM dolt_commit_diff end to end and the average compute time per commit.
- The time it takes to compute a SELECT * FROM dolt_history_TABLENAME end to end and the average compute time per commit.
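The two measurements named for each query (end-to-end time and average time per commit) could be collected with something like the sketch below. The names time_query and run_query are hypothetical: run_query stands in for whatever client call executes SQL against the benchmark database and returns an iterable of rows, and commits_spanned is assumed to be known from how the database was built.

```python
import time

def time_query(run_query, sql, commits_spanned, clock=time.perf_counter):
    """Time one system table query end to end and per commit spanned.

    run_query is a hypothetical callable: it takes a SQL string and
    returns an iterable of result rows.
    """
    if commits_spanned <= 0:
        raise ValueError("commits_spanned must be positive")
    start = clock()
    # Drain the result set so the full scan is timed, not just the first row.
    row_count = sum(1 for _ in run_query(sql))
    elapsed = clock() - start
    return {
        "sql": sql,
        "rows": row_count,
        "total_seconds": elapsed,
        "seconds_per_commit": elapsed / commits_spanned,
    }
```

Passing the clock as a parameter keeps the helper easy to test with a fake timer. In a real run it would likely be called several times per query, with the median reported, to reduce noise.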
This issue will get updated over time as we profile more and more system tables.
Generally I like the design of these benchmarks. A couple of comments:
- Do we think table size is relevant here, or should that be held constant?
- Rather than use branches to organize edit sizes, I think it would be conceptually easier to have a single linear history and use multiple tables per commit to test different edit sizes. We could then use branches or tags at fixed points in the history to vary the commit depth of a test.
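One possible reading of the linear-history suggestion, sketched as a small planning helper. The names here (plan_history, the edits_NNNpct table names, the depth_N tag names) are made up for illustration, and commit 0 is assumed to be the initial data load, with HEAD at commit total_commits.

```python
def plan_history(total_commits, edit_fractions, tag_depths):
    """Lay out a single linear history for the benchmark database.

    Every commit edits all tables; each table uses its own edit fraction.
    A tag "depth_N" is placed N commits behind HEAD, so a query between
    that tag and HEAD spans exactly N commits.
    """
    tables = {
        f"edits_{round(fraction * 100):03d}pct": fraction
        for fraction in edit_fractions
    }
    tags = {
        f"depth_{depth}": total_commits - depth
        for depth in tag_depths
        if 0 < depth <= total_commits  # skip depths the history cannot cover
    }
    return {"tables": tables, "tags": tags}
```

With 1000 commits, edit fractions of 1%, 10%, and 50%, and depths of 10, 100, and 1000, this gives three tables and three tags at commits 990, 900, and 0. Each test then picks one table (edit size) and one tag (commit depth) independently.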
I'm closing this. I don't think it makes sense as an issue. We need to run a system table performance project of which benchmarking is the first step.