loadgen
tpcc: investigate adding a primary key for the history table
The history table doesn't have a primary key, which means Cockroach autogenerates one using the unique_rowid() function. This creates a hotspot, as every insertion into the history table will touch the last range of that table (unique_rowid() produces IDs that are usually sorted). We should investigate adding a UUID primary key for this table.
Another alternative is to manually specify the hidden rowid column in the insert. We could likely get away with rand.Int63(). This would obviate the need to regenerate all of the history table data in fixtures.
Alternatively, we could make every field in the history table (or enough fields for uniqueness) part of its primary key. This would let us sort the data by any of the columns and might slightly reduce the amount of data written, or maybe not: the index encoding of the data might be less space-efficient than the value encoding and negate the benefit of eliminating the rowid column.
It seems like an oversight in the design of TPCC that they didn't include any queries to the history table. I'd expect any real-world history table to have a time-ordered index, which would make it more complicated to gain parallelism here.