cockroach
cmd/roachtest: add new import-cancellation roachtest
Add a new roachtest that stresses IMPORT cancellation with MVCC range tombstones enabled. The IMPORTs use subsets of the total available files in order to produce varied MVCC range tombstone bounds. After the repeated IMPORT cancellation, once all tables are successfully imported, the roachtest runs TPC-H queries against the imported tables.
Release justification: Non-production code changes
Release note: None
I'd like to iterate on this in a followup to ensure that the range keys are eventually GC'd and dropped from the LSM. We may want to backport a simple compaction heuristic to ensure this happens relatively promptly.
Makes sense. @aliher1911 is working on faster MVCC GC.
Sorry, I've been working on my personal Mac for open source work for better ergonomics, and I keep pushing stale branches. Pushed the updated branch.
I see, thanks. Looked it over, no new comments.
From an example run, still in the IMPORT phase:
[Screenshot: Screen Shot 2022-09-21 at 3 57 51 PM]
[Screenshot: Screen Shot 2022-09-21 at 3 57 57 PM]
The wedged IMPORT jobs in this run eventually failed with:
addsstable [/Table/106/1/4574485/0,/Table/106/1/4702132/0/NULL): batch timestamp 1663791548.214654642,0 must be after replica GC threshold 1663793376.235269131,0
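For scale, the two wall-clock timestamps in that error are roughly half an hour apart. A small sketch of the arithmetic, assuming the values are Unix seconds with a nanosecond fraction (the `gap` helper is hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// gap returns how far the batch timestamp lags the GC threshold.
// Both timestamps are given as Unix seconds plus nanoseconds.
func gap(batchSec, batchNanos, thresholdSec, thresholdNanos int64) time.Duration {
	batch := time.Unix(batchSec, batchNanos)
	threshold := time.Unix(thresholdSec, thresholdNanos)
	return threshold.Sub(batch)
}

func main() {
	// Values copied from the error message above.
	fmt.Println(gap(1663791548, 214654642, 1663793376, 235269131)) // 30m28.020614489s
}
```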
I'll do some more investigation into this CheckSSTConflicts slowdown.
Maybe this test is too aggressive with the import cancellations. After a while, the imports slowed to a crawl. There's barely any CPU utilization, but it's all in CheckSSTConflicts.
I'd be curious to compare this with 22.1. CheckSSTConflicts is known to severely hamper import performance, sometimes pathologically, but as long as we're no worse than 22.1 then I suppose it's "fine" for our purposes.
I think this is new, specifically to import cancellation. Previously CheckSSTConflicts significantly hampered import performance but mostly for unordered imports where there was always a relatively dense engine keyspace overlapping the AddSSTable sstable. CheckSSTConflicts had less of an impact on ordered imports with few engine keys in the same keyspace. However, history-preserving import cancellation results in an even denser engine keyspace overlapping the AddSSTable sstable on a retry.
Right. I notice that we don't use range key masking in CheckSSTConflicts. I think we could, because the range tombstone would necessarily have to be above any point keys (so we couldn't write below it anyway), the range tombstone has already accounted for the MVCC stats of the covered data, and the disallowShadowing options don't care about tombstones anyway. Perhaps the only case would be where we allow writing at historical timestamps and want the writes to be idempotent, but I don't think we ever do in practice, and we could guard against it.
Want to give this a try with threading through the request timestamp when SSTTimestampToRequestTimestamp is set and using it for RangeKeyMasking?
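To make the masking idea concrete, here is a toy model, not the Pebble or CockroachDB implementation. The types and the `masked` and `visible` helpers are hypothetical, timestamps are plain integers, and the rule encoded is an assumption: a point version is hidden when a range tombstone covers its key, the tombstone is at or below the masking timestamp, and the point is strictly older than the tombstone.

```go
package main

import "fmt"

// pointVersion is one MVCC version of a key (toy model).
type pointVersion struct {
	key string
	ts  int64
}

// rangeTombstone deletes [start, end) at timestamp ts (toy model).
type rangeTombstone struct {
	start, end string
	ts         int64
}

// masked reports whether p is hidden by some tombstone at or below maskBelow.
func masked(p pointVersion, rts []rangeTombstone, maskBelow int64) bool {
	for _, rt := range rts {
		covers := rt.start <= p.key && p.key < rt.end
		if covers && rt.ts <= maskBelow && p.ts < rt.ts {
			return true
		}
	}
	return false
}

// visible returns the point versions an iterator would still surface.
func visible(points []pointVersion, rts []rangeTombstone, maskBelow int64) []pointVersion {
	var out []pointVersion
	for _, p := range points {
		if !masked(p, rts, maskBelow) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// A cancelled import wrote a, b, c at ts=10; the rollback laid a
	// range tombstone over [a, c) at ts=20; the retry runs at ts=30.
	points := []pointVersion{{"a", 10}, {"b", 10}, {"c", 10}}
	rts := []rangeTombstone{{"a", "c", 20}}
	fmt.Println(visible(points, rts, 30)) // [{c 10}]
}
```

With masking at the request timestamp, the conflict check in this model only has to look at `c`, rather than stepping over every rolled-back version under the tombstone.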
Btw, it would be good to assert MVCC stats here too. We could do something like the clearrange roachtest:
https://github.com/cockroachdb/cockroach/blob/aac830251139eca515ae31053414bb7ab624a115/pkg/cmd/roachtest/tests/clearrange.go#L82
But since this only runs the check on splits/merges and replication, we could also consider just running a consistency check at the end with crdb_internal.check_consistency(), which will also assert the stats.
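A rough sketch of what such an end-of-test check might look like, using only `database/sql`. The query shape, the column names, and the status values are my recollection of the linked clearrange test and should be treated as assumptions to verify; the `inconsistency` type and `checkConsistency` helper are hypothetical.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// consistencyQuery returns only ranges that are not consistent.
// Assumed signature: check_consistency(stats_only, start_key, end_key).
const consistencyQuery = `
SELECT t.range_id, t.start_key_pretty, t.status, t.detail
FROM crdb_internal.check_consistency(false, '', '') AS t
WHERE t.status NOT IN ('RANGE_CONSISTENT', 'RANGE_INDETERMINATE')`

// inconsistency is one failing row from the consistency check.
type inconsistency struct {
	RangeID  int64
	StartKey string
	Status   string
	Detail   string
}

func (i inconsistency) String() string {
	return fmt.Sprintf("r%d (%s): %s: %s", i.RangeID, i.StartKey, i.Status, i.Detail)
}

// checkConsistency runs the query and collects every failing range.
// An empty result means all ranges passed.
func checkConsistency(ctx context.Context, db *sql.DB) ([]inconsistency, error) {
	rows, err := db.QueryContext(ctx, consistencyQuery)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var out []inconsistency
	for rows.Next() {
		var i inconsistency
		if err := rows.Scan(&i.RangeID, &i.StartKey, &i.Status, &i.Detail); err != nil {
			return nil, err
		}
		out = append(out, i)
	}
	return out, rows.Err()
}

func main() {
	// No cluster is available in this sketch, so only print the query.
	fmt.Println(consistencyQuery)
}
```

In a roachtest, the caller would fail the test if the returned slice is non-empty, logging each entry.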
tftr!
bors r=nicktrav