cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

cmd/roachtest: add new import-cancellation roachtest

Open jbowens opened this issue 2 years ago • 8 comments

Add a new roachtest that stresses IMPORT cancellation with MVCC Range tombstones enabled. The IMPORTs use subsets of the total available files in order to produce varied mvcc range tombstone bounds. After the repeated IMPORT cancellation, once all tables are successfully imported, the raochtest runs tpch queries against the imported tables.

Release justification: Non-production code changes Release note: none

jbowens avatar Sep 01 '22 20:09 jbowens

This change is Reviewable

cockroach-teamcity avatar Sep 01 '22 20:09 cockroach-teamcity

I'd like to iterate on this in a followup to ensure that the range keys are eventually GC'd and dropped from the LSM. We may want to backport a simple compaction heuristic to ensure this happens relatively promptly.

Makes sense. @aliher1911 is working on faster MVCC GC.

erikgrinaker avatar Sep 12 '22 13:09 erikgrinaker

Sorry, I've been working on personal mac for open source work for better ergonomics, and I keep pushing stale branches. Pushed the updated branch.

I see, thanks. Looked it over, no new comments.

erikgrinaker avatar Sep 12 '22 15:09 erikgrinaker

From an example run, still in the IMPORT phase:

Screen Shot 2022-09-21 at 3 57 51 PM Screen Shot 2022-09-21 at 3 57 57 PM

jbowens avatar Sep 21 '22 19:09 jbowens

The wedged IMPORT jobs in this run eventually failed with:

addsstable [/Table/106/1/4574485/0,/Table/106/1/4702132/0/NULL): batch timestamp 1663791548.214654642,0 must be after replica GC threshold 1663793376.235269131,0

I'll do some more investigation into this CheckSSTConflicts slowdown.

jbowens avatar Sep 21 '22 21:09 jbowens

Maybe this test is too aggressive with the import cancellations. After a while, the imports slowed to a crawl. There's barely any CPU utilization, but it's all in CheckSSTConflicts.

I'd be curious to compare this with 22.1. CheckSSTConflicts is known to severely hamper import performance, sometimes pathologically, but as long as we're no worse than 22.1 then I suppose it's "fine" for our purposes.

erikgrinaker avatar Sep 22 '22 09:09 erikgrinaker

I'd be curious to compare this with 22.1. CheckSSTConflicts is known to severely hamper import performance, sometimes pathologically, but as long as we're no worse than 22.1 then I suppose it's "fine" for our purposes.

I think this is new, specifically to import cancellation. Previously CheckSSTConflicts significantly hampered import performance but mostly for unordered imports where there was always a relatively dense engine keyspace overlapping the AddSSTable sstable. CheckSSTConflicts had less of an impact on ordered imports with few engine keys in the same keyspace. However, history-preserving import cancellation results in an even denser engine keyspace overlapping the AddSSTable sstable on a retry.

jbowens avatar Sep 22 '22 14:09 jbowens

Right. I notice that we don't use range key masking in CheckSSTConflicts. I think we could, because the range tombstone would necessarily have to be above any point keys (so we couldn't write below it anyway), the range tombstone has already accounted for the MVCC stats of the covered data, and the disallowShadowing options don't care about tombstones anyway. Perhaps the only case would be where we allow writing at historical timestamps and want the writes to be idempotent, but I don't think we ever do in practice, and we could guard against it.

Want to give this a try with threading through the request timestamp when SSTTimestampToRequestTimestamp is set and using it for RangeKeyMasking?

erikgrinaker avatar Sep 22 '22 15:09 erikgrinaker

Btw, it would be good to assert MVCC stats here too. We could do something like the clearrange roachtest:

https://github.com/cockroachdb/cockroach/blob/aac830251139eca515ae31053414bb7ab624a115/pkg/cmd/roachtest/tests/clearrange.go#L82

But since this only runs the check on splits/merges and replication, we could also consider just running a consistency check at the end with crdb_internal.check_consistency(), which will also assert the stats.

erikgrinaker avatar Sep 23 '22 09:09 erikgrinaker

tftr!

bors r=nicktrav

jbowens avatar Oct 20 '22 17:10 jbowens

Build succeeded:

craig[bot] avatar Oct 20 '22 18:10 craig[bot]