rocksdb icon indicating copy to clipboard operation
rocksdb copied to clipboard

Skip swaths of range tombstone covered keys in merging iterator (2022 edition)

Open cbi42 opened this issue 3 years ago • 14 comments

Summary: With the invariant that a key in level L (consider memtable as the first level, each immutable and L0 as a separate level) has a larger sequence number than all keys in any level >L, a range tombstone [start, end) from level L covers all keys in its range in any level >L. This property motivates optimizations in iterator:

  • in Seek(target), if level L has a range tombstone [start, end) that covers target.UserKey, then for all levels > L, we can do the initial Seek on end instead of target to skip some range tombstone covered keys.
  • in Next()/Prev(), if the current key is covered by a range tombstone [start, end) from level L, we can Seek to the end for all levels > L.

This PR implements the above optimizations in MergingIterator. As all range tombstone covered keys are now skipped in MergingIterator, the range tombstone logic is removed from DBIter. The idea in this PR is similar to #7317, but this PR leaves InternalIterator interface unchanged. Credit: the cascading seek optimization and the sentinel key (discussed below) are inspired by Pebble and suggested by @ajkr in #7317.

The two optimizations are mostly implemented in SeekImpl()/SeekForPrevImpl() and IsNextDeleted()/IsPrevDeleted() in merging_iterator.cc. See comments for each method for more detail.

One major change, sentinel range tombstone key, is made to LevelIterator. Before this PR, when all point keys in an SST file are iterated through in MergingIterator, a level iterator would advance to the next SST file in its level. In the case when an SST file has a range tombstone that covers keys beyond the SST file's last point key, advancing to the next SST file would lose this range tombstone. Consequently, MergingIterator could return keys that should have been deleted by some range tombstone. We prevent this by pretending that file boundaries in each SST file are sentinel keys, when the boundaries are defined by some range tombstone. A LevelIterator now only advance the file iterator once the sentinel key is processed.

Test plan:

  • Added many unit tests in db_range_del_test
  • Stress test: ./db_stress --readpercent=5 --prefixpercent=19 --writepercent=20 -delpercent=10 --iterpercent=44 --delrangepercent=2
  • Performance benchmark: I used a similar setup as in the blog post that introduced DeleteRange, "a database with 5 million data keys, and 10000 range tombstones (ignoring those dropped during compaction) that were written in regular intervals after 4.5 million data keys were written". As expected, the performance with this PR depends on the range tombstone width.
# Setup: 
TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=fillrandom --writes=4500000 --num=5000000
TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=overwrite --writes=500000 --num=5000000 --use_existing_db=true --writes_per_range_tombstone=50

# Scan entire DB
TEST_TMPDIR=/dev/shm ./db_bench_main --benchmarks=readseq[-X5] --use_existing_db=true --num=5000000 --disable_auto_compactions=true

# Short range scan (10 Next())
TEST_TMPDIR=/dev/shm/width-100/ ./db_bench_main --benchmarks=seekrandom[-X5] --use_existing_db=true --num=500000 --reads=100000 --seek_nexts=10 --disable_auto_compactions=true

# Long range scan(1000 Next())
TEST_TMPDIR=/dev/shm/width-100/ ./db_bench_main --benchmarks=seekrandom[-X5] --use_existing_db=true --num=500000 --reads=2500 --seek_nexts=1000 --disable_auto_compactions=true

Avg over of 5 runs (some slower tests had fews runs):

For the first column (tombstone), 0 means no range tombstone, 100-10000 means width of the 10k range tombstones, and 1 means there is a single range tombstone in the entire DB (width is 1000). The 1 tombstone case is to test regression when there's very few range tombstones in the DB, as no range tombstone is likely to take a different code path than with range tombstones.

  • Scan entire DB
tombstone Pre-PR ops/sec Post-PR ops/sec ±%
0 2522114 (± 54748) 2373930 (± 57270) -5.88%
100 1831801 (± 35137) 1853007 (± 23386) +1.16%
1000 431380 (± 5523) 1144335 (± 18289) +165.27%
10000 23171 (± 430) 233767 (± 9013) +908.88%
1 2448520 (± 41681) 2295892 (± 54856) -6.23%
  • Short range scan
tombstone width Pre-PR ops/sec Post-PR ops/sec ±%
0 36196 (± 848) 35489 (± 603) -1.95%
100 28295 (± 534) 29804 (± 567) +5.33%
1000 8252 (± 150) 24998 (± 222) +202.93%
10000 1289 (± 5) 30722 (± 287) +2283.4%
1 34786 (± 476) 33114 (± 395) -4.81%
  • Long range scan
tombstone width Pre-PR ops/sec Post-PR ops/sec ±%
0 2282 (± 38) 2266 (± 51) -0.7%
100 1631 (± 64) 1668 (± 41) +2.27%
1000 382 (± 16) 1100 (± 18) +187.96%
10000 24 (± 0) 658 (± 21) +2641.67%
1 2113 (± 49) 2026 (± 52) -4.12%
  • TODO: this change touches critical iterator code, so I want to think of more stress tests to do, and consider iterator lower/upper bound, prefix scan, and user defined timestamp.

cbi42 avatar Jul 31 '22 20:07 cbi42

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 05 '22 03:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 05 '22 04:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 05 '22 04:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 05 '22 22:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 05 '22 22:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 05 '22 22:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 06 '22 05:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 06 '22 19:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 08 '22 00:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 10 '22 18:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 10 '22 19:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 10 '22 19:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 11 '22 06:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 11 '22 06:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 16 '22 08:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 16 '22 17:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 16 '22 18:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 19 '22 18:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 21 '22 19:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 21 '22 20:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 21 '22 20:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 21 '22 20:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 21 '22 20:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 23 '22 23:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 24 '22 00:08 facebook-github-bot

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 24 '22 19:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 27 '22 02:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 27 '22 02:08 facebook-github-bot

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Aug 30 '22 03:08 facebook-github-bot

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 30 '22 04:08 facebook-github-bot