
Skip swaths of keys covered by range tombstones (2020 edition)


Similar to #5506, but this moves the range-tombstone-aware logic from BlockBasedTableIterator into MergingIterator, since RangeDelAggregator has the same scope as MergingIterator. Now the optimized seek happens only during forward/backward scans, not during user seeks (a rough sketch of the skipping idea is shown after the list below). Besides that, there are a few minor fixes/improvements:

  • Fixed a bug where GetEndpoint() was getting its endpoint from the active_seqnums_ heap. While it's an endpoint representing a valid tombstone, it's better to get the endpoint from the active_iters_ heap in case a tombstone with a newer seqno shows up before the current newest one ends.
  • Added PerfContext::internal_range_del_reseek_count to count how many times the optimized seek happened.
  • Got rid of endpoint caching in the iterator. Caching risked missing seek opportunities: when files in higher levels were opened and added covering tombstones, the table iterator wouldn't notice because it had already cached some distant endpoint.

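To illustrate the skipping idea: when the iterator lands on a key covered by a range tombstone, it can seek directly to the tombstone's end key instead of stepping over every covered key one at a time. Below is a minimal, self-contained C++ sketch of that behavior; the types (RangeTombstone, SkippingIterator) are illustrative stand-ins, not RocksDB's actual MergingIterator/RangeDelAggregator, and sequence-number visibility is ignored for simplicity (all data is assumed older than the tombstones).

```cpp
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct RangeTombstone {
  std::string start, end;  // covers keys in [start, end)
  uint64_t seqno;
};

class SkippingIterator {
 public:
  SkippingIterator(const std::map<std::string, std::string>& data,
                   std::vector<RangeTombstone> tombstones)
      : data_(data), tombstones_(std::move(tombstones)), it_(data_.begin()) {}

  bool Valid() const { return it_ != data_.end(); }
  const std::string& key() const { return it_->first; }

  void SeekToFirst() {
    it_ = data_.begin();
    SkipCoveredKeys();
  }

  void Next() {
    ++it_;
    SkipCoveredKeys();
  }

  uint64_t reseek_count() const { return reseek_count_; }

 private:
  // If the current key is covered by a tombstone, jump straight to the
  // largest covering end key (one "optimized seek") rather than advancing
  // key by key. The covering tombstone is looked up fresh on every step,
  // so no endpoint is cached that could go stale.
  void SkipCoveredKeys() {
    while (Valid()) {
      std::optional<std::string> skip_to;
      for (const auto& t : tombstones_) {
        if (t.start <= key() && key() < t.end) {
          if (!skip_to || t.end > *skip_to) skip_to = t.end;
        }
      }
      if (!skip_to) break;
      it_ = data_.lower_bound(*skip_to);  // the optimized seek
      ++reseek_count_;                    // analogous to the new perf counter
    }
  }

  const std::map<std::string, std::string>& data_;
  std::vector<RangeTombstone> tombstones_;
  std::map<std::string, std::string>::const_iterator it_;
  uint64_t reseek_count_ = 0;
};

int main() {
  std::map<std::string, std::string> data;
  for (char c = 'a'; c <= 'z'; ++c) data[std::string(1, c)] = "v";
  // One tombstone covering [c, x): the scan should hop from "b" to "x".
  SkippingIterator iter(data, {{"c", "x", /*seqno=*/7}});
  for (iter.SeekToFirst(); iter.Valid(); iter.Next()) {
    std::cout << iter.key() << " ";
  }
  std::cout << "\nreseeks: " << iter.reseek_count() << "\n";  // prints 1
}
```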
This has not been heavily tested or benchmarked yet. One TODO is to reduce calls to the optimized seek functions -- we should only call them upon noticing that the endpoint has changed.
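To check how often the optimized seek actually fires during a scan, the new counter can be read through RocksDB's perf context. A rough usage sketch follows; it assumes a DB that already contains range tombstones, and the function name is made up for illustration:

```cpp
#include <iostream>
#include <memory>

#include "rocksdb/db.h"
#include "rocksdb/perf_context.h"
#include "rocksdb/perf_level.h"

// Hypothetical helper: run a full scan and print how many times the
// range-tombstone-aware reseek happened.
void CountRangeDelReseeks(rocksdb::DB* db) {
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableCount);
  rocksdb::get_perf_context()->Reset();

  rocksdb::ReadOptions read_opts;
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_opts));
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // consume keys...
  }

  std::cout << "internal_range_del_reseek_count = "
            << rocksdb::get_perf_context()->internal_range_del_reseek_count
            << std::endl;
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kDisable);
}
```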

ajkr, Aug 27 '20