Skip swaths of keys covered by range tombstones (2020 edition)
Similar to #5506, but this moves the range-tombstone-aware logic from `BlockBasedTableIterator` into `MergingIterator`, since `RangeDelAggregator` has the same scope as `MergingIterator`. Now the optimized seek only happens during forward/backward scans, not during any user seek. Besides that, there are a few minor fixes/improvements:
- Fixed a bug where `GetEndpoint()` was getting its endpoint from the `active_seqnums_` heap. While it's an endpoint representing a valid tombstone, it's better to get the endpoint from the `active_iters_` heap in case a tombstone with a newer seqno shows up before the current newest one ends.
- Added `PerfContext::internal_range_del_reseek_count` to count how many times the optimized seek happened (see the usage sketch after this list).
- Got rid of endpoint caching in the iterator. It risked missing seek opportunities: when files in higher levels would be opened and covering tombstones would be added, the table iterator wouldn't notice because it had cached some distant endpoint.
That said, this has not been heavily tested or benchmarked. One TODO is to reduce calls to the optimized seek functions: we should only call them upon noticing that the endpoint has changed.
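One hypothetical shape for that TODO (names invented for illustration, not the PR's code): remember the endpoint last used for an optimized seek and only re-issue the seek when the aggregator reports a different one. Presumably such a check would sit in the merging iterator next to the aggregator, so tombstones from newly opened files would still be seen, unlike the removed per-table caching.

```cpp
#include <string>

// Tracks the last tombstone endpoint used for an optimized seek.
struct ReseekState {
  std::string last_endpoint;
  bool has_last = false;
};

// Returns true if the caller should perform the optimized seek to `endpoint`,
// i.e. the endpoint differs from the one used for the previous seek.
bool ShouldReseek(ReseekState* state, const std::string& endpoint) {
  if (state->has_last && state->last_endpoint == endpoint) {
    return false;  // endpoint unchanged since the last optimized seek
  }
  state->last_endpoint = endpoint;
  state->has_last = true;
  return true;
}
```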