risingwave icon indicating copy to clipboard operation
risingwave copied to clipboard

`ConcatSstableIterator::seek()` and `ConcatSstableIterator::seek_idx()` do not work as intended.

Open ALeitert opened this issue 2 years ago • 0 comments

This was discovered by @zwang28.

Currently, these two functions do not seem to cause any problems, due to the way we use them. They may, however, do so in the future if we use their fully intended functionality.

Intended Behaviour

A ConcatSstableIterator is initialised with a list of SSTs and some key-range. With respect to these, ConcatSstableIterator::seek(key) is supposed to reset the iterator and seeks to the first position where the stored key >= key.

The intended way to achieve this, when calling ConcatSstableIterator::seek(key), is as follows:

  • seek(key) determines the SST and its index table_idx which can contain key (or the next larger KV-pair) and then calls seek_idx(table_idx, Some(key)).
  • seek_idx(table_idx, Some(key)) determines the block and its start_index in that table. The function then starts a new BlockStream and SstableStreamIterator which start at that block.

Problems

There are three problems with the current implementation

  1. seek() does not consider ConcatSstableIterator::key_range when searching for a table.
  2. seek_idx() does not consider the given key when determining start_index.
  3. The test does not catch it, because it always uses KeyRange::left when calling seek().

To Do

  • Update seek() and seek_idx() (and other functions if needed) to ensure that the iterator points at the correct KV-pair after calling seek().
  • Extend the test (or add more tests) to properly test if seek() works as intended. Possible test cases are:
    • keys outside the key range,
    • keys at various places within the given SSTs (not just in the first SST),
    • keys between SSTs,
    • keys between blocks within an SST.

ALeitert avatar Sep 23 '22 04:09 ALeitert