pranadb
Improved MV and index fill
The current MV/index fill occurs as part of DDL execution. It can take considerable time, and if raft leadership changes during this period it will be cancelled. The fill logic is also very complex.
Here is how we can improve it:
- Remove MV/index fill from DDL execution. DDL execution will initiate the fill but will complete before the fill is complete.
- To fill an MV, we create it and attach it to its feeders so it receives new rows
- We also create a special source called a "fill_source". This can be created through the normal DDL execution mechanism, so it is persistent.
- A fill_source, instead of obtaining rows from Kafka messages, obtains them by scanning a set of shards in the database (we can use iterator for this).
- When the fill_source is created it is initialized with a sequence start and sequence end for each shard, which are obtained by taking a snapshot of the feeder table and looking at the oldest and newest rows.
- The fill_source will load rows in chunks and send them to the MV handleRows() method. As rows are processed it will store the last processed offset, so it can resume from there after a restart or failure.
- The MV will receive rows from both newly ingested data and data from the fill.
- Each row will need a sequence number or timestamp so the MV can reject older rows when it has seen a newer row for the same key.
- To handle deletes, if the MV receives a delete it will need to store it in a deleted_rows sub-table. A row received from the fill can be ignored if deleted_rows holds an entry for the same key with a later timestamp/sequence.
- The deleted keys can be cached in memory (in an LRU cache of bounded size) to avoid too many gets.
- The deleted keys can be deleted when the fill is complete
- When the fill_source has completed up to its end sequence, it will drop itself.
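The merge rules in the list above (reject older rows, remember deletes while the fill runs) can be sketched roughly as follows. This is a minimal in-memory illustration, not PranaDB code: the `Row` and `MV` types, the method names, and the use of plain maps in place of the deleted_rows sub-table and LRU cache are all assumptions made for the example.

```go
package main

import "fmt"

// Row is a hypothetical row carrying the sequence number the proposal calls for.
type Row struct {
	Key   string
	Value string
	Seq   int64
}

// MV is a toy in-memory stand-in for a materialized view that is being filled.
type MV struct {
	rows    map[string]Row   // current MV state, keyed by row key
	deleted map[string]int64 // stand-in for deleted_rows: key -> sequence of the delete
}

func NewMV() *MV {
	return &MV{rows: map[string]Row{}, deleted: map[string]int64{}}
}

// upsert stores r unless a row with the same key and an equal or newer sequence exists.
func (m *MV) upsert(r Row) bool {
	if cur, ok := m.rows[r.Key]; ok && cur.Seq >= r.Seq {
		return false
	}
	m.rows[r.Key] = r
	return true
}

// HandleLive applies a row from newly ingested data.
func (m *MV) HandleLive(r Row) bool { return m.upsert(r) }

// HandleDelete removes the key and records the delete so older fill rows can be rejected.
func (m *MV) HandleDelete(key string, seq int64) {
	if cur, ok := m.rows[key]; ok && cur.Seq > seq {
		return // the stored row is newer than this delete
	}
	delete(m.rows, key)
	if seq > m.deleted[key] {
		m.deleted[key] = seq
	}
}

// HandleFill applies a row from the fill, ignoring it if a later delete was seen for the key.
func (m *MV) HandleFill(r Row) bool {
	if delSeq, ok := m.deleted[r.Key]; ok && delSeq > r.Seq {
		return false
	}
	return m.upsert(r)
}

// FillComplete discards the recorded deletes once the fill has finished.
func (m *MV) FillComplete() { m.deleted = map[string]int64{} }

func main() {
	mv := NewMV()
	mv.HandleLive(Row{Key: "a", Value: "new", Seq: 10})
	mv.HandleDelete("b", 11)

	fmt.Println(mv.HandleFill(Row{Key: "a", Value: "old", Seq: 3})) // false: newer live row exists
	fmt.Println(mv.HandleFill(Row{Key: "b", Value: "old", Seq: 4})) // false: deleted at a later sequence
	fmt.Println(mv.HandleFill(Row{Key: "c", Value: "old", Seq: 5})) // true: key not seen before
	mv.FillComplete()
}
```

A real implementation would persist the delete records and bound the in-memory cache; the sketch only shows the ordering checks.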
Update: better method.
We introduce versioning on the data key, so that we can create a true snapshot of the data. Then, when creating an MV, we insert a new executor type, "FillExecutor", in front of the MV. The FillExecutor maintains an iterator on the feeding source. At any one time the iterator has a version associated with it, and for each key it only returns the entry with the highest version that is <= that version. Once it has iterated over all keys for a particular version, it moves to the next version and repeats. This continues until there are no more rows. At this point the fill is complete and the FillExecutor switches over to live records. The FillExecutor will store its offset persistently with each batch that is processed, so that on failure it will carry on where it left off.
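One possible reading of the versioned fill loop is sketched below. It is an interpretation rather than the actual design: `Entry`, `Store`, `Snapshot`, and `Step` are hypothetical names, the store is an in-memory slice, and the persisted offset is reduced to a single `done` version (a real executor would also checkpoint its position within a pass).

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical versioned record: the data key is (Key, Version).
type Entry struct {
	Key     string
	Version uint64
	Value   string
}

// Store is a toy in-memory stand-in for the feeding source's versioned table.
type Store struct {
	entries []Entry
	latest  uint64
}

// Put writes a new version of key.
func (s *Store) Put(key, value string) {
	s.latest++
	s.entries = append(s.entries, Entry{Key: key, Version: s.latest, Value: value})
}

// Snapshot returns, for each key, the entry with the highest version v such that
// after < v <= upTo, sorted by key.
func (s *Store) Snapshot(after, upTo uint64) []Entry {
	best := map[string]Entry{}
	for _, e := range s.entries {
		if e.Version <= after || e.Version > upTo {
			continue
		}
		if cur, ok := best[e.Key]; !ok || e.Version > cur.Version {
			best[e.Key] = e
		}
	}
	out := make([]Entry, 0, len(best))
	for _, e := range best {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

// FillExecutor sits in front of the MV and feeds it snapshot passes until caught up.
type FillExecutor struct {
	store *Store
	done  uint64 // highest version fully forwarded; would be persisted with each batch
	Live  bool   // true once the fill is complete and live records flow through
}

// Step runs one pass at the store's current version and returns the rows to forward.
func (f *FillExecutor) Step() []Entry {
	target := f.store.latest
	batch := f.store.Snapshot(f.done, target)
	f.done = target
	if len(batch) == 0 {
		f.Live = true // nothing new since the last pass: switch to live records
	}
	return batch
}

func main() {
	s := &Store{}
	s.Put("a", "1")
	s.Put("b", "1")
	s.Put("a", "2")

	f := &FillExecutor{store: s}
	fmt.Println(f.Step()) // first pass: latest entries for a and b
	s.Put("c", "1")       // a write that arrives while the fill is running
	fmt.Println(f.Step()) // second pass: only the entry for c
	fmt.Println(f.Step(), f.Live)
}
```

Because each pass only looks at versions above `done`, restarting from a persisted `done` value repeats at most one pass rather than the whole fill.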