TileDB
Deletes: implement consolidation.
This implements consolidation of delete commits. For consolidated fragments, the delete conditions that exist in the array at the time of consolidation are processed, and each one is added, as a condition marker, to the fragment's list of processed conditions. Those fragments also get two new attributes: the time of deletion for a cell (delete_ts) and the hash of the condition marker that deleted the cell (condition_marker_hash). The fragment consolidator was modified to create and set buffers for those two new attributes and to load the delete tile locations. The latter is needed because the consolidator opens the array for schema only, then asks to load the fragments. The fragment info call already loads the delete tile locations, so that information is extracted from there and set on the opened array for reads.
The sparse global order reader, which is responsible for filling the new buffers, was modified to write them. To do this, a new vector was added to GlobalOrderResultTile to store a pointer to the condition that first deletes each cell. Because consolidation doesn't use a post query condition bitmap (no query condition, no processing of deleted ts), that bitmap can be used to process a delete condition; then, by going through the bitmap, we can set the condition pointer (to the current condition) for cells that were cleared by the current condition but not by a previous one. This could also have been done without an extra bitmap, by processing delete conditions in reverse, but that would not work once we implement updates.
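The bitmap pass described above can be illustrated with a small Python sketch. It is not the C++ reader code: `DeleteCondition`, `mark_first_deleting_condition`, and the use of `None` for "not deleted" are hypothetical names and conventions chosen for illustration, and the sketch assumes conditions are processed in ascending timestamp order.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DeleteCondition:
    """Hypothetical stand-in for a delete condition and its commit timestamp."""
    timestamp: int
    predicate: Callable[[int], bool]  # True when a cell value matches the delete


def mark_first_deleting_condition(
    cells: List[int], conditions: List[DeleteCondition]
) -> List[Optional[int]]:
    """Return, per cell, the index of the first condition that deletes it."""
    bitmap = [1] * len(cells)  # 1 = cell still alive
    first_condition: List[Optional[int]] = [None] * len(cells)
    for cond_idx, cond in enumerate(conditions):
        # Apply the current condition: clear the bit of every live cell it matches.
        for i, value in enumerate(cells):
            if bitmap[i] and cond.predicate(value):
                bitmap[i] = 0
        # Walk the bitmap: a cleared cell with no recorded condition yet
        # was cleared by the current condition and not by a previous one.
        for i, bit in enumerate(bitmap):
            if bit == 0 and first_condition[i] is None:
                first_condition[i] = cond_idx
    return first_condition


cells = [1, 5, 10, 20]
conditions = [
    DeleteCondition(timestamp=1, predicate=lambda v: v > 8),
    DeleteCondition(timestamp=2, predicate=lambda v: v > 3),
]
first = mark_first_deleting_condition(cells, conditions)
# Per-cell deletion time, derived from the recorded condition (None = never deleted).
delete_ts = [None if c is None else conditions[c].timestamp for c in first]
```

Processing in forward order keeps the earliest deleting condition for each cell, which is what the delete_ts and condition_marker_hash buffers need.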
This also adds reading of fragments consolidated with delete_ts in all readers, as well as the ability to bypass conditions already processed for a fragment. For all readers, if a fragment has delete_ts, those tiles always need to be loaded, so loading the timestamps is done by each reader and skipped in the reader base code (read_tiles and unfilter_tiles) if the fragment doesn't have the delete metadata.
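As a rough illustration of the read side, the sketch below shows one way a reader could skip conditions a fragment has already processed and then use the stored delete_ts values. The function names, the `None` sentinel for "never deleted", and the comparison against the read timestamp are all assumptions made for this example, not the actual reader logic.

```python
from typing import List, Optional, Set


def conditions_to_apply(array_markers: List[str], processed: Set[str]) -> List[str]:
    """Keep only the condition markers the fragment has not already processed."""
    return [m for m in array_markers if m not in processed]


def visible_cells(delete_ts: List[Optional[int]], read_timestamp: int) -> List[bool]:
    """Assumed rule: a cell is visible if it was never deleted (None) or was
    deleted after the timestamp the array is read at."""
    return [ts is None or ts > read_timestamp for ts in delete_ts]


# Hypothetical marker names; real condition markers have their own format.
remaining = conditions_to_apply(["marker-a", "marker-b", "marker-c"], {"marker-a", "marker-b"})
mask = visible_cells([None, 2, 1, 1], read_timestamp=1)
```

A fragment without the delete metadata would skip the `visible_cells` step entirely, which matches the base code skipping those tiles.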
To record which delete condition deleted a cell, a hash of the condition marker is saved instead of the full condition marker, which would take a lot of space and introduce performance issues during consolidation.
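The space trade-off can be sketched in Python as follows. The hash function actually used is not stated here, so the truncated SHA-256 below is purely an assumption for illustration, as are the marker strings and the `condition_marker_hash` helper name.

```python
import hashlib
from typing import Dict, List


def condition_marker_hash(marker: str) -> int:
    """Map a condition marker to a fixed-size 64-bit value.
    Assumption: SHA-256 truncated to 8 bytes; the real hash may differ."""
    digest = hashlib.sha256(marker.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


# Hypothetical marker strings; real markers have their own format.
markers: List[str] = ["marker-a", "marker-b"]

# Each cell stores 8 bytes instead of a variable-length marker string.
per_cell_hashes = [condition_marker_hash(markers[i]) for i in (0, 0, 1)]

# A small lookup table lets a reader map a stored hash back to its marker.
hash_to_marker: Dict[int, str] = {condition_marker_hash(m): m for m in markers}
resolved = [hash_to_marker[h] for h in per_cell_hashes]
```

With a fixed-size value per cell, the attribute can be stored as a plain fixed-width column rather than a variable-length one.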
Also, this fixes an issue with commits consolidation where the condition marker needs to be set from the array directory in the delete tile locations. Finally, the delete tiles were being loaded from their old locations instead of from the commits file; this is now fixed. This was not caught in tests because the test validating this didn't vacuum the delete tiles properly.
TYPE: IMPROVEMENT
DESC: Deletes: implement consolidation.
This pull request has been linked to Shortcut Story #19590: Deletes: consolidation support.