datacube
Combine batches in flight to reduce writes
The write path combines a number of writes into a single batch and places that batch on a queue to be flushed to the database. Each batch is flushed by a single thread, which causes a couple of performance problems:
- If the batch size is large, it can take a long time for a single thread to flush a batch, even if other flush threads are idle.
- If multiple batches write to the same target, we miss the opportunity to combine those writes into a single write. For use cases where values are written frequently, combining these writes could speed things up by more than 100%.
The alternative I have in mind is a queue of pending writes (individual value writes, not batches). This would be more than a plain queue, though: it would also combine all pending items that share the same key, so each new write arriving in the queue is merged with any existing pending write to the same destination.
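A minimal sketch of such a combining queue follows. It assumes that the values are counter deltas combined by addition (`Long::sum`). The class and method names (`CombiningWriteQueue`, `offer`, `drain`, `take`) are hypothetical and do not come from datacube's API. Pending writes are held in a `LinkedHashMap` keyed by destination, so a new write to a key that is already pending is merged in place instead of being enqueued again.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical combining queue: not part of datacube, just a sketch of the idea.
final class CombiningWriteQueue<K, V> {
    // Insertion-ordered map: merging into an existing key keeps that key's
    // original position, so the oldest pending destination is flushed first.
    private final LinkedHashMap<K, V> pending = new LinkedHashMap<>();
    private final BinaryOperator<V> combiner;

    CombiningWriteQueue(BinaryOperator<V> combiner) {
        this.combiner = combiner;
    }

    // Adds a write, combining it with any pending write to the same key.
    synchronized void offer(K key, V value) {
        pending.merge(key, value, combiner);
        notifyAll(); // wake flush threads blocked in take()
    }

    // Removes and returns up to max of the oldest pending writes (non-blocking).
    synchronized List<Map.Entry<K, V>> drain(int max) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        Iterator<Map.Entry<K, V>> it = pending.entrySet().iterator();
        while (out.size() < max && it.hasNext()) {
            Map.Entry<K, V> e = it.next();
            // Copy the entry before removing it from the map.
            out.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            it.remove();
        }
        return out;
    }

    // Blocks until at least one write is pending, then drains up to max writes.
    synchronized List<Map.Entry<K, V>> take(int max) throws InterruptedException {
        while (pending.isEmpty()) {
            wait();
        }
        return drain(max);
    }

    synchronized int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        // Hypothetical keys; two writes to the same key collapse into one.
        CombiningWriteQueue<String, Long> q = new CombiningWriteQueue<>(Long::sum);
        q.offer("hits:2024-01-01", 1L);
        q.offer("hits:2024-01-02", 5L);
        q.offer("hits:2024-01-01", 3L);
        System.out.println(q.size());     // 2
        System.out.println(q.drain(10));  // [hits:2024-01-01=4, hits:2024-01-02=5]
    }
}
```

In this sketch each flush thread would loop on `take(n)` with a small `n`, so idle threads pick up work in small chunks instead of waiting behind one large batch. Combining only happens while a write is still pending. Once a key has been drained, a later write to it starts a new entry.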