quickwit
Publish only splits belonging to the current indexing generation
Description
This PR introduces the notion of an "indexing generation". An indexing generation covers the period between two rebalances. Each time a rebalance occurs, the source increments the indexing generation, and any batches of messages and splits still in flight from the previous generation become invalid.
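A minimal sketch of the idea, assuming a shared atomic counter that the source bumps on every rebalance and that every batch records the generation it was read in. `GenerationCounter`, `Batch`, `on_rebalance` and `is_current` are hypothetical names for illustration, not Quickwit's actual types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical shared counter: the source increments it on every rebalance.
#[derive(Clone, Default)]
pub struct GenerationCounter {
    current: Arc<AtomicU64>,
}

impl GenerationCounter {
    pub fn current(&self) -> u64 {
        self.current.load(Ordering::SeqCst)
    }

    /// Called by the source when a rebalance occurs; returns the new generation.
    pub fn on_rebalance(&self) -> u64 {
        self.current.fetch_add(1, Ordering::SeqCst) + 1
    }
}

/// Hypothetical batch of messages tagged with the generation it was read in.
pub struct Batch {
    pub generation: u64,
    pub messages: Vec<String>,
}

impl Batch {
    /// A batch is valid only while no rebalance has happened since it was read.
    pub fn is_current(&self, counter: &GenerationCounter) -> bool {
        self.generation == counter.current()
    }
}
```

Cloning `GenerationCounter` shares the same underlying counter, so every actor in the pipeline can observe the increment made by the source.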
Each actor along the pipeline checks whether the indexing generation of a batch or split is still current before performing any heavy operation (commit, upload), and discards the batch or split if it is not.
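The check each actor performs can be sketched as follows, assuming the current generation is shared through an `Arc<AtomicU64>` and each split carries its generation. `Split`, `Uploader` and `StageOutcome` are hypothetical names; the upload itself is only a placeholder.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical split produced by the indexer, tagged with its generation.
#[derive(Debug)]
pub struct Split {
    pub split_id: String,
    pub generation: u64,
}

/// Outcome of a pipeline stage for one split.
#[derive(Debug, PartialEq)]
pub enum StageOutcome {
    Uploaded(String),
    Discarded(String),
}

/// Hypothetical uploader stage that checks the generation before the expensive step.
pub struct Uploader {
    pub current_generation: Arc<AtomicU64>,
}

impl Uploader {
    pub fn handle(&self, split: Split) -> StageOutcome {
        // Cheap check first: a split from an older generation is stale after a rebalance.
        if split.generation != self.current_generation.load(Ordering::SeqCst) {
            return StageOutcome::Discarded(split.split_id);
        }
        // Placeholder for the heavy operation (e.g. uploading the split to storage).
        StageOutcome::Uploaded(split.split_id)
    }
}
```

Doing the comparison before the upload means stale work is dropped as early and as cheaply as possible.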
On publish, the indexing generation is locked. The lock is held for the entire publish operation, so a rebalance cannot take place between the generation check and the end of the publish. This avoids a race condition between publishers of generation N and consumers of generation N + 1 resuming after a rebalance.
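One way to get this behavior, assuming the generation lives behind a `std::sync::Mutex` shared by the source and the publisher, is sketched below. `GenerationLock` and its methods are hypothetical, and `publish_fn` stands in for whatever call actually publishes the splits.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared state: the current generation guarded by a mutex.
#[derive(Clone, Default)]
pub struct GenerationLock {
    inner: Arc<Mutex<u64>>,
}

impl GenerationLock {
    /// Called by the source on rebalance. It blocks while a publish holds the lock,
    /// so generation N + 1 cannot start until the in-flight publish of N finishes.
    pub fn on_rebalance(&self) -> u64 {
        let mut generation = self.inner.lock().unwrap();
        *generation += 1;
        *generation
    }

    /// Publishes `split_ids` only if `split_generation` is still current.
    /// The guard is held across the whole publish, so the check and the publish
    /// are atomic with respect to rebalances.
    pub fn publish<F>(&self, split_generation: u64, split_ids: &[&str], mut publish_fn: F) -> bool
    where
        F: FnMut(&[&str]),
    {
        let generation = self.inner.lock().unwrap();
        if *generation != split_generation {
            // Stale generation: discard instead of publishing.
            return false;
        }
        publish_fn(split_ids);
        // The guard is dropped when this function returns, after the publish completes.
        true
    }
}
```

Releasing the lock right after the check, instead of after the publish, would reopen the window in which a rebalance slips in between the two steps.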
How was this PR tested?
Pending