Lucene partition balancing: optimize and fix transaction timeouts

Open jjezra opened this issue 6 months ago • 0 comments

Currently, the rebalancing code uses a single transaction to iterate through the grouping keys and rebalance the first partition in need in each group. This may cause a transaction timeout, and it limits the number of documents that can be moved, which causes extra merges. The suggested solution is to:

  1. Use a read-only Agility Context to read the grouping keys and partitions info
  2. Use the same AC to evaluate whether rebalancing is needed on a given partition
  3. If needed, create a new, dedicated transaction to rebalance each single partition
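
A minimal sketch of the control flow proposed in the three steps above. It does not use the actual record layer API: `ReadOnlyContext`, `TransactionRunner`, `PartitionInfo`, and the `highWatermark` threshold are all hypothetical stand-ins, and the "needs rebalancing" check is assumed to be a simple document-count comparison. The point is only the shape: reads go through one read-only context, and each partition that needs work gets its own short transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class RebalanceSketch {
    /** Hypothetical summary of one partition within a grouping key. */
    public static final class PartitionInfo {
        public final String groupingKey;
        public final int partitionId;
        public final int docCount;

        public PartitionInfo(String groupingKey, int partitionId, int docCount) {
            this.groupingKey = groupingKey;
            this.partitionId = partitionId;
            this.docCount = docCount;
        }
    }

    /** Hypothetical stand-in for the read-only Agility Context (steps 1 and 2). */
    public interface ReadOnlyContext {
        List<String> groupingKeys();
        List<PartitionInfo> partitions(String groupingKey);
    }

    /** Hypothetical stand-in for "run this work in a new, dedicated transaction" (step 3). */
    public interface TransactionRunner {
        void runInNewTransaction(Runnable work);
    }

    /**
     * Scans every grouping key through the read-only context and rebalances each
     * partition over the threshold in its own transaction, so no single
     * transaction spans the whole scan.
     */
    public static List<PartitionInfo> rebalanceAll(ReadOnlyContext reader,
                                                   TransactionRunner runner,
                                                   int highWatermark,
                                                   Consumer<PartitionInfo> rebalanceOne) {
        List<PartitionInfo> rebalanced = new ArrayList<>();
        for (String key : reader.groupingKeys()) {
            for (PartitionInfo partition : reader.partitions(key)) {
                // Assumed criterion: a partition needs rebalancing when it
                // holds more documents than the threshold.
                if (partition.docCount > highWatermark) {
                    runner.runInNewTransaction(() -> rebalanceOne.accept(partition));
                    rebalanced.add(partition);
                }
            }
        }
        return rebalanced;
    }
}
```

With this shape, a failure or timeout while rebalancing one partition would not discard the work already committed for the others, and the amount moved per partition is bounded by that partition's own transaction rather than by the scan as a whole.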

jjezra, Aug 06 '24 20:08