
Improved scaling by disabling ES auto-rebalancing

Open otrosien opened this issue 5 years ago • 1 comment

Our current node-group-based index allocation exists mainly because the traffic patterns of certain indices are similar. This has served us fairly well in the past, but it has certain limitations.

  • Elasticsearch's own rebalancing logic doesn't always choose the best node to relocate shards from or to, because it only considers the number of shards, not the actual load on the system
  • Indices cannot be scaled up in isolation

As a result, we can end up with sub-optimal resource utilisation in our cluster: some nodes sit under-utilised while others stay heavily loaded, even though the loaded nodes could offload some shards to the idle ones and balance their load before we have to scale up.

The proposed solution may look like this: based on the assumption that all nodes should be utilised equally, es-operator manages the shard-to-node allocation itself instead of leaving it to Elasticsearch's auto-rebalancing. Using a cost function, it can then try to optimise that allocation.
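To make the cost-function idea concrete, here is a minimal sketch in Python. It is not es-operator code: the function names, the per-shard load numbers, and the choice of load variance as the cost are all assumptions made for illustration. It greedily applies the single shard move that lowers the cost the most, and it ignores real constraints such as keeping a primary and its replica on different nodes.

```python
from statistics import pvariance


def node_loads(allocation, shard_load, nodes):
    """Sum the (hypothetical) load of all shards placed on each node."""
    loads = {node: 0.0 for node in nodes}
    for shard, node in allocation.items():
        loads[node] += shard_load[shard]
    return loads


def cost(allocation, shard_load, nodes):
    """Assumed cost function: population variance of per-node load."""
    return pvariance(list(node_loads(allocation, shard_load, nodes).values()))


def rebalance(allocation, shard_load, nodes, max_moves=10):
    """Greedy optimisation: repeatedly apply the single best shard move."""
    allocation = dict(allocation)  # do not mutate the caller's mapping
    moves = []
    for _ in range(max_moves):
        current = cost(allocation, shard_load, nodes)
        best = None
        for shard, src in allocation.items():
            for dst in nodes:
                if dst == src:
                    continue
                candidate = dict(allocation)
                candidate[shard] = dst
                c = cost(candidate, shard_load, nodes)
                # Only accept moves that strictly lower the cost.
                if c < current - 1e-9 and (best is None or c < best[0]):
                    best = (c, shard, src, dst)
        if best is None:
            break  # no single move improves the balance any further
        _, shard, src, dst = best
        allocation[shard] = dst
        moves.append((shard, src, dst))
    return allocation, moves


if __name__ == "__main__":
    nodes = ["node-a", "node-b"]
    shard_load = {"s1": 4.0, "s2": 4.0, "s3": 1.0, "s4": 1.0}
    allocation = {"s1": "node-a", "s2": "node-a", "s3": "node-a", "s4": "node-b"}
    new_allocation, moves = rebalance(allocation, shard_load, nodes)
    print(moves)
    print(node_loads(new_allocation, shard_load, nodes))
```

Each resulting `(shard, source, destination)` tuple could then be translated into a `move` command for Elasticsearch's cluster reroute API. Where the load figures come from (CPU, search rate, indexing rate) is left open here, as it is in the proposal.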

otrosien · Mar 29 '19 09:03

OTOH this raises the criticality of the ES operator, and these optimisations can easily be undone by human interaction, e.g. someone enabling auto-rebalancing temporarily.
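One way to guard against that risk would be for the operator to check the effective value of the `cluster.routing.rebalance.enable` setting before trusting its own allocation. The sketch below is an assumption about how such a check might look, not something es-operator does today. It takes the already-parsed JSON body of a `GET /_cluster/settings` response and relies on transient settings taking precedence over persistent ones; the function names are hypothetical.

```python
REBALANCE_KEY = "cluster.routing.rebalance.enable"


def _lookup(settings):
    """Find the setting in either flat ("a.b.c": v) or nested ({"a": {"b": ...}}) form."""
    if REBALANCE_KEY in settings:
        return settings[REBALANCE_KEY]
    node = settings
    for part in REBALANCE_KEY.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def effective_rebalance_setting(cluster_settings):
    """Return the value Elasticsearch would apply: transient, then persistent, then defaults."""
    for scope in ("transient", "persistent", "defaults"):
        value = _lookup(cluster_settings.get(scope) or {})
        if value is not None:
            return value
    return "all"  # Elasticsearch's default when nothing is set explicitly


def rebalancing_disabled(cluster_settings):
    """True only if auto-rebalancing is effectively switched off."""
    return effective_rebalance_setting(cluster_settings) == "none"


if __name__ == "__main__":
    # Someone temporarily re-enabled rebalancing via a transient setting.
    response = {
        "persistent": {"cluster": {"routing": {"rebalance": {"enable": "none"}}}},
        "transient": {"cluster.routing.rebalance.enable": "all"},
    }
    print(rebalancing_disabled(response))
```

If the check fails, the operator could hold back further manual moves or raise an alert rather than fight Elasticsearch's balancer.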

otrosien · Jan 26 '21 11:01