fdb-kubernetes-operator

Making scheduler more aware of pods that are pending replacement

Open brownleej opened this issue 3 years ago • 1 comment

What would you like to be added/changed?

The operator relies on pod anti-affinity to distribute processes across fault domains. When we're doing replacements, we can see sub-optimal placement of the replacement pod. Let's say we have exactly three zones available, and have three storage servers, as follows:

  • storage-1 on zoneA
  • storage-2 on zoneB
  • storage-3 on zoneC

And let's say we have equal available capacity in all three zones. If we replace storage-2 with a new pod called storage-4, storage-2 still exists while storage-4 is being scheduled, so the anti-affinity rules try to avoid zoneA, zoneB, and zoneC equally. This could lead to storage-4 being placed on zoneC, giving us two pods on zoneC, one pod on zoneA, and zero pods on zoneB. This is a sub-optimal distribution and could lead to issues with coordinator selection or data distribution.
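To make the tie concrete, here is a minimal sketch in Python that models only the anti-affinity part of the decision. It assumes a simplified scoring model where each matching pod in a zone adds one unit of penalty; the real scheduler combines this with other scoring inputs, and the function name `anti_affinity_penalty` is made up for illustration.

```python
from collections import Counter

# Pod-to-zone assignment from the example above.
pods = {"storage-1": "zoneA", "storage-2": "zoneB", "storage-3": "zoneC"}


def anti_affinity_penalty(pods_by_zone, zones):
    """Count matching pods per zone; a higher count means the zone is less preferred.

    Simplified model (assumption): one penalty unit per matching pod.
    """
    counts = Counter(pods_by_zone.values())
    return {zone: counts.get(zone, 0) for zone in zones}


penalties = anti_affinity_penalty(pods, ["zoneA", "zoneB", "zoneC"])

# Every zone that shares the lowest penalty is an equally good candidate.
lowest = min(penalties.values())
candidates = sorted(zone for zone, p in penalties.items() if p == lowest)

print(penalties)
print(candidates)
```

All three zones carry the same penalty here, so nothing steers storage-4 toward zoneB, even though zoneB is the zone that is about to lose its pod.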

To improve this, we could use labels to indicate which pods are pending replacement, and configure the anti-affinity rules to only consider pods that are not pending replacement. In the scenario above, this would tell the scheduler to avoid zoneA and zoneC, but show zoneB as being desirable for new pods, making it more likely that we would get an even distribution.
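As a rough sketch of what that could look like, the Python below builds a preferred pod anti-affinity term as a plain dictionary and evaluates its selector against the example pods. The label keys (`example.com/cluster`, `example.com/pending-replacement`) and the helper names are hypothetical and are not what the operator uses; the `DoesNotExist` operator and the `topology.kubernetes.io/zone` topology key are standard Kubernetes. The selector evaluation is a simplified stand-in for the scheduler, handling only the two operators used here.

```python
# Hypothetical label keys, chosen only for this illustration.
CLUSTER_LABEL = "example.com/cluster"
PENDING_LABEL = "example.com/pending-replacement"


def build_anti_affinity_term(cluster_name):
    """Build a preferred anti-affinity term that ignores pods pending replacement."""
    return {
        "weight": 1,
        "podAffinityTerm": {
            "topologyKey": "topology.kubernetes.io/zone",
            "labelSelector": {
                "matchExpressions": [
                    {"key": CLUSTER_LABEL, "operator": "In", "values": [cluster_name]},
                    # Only pods WITHOUT the pending-replacement label are counted.
                    {"key": PENDING_LABEL, "operator": "DoesNotExist"},
                ]
            },
        },
    }


def selector_matches(selector, labels):
    """Simplified label selector check supporting only In and DoesNotExist."""
    for expr in selector["matchExpressions"]:
        key, op = expr["key"], expr["operator"]
        if op == "In" and labels.get(key) not in expr["values"]:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True


# Example pods; storage-2 is marked as pending replacement.
example_pods = [
    {"name": "storage-1", "zone": "zoneA", "labels": {CLUSTER_LABEL: "demo"}},
    {"name": "storage-2", "zone": "zoneB",
     "labels": {CLUSTER_LABEL: "demo", PENDING_LABEL: "true"}},
    {"name": "storage-3", "zone": "zoneC", "labels": {CLUSTER_LABEL: "demo"}},
]

term = build_anti_affinity_term("demo")
selector = term["podAffinityTerm"]["labelSelector"]

# Count the pods per zone that the anti-affinity term would push away from.
zone_penalties = {"zoneA": 0, "zoneB": 0, "zoneC": 0}
for pod in example_pods:
    if selector_matches(selector, pod["labels"]):
        zone_penalties[pod["zone"]] += 1

print(zone_penalties)
```

With storage-2 excluded by the selector, zoneB is the only zone without a matching pod, so under this simplified model it becomes the preferred target for storage-4.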

brownleej, Sep 13 '22 20:09

I believe that was one of the goals of the colouring approach: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/bin_pack_fault_domains.md

johscheuer, Sep 14 '22 14:09