fdb-kubernetes-operator
Making scheduler more aware of pods that are pending replacement
What would you like to be added/changed?
The operator relies on pod anti-affinity to distribute processes across fault domains. When we're doing replacements, we can see sub-optimal placement of the replacement pod. Let's say we have exactly three zones available, and have three storage servers, as follows:
- storage-1 on zoneA
- storage-2 on zoneB
- storage-3 on zoneC
And let's say we have equal available capacity in all three zones. If we replace storage-2 with a new pod called storage-4, the anti-affinity rules try to avoid zoneA, zoneB, and zoneC equally, because storage-2 still counts against zoneB while it is pending replacement. This could lead to storage-4 being placed on zoneC, giving us two pods on zoneC, one pod on zoneA, and zero pods on zoneB once storage-2 is removed. This is a sub-optimal distribution and could lead to issues with coordinator selection or data distribution.
To improve this, we could use labels to indicate which pods are pending replacement, and configure the anti-affinity rules to only consider pods that are not pending replacement. In the scenario above, this would tell the scheduler to avoid zoneA and zoneC, but show zoneB as being desirable for new pods, making it more likely that we would get an even distribution.
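To make the difference concrete, here is a rough sketch of the scenario above. It is a toy model, not the real scheduler: it just counts, per zone, the pods an anti-affinity term would match, and treats the zones with the lowest count as preferred. The label key `example.com/pending-replacement` is made up for illustration and is not something the operator sets today.

```python
# Hypothetical label key, used only for this illustration.
PENDING_REPLACEMENT_LABEL = "example.com/pending-replacement"


def anti_affinity_penalty(pods, zones, ignore_pending):
    """Count, per zone, the pods that the anti-affinity term would match."""
    penalty = {zone: 0 for zone in zones}
    for pod in pods:
        pending = pod["labels"].get(PENDING_REPLACEMENT_LABEL) == "true"
        if ignore_pending and pending:
            # The selector excludes pods pending replacement, so this pod
            # does not count against its zone.
            continue
        penalty[pod["zone"]] += 1
    return penalty


def preferred_zones(penalty):
    """Return the zones with the lowest penalty, sorted by name."""
    lowest = min(penalty.values())
    return sorted(zone for zone, count in penalty.items() if count == lowest)


zones = ["zoneA", "zoneB", "zoneC"]
pods = [
    {"name": "storage-1", "zone": "zoneA", "labels": {}},
    # storage-2 is being replaced by storage-4.
    {"name": "storage-2", "zone": "zoneB",
     "labels": {PENDING_REPLACEMENT_LABEL: "true"}},
    {"name": "storage-3", "zone": "zoneC", "labels": {}},
]

# Today: every zone looks equally bad for storage-4.
current = preferred_zones(anti_affinity_penalty(pods, zones, ignore_pending=False))
# Proposed: only zoneB looks free.
proposed = preferred_zones(anti_affinity_penalty(pods, zones, ignore_pending=True))

print(current)   # ['zoneA', 'zoneB', 'zoneC']
print(proposed)  # ['zoneB']
```

In an actual pod spec, the exclusion could probably be expressed as an extra `matchExpressions` entry on the anti-affinity term's label selector, using the `DoesNotExist` or `NotIn` operator on whatever label the operator ends up applying.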
I believe that was one of the goals of the colouring approach: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/bin_pack_fault_domains.md