fdb-kubernetes-operator maxConcurrentReplacements causing deletion update strategy

Open simenl opened this issue 1 year ago • 5 comments

What happened?

As a mitigation for storage roles being recruited to log processes [forum post], we tried setting maxConcurrentReplacements to reduce the number of concurrent exclusions. However, for updates that require the Replacement strategy, this caused the Deletion strategy to be incorrectly applied to the remaining processes. This resulted in unschedulable pods, because we had updated the node selector to an availability zone incompatible with the existing persistent volume of the process.

What did you expect to happen?

Processes that require replacement should not be eligible for the Deletion update strategy, even if they have not (yet) been selected for replacement due to maxConcurrentReplacements.

How can we reproduce it (as minimally and precisely as possible)?

Create a FoundationDB cluster with maxConcurrentReplacements set and multiple storage pods/processes:

spec:
  automationOptions:
    maxConcurrentReplacements: 1

Make an update to the cluster spec that requires a replacement, e.g. changing the node selector.
Observe that some of the storage pods are updated through the Deletion update strategy.
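
For illustration, the update in the second step could look roughly like the fragment below. This is a sketch, not the exact manifest we used: the cluster name, the FoundationDB version, and the zone value are placeholders, and it assumes the node selector is set through the pod template of the general process settings.

```yaml
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: example-cluster            # placeholder name
spec:
  version: 7.1.26                  # placeholder version
  automationOptions:
    maxConcurrentReplacements: 1   # limit from the first step
  processes:
    general:
      podTemplate:
        spec:
          nodeSelector:
            # Changing this value is the update that requires a replacement.
            topology.kubernetes.io/zone: zone-b   # placeholder zone
```

With a change like this, a pod whose persistent volume lives in the old zone cannot be scheduled if it is deleted and recreated in place rather than replaced.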

Anything else we need to know?

No response