
Recreation of DB clusters due to changing nodeAffinity term order

Open ljcesca opened this issue 1 year ago • 1 comments

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? AWS EKS
  • Are you running Postgres Operator in production? yes
  • Type of issue? Bug report

We experienced an issue similar to #924 due to changes in ordering of nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions across syncs of clusters.

Our operator configuration was setting two node_readiness_labels via:

node_readiness_label:
  kubernetes.io/arch: amd64
  postgres-cluster: "1"

And an additional label via the Cluster spec:

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: postgres-plan-small
        operator: In
        values:
        - "1"

We've worked around this for now by removing the node_readiness_label configuration, but would like to be able to use it again in the future.

We were able to capture the StatefulSet before and after a sync and confirmed that the order of nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions was changing, which caused the cluster to be re-synced due to Cluster.compareStatefulSetWith.

I'm happy to work on a PR that fixes this if you agree that changing Cluster.compareStatefulSetWith is the appropriate approach. Thanks!

ljcesca, Aug 11 '22 17:08

Ok, so it seems we need a method that compares the matchExpressions of an affinity while ignoring their order. I'd welcome a PR on that. Please also add unit tests. Thanks :)

FxKu, Aug 23 '22 14:08
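
For illustration, here is a minimal sketch of the kind of order-insensitive comparison discussed above. It is not the operator's actual code: `matchExpression` is a hypothetical stand-in for the Kubernetes `NodeSelectorRequirement` type so the example has no external dependencies, and `canonicalKey` and `sameMatchExpressions` are made-up names. The sketch also assumes that the order of `values` inside a single expression should be ignored.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// matchExpression is a hypothetical stand-in for the Kubernetes
// NodeSelectorRequirement type (key, operator, values).
type matchExpression struct {
	Key      string
	Operator string
	Values   []string
}

// canonicalKey returns a string that identifies an expression regardless of
// the order of its values. NUL is used as a separator on the assumption that
// it never appears in label keys or values.
func canonicalKey(e matchExpression) string {
	values := append([]string(nil), e.Values...) // copy so the input is not mutated
	sort.Strings(values)
	return e.Key + "\x00" + e.Operator + "\x00" +
		strconv.Itoa(len(values)) + "\x00" + strings.Join(values, "\x00")
}

// sameMatchExpressions reports whether a and b contain the same expressions,
// ignoring order. Duplicates are counted, so this is a multiset comparison.
func sameMatchExpressions(a, b []matchExpression) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[string]int, len(a))
	for _, e := range a {
		counts[canonicalKey(e)]++
	}
	for _, e := range b {
		k := canonicalKey(e)
		if counts[k] == 0 {
			return false // b has an expression that a lacks (or has more copies of it)
		}
		counts[k]--
	}
	return true
}
```

With a helper along these lines, two expression lists that differ only in ordering (for example `kubernetes.io/arch` before or after `postgres-cluster`) would compare as equal, while a changed key, operator, or value would still be detected as a difference.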