clickhouse-operator

POD annotations are dropped with the reconcile of CHK STS

Open jirislav opened this issue 1 year ago • 3 comments

Keeping the pod annotations is essential for running the workload on EKS when a Fargate profile is the default.

Dropping essential annotations such as "eks.amazonaws.com/compute-type" = "ec2" makes the pod unschedulable, because:

  • you can't mount a volume on a Fargate node.
  • you can't satisfy nodeSelector and nodeAffinity rules under a Fargate profile when you ask for a dedicated EC2 node for billing purposes.

Example manifest:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: clickhouse-keeper
spec:
  configuration:
    clusters:
      - name: chk
        layout:
          replicasCount: 3
  templates:
    podTemplates:
      - name: clickhouse-keeper
        metadata:
          annotations:
            eks.amazonaws.com/compute-type: "ec2"

Interestingly, the first pod of the 3 replicas starts with the correct annotation, but the second one doesn't, because by then the annotations have been dropped from the underlying StatefulSet.

Note that I also see this in the log of the operator, which is possibly the result of this behavior:

E0802 07:02:22.310975       1 reconciler.go:299] err: Operation cannot be fulfilled on clickhousekeeperinstallations.clickhouse-keeper.altinity.com "chk": the object has been modified; please apply your changes to the latest version and try again
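For context, "the object has been modified; please apply your changes to the latest version and try again" is the message the Kubernetes API server returns for an optimistic-concurrency conflict: an update was submitted with a stale resourceVersion. Whether that conflict is what causes the annotations to be dropped is only a guess at this point. The following is a minimal Python sketch of the mechanism and of the usual re-read-and-retry response. It is not the operator's code; `FakeStore`, `Conflict`, and `update_with_retry` are hypothetical names, and the integer resourceVersion is a simplification (the real value is an opaque string).

```python
import copy


class Conflict(Exception):
    """Stand-in for the API server's HTTP 409 Conflict response."""


class FakeStore:
    """Hypothetical in-memory store that mimics resourceVersion checks."""

    def __init__(self, obj):
        self._obj = copy.deepcopy(obj)
        # Simplification: real resourceVersions are opaque strings.
        self._obj["resourceVersion"] = 1

    def get(self):
        # Return a copy so callers cannot mutate the stored object directly.
        return copy.deepcopy(self._obj)

    def update(self, obj):
        # Reject writes that were based on an outdated read.
        if obj["resourceVersion"] != self._obj["resourceVersion"]:
            raise Conflict("the object has been modified")
        new = copy.deepcopy(obj)
        new["resourceVersion"] += 1
        self._obj = new
        return copy.deepcopy(new)


def update_with_retry(store, mutate, attempts=5):
    """Re-read the latest object, apply `mutate`, and retry on conflict."""
    for _ in range(attempts):
        current = store.get()   # always start from the latest version
        mutate(current)         # apply the change to that fresh copy
        try:
            return store.update(current)
        except Conflict:
            continue            # someone else wrote first; read again
    raise Conflict(f"gave up after {attempts} attempts")
```

The point of the sketch is that the change is re-applied to a freshly read object on every attempt, so a concurrent writer's fields are not overwritten by a stale copy.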

jirislav avatar Aug 02 '24 07:08 jirislav

Please see this pull request to the branch 0.24.0 🙏🏿 .

jirislav avatar Aug 02 '24 07:08 jirislav

I encountered a similar issue when adding additional annotations for Datadog agent metrics scraping.

Here are the details:

  1. Defining annotations in the podTemplates for a CHK manifest works successfully when creating the CHK for the first time.
  2. However, modifying the annotations block later on causes the reconciliation process to drop all annotations.
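To confirm either symptom described in this thread, one option is to compare the annotations declared in the CHK podTemplates with those on the StatefulSet's pod template. The sketch below assumes both objects have already been loaded into plain dictionaries (for example from `kubectl get ... -o json`); the function names are hypothetical helpers, not part of the operator or of any client library. The CHK path follows the manifest above, and `spec.template.metadata.annotations` is the standard StatefulSet location.

```python
def expected_annotations(chk, pod_template_name):
    """Return annotations declared for one podTemplate in a CHK manifest."""
    templates = chk.get("spec", {}).get("templates", {}).get("podTemplates", [])
    for tpl in templates:
        if tpl.get("name") == pod_template_name:
            return dict(tpl.get("metadata", {}).get("annotations") or {})
    return {}


def dropped_annotations(chk, statefulset, pod_template_name):
    """Return expected annotations that are missing or changed on the StatefulSet."""
    actual = (
        statefulset.get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("annotations")
        or {}
    )
    expected = expected_annotations(chk, pod_template_name)
    # Keep only the keys whose value is absent or different on the StatefulSet.
    return {k: v for k, v in expected.items() if actual.get(k) != v}
```

An empty result means every declared annotation reached the StatefulSet; any returned key is one that reconciliation dropped or altered.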

Kavinjsir avatar Aug 04 '24 23:08 Kavinjsir

We are also randomly seeing reconciler errors on some deploys. Since we use annotations in our environment, I suspect it's the same issue as the one mentioned above for Datadog.

 1 reconciler.go:299] err: Operation cannot be fulfilled on clickhousekeeperinstallations.clickhouse-keeper.altinity.com "keeper": the object has been modified; please apply your changes to the latest version and try again

g-marius avatar Aug 14 '24 15:08 g-marius

@g-marius do you use something like Flux or ArgoCD?

Slach avatar Sep 07 '24 06:09 Slach

I believe the fix is released already as part of 0.24.0

jirislav avatar Sep 20 '24 19:09 jirislav