operator icon indicating copy to clipboard operation
operator copied to clipboard

[Bug] The tenant's StatefulSet replicates the Tenant's metadata

Open aqeelat opened this issue 1 year ago • 1 comments

When the operator syncs the tenant, it copies the labels and annotations from the Tenant to the StatefulSet, causing undesirable side effects.

Expected Behavior

Similar to how the Service metadata are synced, the labels and annotations that should be cascaded to the StatefulSet should be explicitly specified in the CRD.

Current Behavior

https://github.com/minio/operator/blob/03613d29e55730ff75f2dbfa70a15e2c4c7ea9db/pkg/resources/statefulsets/minio-statefulset.go#L630-L632

Possible Solution

  • Adding a section to the CRD called poolsMetadata for the labels and annotations that are shared across all pools.
  • Adding a boolean to the CRD called cascadeTenantMetadata that controls the lines in the snippet above.

Steps to Reproduce (for bugs)

  1. Create Tenant object with labels
  2. Watch the labels get cascaded to the StatefulSet

Context

I opened #2287 before investigating the issue but now I think I have more context to discuss the issue.

Because the operator cascades the argocd label to the stateful set, cilium thinks that this label is a security-relevant label, and uses it to create endpoints for the pods and the identity object that governs these endpoints. The issue appears because the identity object has the argocd label, which makes argo thinks that it is a top level object, but when it compares it with the actual application manifest, it does not find it there. Therefore, argo will think that this object was removed from the manifest and will try to remove it from the cluster as well.

We're using many other operators such as Percona, ECK, PGO, but we only see this behavior with the Minio Operator. Once this issue is fixed, the operator can be declared GitOps friendly.

Regression

The issue first appeared in #295

Your Environment

  • Version used (minio-operator): 6.0.2
  • Environment name and version (e.g. kubernetes v1.17.2): GKE 1.29
  • Server type and version:
  • Operating System and version (uname -a):
  • Link to your deployment file:

aqeelat avatar Sep 15 '24 09:09 aqeelat

AFAIK, there is no official statement whether labels and annotations should be propagated to child objects. Some people prefer this behavior (link), but a Kubernetes dev mentions that it's not desirable that labels propagate (link).

Not propagating labels and annotations would be an easy change and we should add a statefulsetMetadata field that will hold the annotations/labels for the generated statefulset. However this may break existing installations, so we can't make this change that easy. It would require a migration path:

  1. Add the preventMetadataPropagation field (defaults to false) to the Tenant CRD that will disable propagation of annotation/labels.
  2. Add the statefulsetMetadata field in the Tenant CRD that will add annotations/labels to the generated statefulset.

In the next major release of the operator the preventMetadataPropagation field can be removed, because we can only have these kind of breaking changes in major releases.

ramondeklein avatar Sep 17 '24 09:09 ramondeklein

I agree with you on the solution. Let me know if there's anything I can do to help.

aqeelat avatar Oct 03 '24 20:10 aqeelat

we do bunch of fetching and queries using labels as selectors for queries removing them will indeed break stuff.

cesnietor avatar Oct 03 '24 20:10 cesnietor

@ramondeklein @dvaldivia please discuss this together to figure out a proper solutions since this might break existing integrations.

cesnietor avatar Oct 14 '24 16:10 cesnietor

Alternative fix, see https://github.com/minio/operator/pull/2337

ramondeklein avatar Oct 15 '24 10:10 ramondeklein

@ramondeklein This is half a solution. In the new major version, this is how it will look like (with adding {} as the default value). But a corrective action should be taken to prepare for it (i.e. the preventMetadataPropagation)

In other words, the propagation should be opt-out in v6 and opt-in in v7.

Still, I think your PR should be merged.

aqeelat avatar Oct 15 '24 10:10 aqeelat

@ramondeklein Once your PR is merged, we could bring up the discussion on the path to make {} as the default value.

aqeelat avatar Oct 15 '24 11:10 aqeelat

@aqeelat If we could start from scratch, then {} would be the default value. But we don't want to break existing installation from customers that upgrade the operator. This opt-out should work fine and I prefer doing this to a new property that opts-out.

ramondeklein avatar Oct 15 '24 11:10 ramondeklein