external-secrets icon indicating copy to clipboard operation
external-secrets copied to clipboard

Unable to change secret value when `refreshInterval: 0` and `creationPolicy: Orphan`

Open onedr0p opened this issue 1 year ago • 23 comments

Describe the bug

I would like to update a secrets (created from an externalsecret) data values. From what I gathered this should be possible by setting refreshInterval: 0 and creationPolicy: Orphan

To Reproduce

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: test
spec:
  refreshInterval: "0"
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword-connect
  target:
    name: test
    creationPolicy: Orphan
    template:
      engineVersion: v2
      type: kubernetes.io/tls
      metadata:
        annotations:
          testAnnotation: annotation 
        labels:
          testLabel: label
  dataFrom:
    - extract:
        key: test
        decodingStrategy: Auto

Now try to edit a field in the generated secret, it will get overridden on save.

Expected behavior

I expect to edit the data values in the generated secret and have them persistent.

Additional context

I am trying to automate pushing my LE certificate to my eso provider (onepassword) and then when the cluster is re-provisioned have the certificate be "imported" from onepassword but this doesn't work because when cert-manager renews the cert it writes to the *-tls secret but ESO just reverts it.

Here's my implementation using Flux:

https://github.com/onedr0p/home-ops/tree/e3b1bda79ef408d497d649f166a92fc04ee0bb4c/kubernetes/main/apps/cert-manager/certificates

  • export folder creates the certificate and pushsecret resources
  • import folder creates the externalsecret that tries to pull down the cert into a secret that cert-manager will read.

Related

https://github.com/external-secrets/external-secrets/issues/2245#issuecomment-1519000742

https://github.com/external-secrets/external-secrets/issues/4029

https://github.com/external-secrets/external-secrets/discussions/3148

onedr0p avatar Nov 12 '24 11:11 onedr0p

I just want to highlight that even after https://github.com/external-secrets/external-secrets/pull/4086, this is still the behavior.

The reason is that the behavior I think most users are expecting when they set refreshInterval=0 is:

  1. Sync/Create the target secret exactly once
  2. If the target secret is deleted, re-create it.
  3. If the target secret's data is changed, revert it.

Your problem is with number 3, but I am still not sure why you are using ESO to manage certificates when you are also managing the Secret with cert-manager?

thesuperzapper avatar Nov 12 '24 18:11 thesuperzapper

Your problem is with number 3, but I am still not sure why you are using ESO to manage certificates when you are also managing the Secret with cert-manager?

@thesuperzapper [1] I want ESO to create the secret for my tls cert when my cluster bootstraps to avoid rate-limiting from letsencrypt, then I want cert-manager to renew that certificate when the renewal date comes up. ESO's pushsecret will then keep my cert updated in my provider so anytime I bootstrap (we go back to [1]) my cluster I don't run into rate-limits with letsencrypt. I get the operators will battle for ownership of the resulting tls secret but ESO should have a way create once and not revert any changes made by cert-manager. Does that help understand the issue more?

onedr0p avatar Nov 12 '24 18:11 onedr0p

@onedr0p Under the current design of ExternalSecrets with refreshInterval=0, that's not possible, although it's a bit of a strange use case for ESO, would a Job that runs once on cluster creation not make more sense?

thesuperzapper avatar Nov 12 '24 20:11 thesuperzapper

@Skarlso, I am interested to know if you think that we should not revert changes made (to the data) when refreshInterval=0?

Currently (after https://github.com/external-secrets/external-secrets/pull/4086) we are detecting this situation in isSecretValid() by saying that if the data actually found in the secret does not match the reconcile.external-secrets.io/data-hash that we should reconcile.

In that PR I kept the current behavior, but there are pros to not re-syncing when data is changed:

  • The core benefit is that if the secret already exists, and we have reconciled the ExternalSecret at least once, then we truly will never update the target secret again
  • This means we can never accidentally re-generate secrets if the secret source is a generator, which is probably what users are expecting when they use generators with refreshInterval=0.

The only negative is that that when users are not using generators (i.e. they are pulling from an actual secret store), they are probably simply saying that the source secret does not change if they set refreshInterval=0, but that they still want ExternalSecrets to manage that data in the secret (and revert unexpected changes).

thesuperzapper avatar Nov 12 '24 20:11 thesuperzapper

would a Job that runs once on cluster creation not make more sense?

It's a bit janky to have the rollout of my cluster kind of be dependent on this job but I get your point.

I know you're aware, currently there's no way to have ESO create a secret that can be mutated by another operator or cluster admin which I think would be a useful feature any way it would be implemented.

Here's a couple ideas for thought:

  • Maybe there can be a new field called syncPolicy that accepts Always or Once as options with Always being the default.
  • Or, instead let refreshInterval: -1 to disable eso from updating the secret. This would ensure backwards compatibility for people who already use refreshInterval: 0 in it's current functionality.

onedr0p avatar Nov 12 '24 22:11 onedr0p

Maybe there can be a new field called syncPolicy that accepts Always or Once as options, with Always being the default.

I think this would be really helpful. Alternatively, similar to how Push Secret works, we could have an IfNotExists option for the creationPolicy. This way, it would only create the secret if it doesn't already exist, without overwriting any subsequent changes. This option would also need to disable or ignore all other update behaviors, except for refreshInterval, which can be used to continuously check if the secret exists and proceed accordingly.

This would be super useful for bootstrap use cases, as mentioned. I would be happy with any implementation. Love the project.

heavybullets8 avatar Nov 14 '24 02:11 heavybullets8

Problem with all of these is backwards compatibility. We need to be able to re-create a given secret if it was e.g. wrongfully deleted within your cluster.

Removing Updates (e.g. with SyncPolicy or with refreshInterval: -1) essentially means we want to run a Job. external-secrets was never designed for it, and I can see a lot of edge cases when we mix these with some manifests with e.g. other creation policies.

gusfcarvalho avatar Nov 14 '24 15:11 gusfcarvalho

IMO - I think we should probably try to fix job-related issues within job-related tools Like CI or workflows instead of trying to create a reconciler that does not Update :thinking: .

gusfcarvalho avatar Nov 14 '24 15:11 gusfcarvalho

I think this would be really helpful. Alternatively, similar to how Push Secret works, we could have an IfNotExists option for the creationPolicy. This way, it would only create the secret if it doesn't already exist, without overwriting any subsequent changes. This option would also need to disable or ignore all other update behaviors, except for refreshInterval, which can be used to continuously check if the secret exists and proceed accordingly.

to my point - what would be the expected outcome of:

  • 1 ExternalSecret with IfNotExists policy
  • 1 ExternalSecret targeting the same Secret with Merge Policy?
  • 1 ExternalSecret targeting the same Secret with Orphan Policy?

:point_up: Should we just ignore that secret update?

What we should not do is having the same set of manifests leading to possible different results just because they were reconciled on a different order, and I think the spirit of the ideas here go to this direction.

IMO We would need to have a proper design, PoC, and edge case evaluation before moving this forward :sweat_smile:

gusfcarvalho avatar Nov 14 '24 15:11 gusfcarvalho

Your problem is with number 3, but I am still not sure why you are using ESO to manage certificates when you are also managing the Secret with cert-manager?

@thesuperzapper [1] I want ESO to create the secret for my tls cert when my cluster bootstraps to avoid rate-limiting from letsencrypt, then I want cert-manager to renew that certificate when the renewal date comes up. ESO's pushsecret will then keep my cert updated in my provider so anytime I bootstrap (we go back to [1]) my cluster I don't run into rate-limits with letsencrypt. I get the operators will battle for ownership of the resulting tls secret but ESO should have a way create once and not revert any changes made by cert-manager. Does that help understand the issue more?

By design - if ESO manages a given key (tls.key in this case) - it will always fight for it - as this is a controller. While you can have multiple manifests composing a given Secret - you need to make them compose different keys by yourself, though. The only way around this is to not have the tls.key as part of your secret, which would probably defeat the purpose you're trying to achieve here.

Properly supporting what you're trying to achieve here only with ESO, we would need a way to specify per-key policies or something like that - which is why I mentioned a proper design on it.

(as if I understood correctly, If an user decided to change tls.crt on your given Secret, you'd actually want ESO to pull it back in, as opposed to having cert-manager re-generate it). You really want ESO to forget about updating the tls.key

gusfcarvalho avatar Nov 14 '24 15:11 gusfcarvalho

Also relevant here : https://github.com/external-secrets/external-secrets/issues/2699

gusfcarvalho avatar Nov 14 '24 15:11 gusfcarvalho

@onedr0p I believe from 0.11.0 onwards this should not update anymore. Can you confirm?

gusfcarvalho avatar Dec 19 '24 11:12 gusfcarvalho

@gusfcarvalho Was something changed? Last update was:

IMO We would need to have a proper design, PoC, and edge case evaluation before moving this forward 😅

onedr0p avatar Dec 19 '24 12:12 onedr0p

Yeah, this was discussed over community meetings and over https://github.com/external-secrets/external-secrets/pull/4086. While the implementation was not configurable for users, we decided to change refreshInterval: 0 behavior altogether and never update it after changes

I am not sure your issue is fixed with it, though - that's why I'm asking for a confirmation.

gusfcarvalho avatar Dec 19 '24 12:12 gusfcarvalho

@gusfcarvalho I just tried this and it still reverts the secret to the previous value...

To reproduce:

  1. Apply an ES like this:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: test
  namespace: default
spec:
  refreshInterval: "0"
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword-connect
  target:
    name: test-secret
    creationPolicy: Orphan
    template:
      engineVersion: v2
      data:
        TEST_TOKEN: "{{ .TEST_TOKEN }}"
  dataFrom:
    - extract:
        key: test
  1. Edit the generated secret:
❯ k edit secret -n default test-secret
secret/test-secret edited
  1. Watch ESO revert the value of TEST_TOKEN back to the original value.

onedr0p avatar Dec 19 '24 15:12 onedr0p

Thanks for the time on it. I’ll come up with a solution for this

gusfcarvalho avatar Dec 19 '24 15:12 gusfcarvalho

I justed wanted to create an issue regarding changing spec.target.creationPolicy from Owner to Orphan.

I would have expected the secret to have the ownerReference removed, but it stays, so the secret is deleted if the owner external secret is deleted.

@gusfcarvalho @Skarlso Can you clarify if this is considered a bug or expected behaviour?

If it is expected, how can "orphan" an existing "owned" secret (without editing config manually) ?

renepupil avatar Mar 04 '25 19:03 renepupil

It will only remove it if the controller is the owner. Is that the case? It should work...

		// if the CreationPolicy is Owner, we should set ourselves as the owner of the secret
		if externalSecret.Spec.Target.CreationPolicy == esv1beta1.CreatePolicyOwner {
			err = controllerutil.SetControllerReference(externalSecret, secret, r.Scheme)
			if err != nil {
				return fmt.Errorf("%w: %w", ErrSecretSetCtrlRef, err)
			}
		}

Skarlso avatar Mar 04 '25 20:03 Skarlso

I think to cover all of these issues, we should change creationPolicy: Orphan default behavior from:

The operator creates the secret but does not set the ownerReference on the Secret. 
That means the Secret will not be subject to garbage collection. 
If a secret with the same name already exists it will be updated.
Whenever triggered via RefreshPolicy conditions, the operator creates/updates 
the existing secret according to the provider available information. 
However, the operator will not watch on Secret Changes (delete/updates), nor trigger 
Garbage Collection when the ExternalSecret object is deleted.

!!! warning
    If you set RefreshPolicy: Periodic and CreationPolicy: Orphan,
    any changes manually done to the Secret will eventually be replaced on the next sync interval.
    Use CreationPolicy: Orphan with caution.

This way with the new RefreshPolicies we can support all possible combos:

  • Never update on Provider change (RefreshPolicy: CreateOnce), always enforce Secret (CreationPolicy: Owner/Merge)
  • Always update on Provider change (RefreshPolicy: OnChange/Periodic), never enforce secret (CreationPolicy: Orphan)
  • Always update on both (RefreshPolicy: OnChange/Periodic) (CreationPolicy: Owner/Merge)
  • Never update on both. (CreationPolicy: Orhpan) (RefreshPolicy: CreateOnce)

@Skarlso thoughts? IMO, i think this isn't a breaking change, just better defining Orphans Scope to be focused on the Target Secret.

gusfcarvalho avatar Apr 13 '25 14:04 gusfcarvalho

Yah that matrix makes sense.

Just to be clear the above the from part is about the documentation right?

Skarlso avatar Apr 15 '25 06:04 Skarlso

refreshInterval: 0 meaning no updates even if the spec itself changes is a pretty wild breaking change.

refreshInterval ... meaning "refresh" ought to mean polling or refreshing, on an interval, the remote referent - ie, checking upstream sources (SSM, Secrets Manager ...) for updated source values and not in fact update the behavioral relationship between ExternalSecret => Secret object. If the owner spec changes it ought to sync, PERIOD.

Now, from 0.11 it completely violates any sense of normalcy regarding updates in an objects spec.

Case in point from the description in #4086:

We now correctly handle the case where spec.refreshInterval=0 so it matches our documentation:

  1. We will only create the secret once, and we will not update it, even if the spec.target.template changes in the ExternalSecret.
  2. If the target secret is deleted, we will recreate it with the latest data/template.
  3. If the target secret data is manually updated, we will revert it back to the latest data/template from the ExternalSecret.

How can these behaviours coexist and be called correct? I'm just trying to understand.

Case 1 contradicts the behaviour of 2 and 3!

Case 1 ought to be the most clear case where, via direct manipulation of the ExternalSecret object itself, and its spec.data specifically, it would be expected for the underlying object to be synced with the owner object's spec.

If you want to orphan objects there are already mechanisms for that even outside of specific operator features such as this.

It's also odd that refreshPolicies influence this same mechanism, but whatever. In my mind, a create-once thing ought to be a Job-like behaviour and require a completely new mechanism. Or for the operator to handle it themselves, which is and has always already been supported.

  1. create ES object
  2. delete and orphan

The oddity here is that refreshInterval not only influence the remote refresh of the referent values from external secret sources - it now also influences the relation between ExternalSecret object and Secret. It should've been kept the way it was.

frimik avatar Apr 26 '25 17:04 frimik

Hi @frimik . This was discussed 6 months ago when we actually did that change. It was approved with community vote and it is something we will not back down.

Please mind your tone, you’re being a self-entitled open source user.

If you want to participate on the projects decisions / features, consistently join community meetings or be a maintainer.

Otherwise, stop complaining for things that happened almost a year ago.

We already introduced refresh policies on newer versions that cover basically any use case now. Please update yourself on the project before posting this kind of comment 🤦🏽‍♂️🤦🏽‍♂️

gusfcarvalho avatar Apr 26 '25 17:04 gusfcarvalho

The only issue current present in this thread is the current weird Orphan behavior. That should still be covered.

gusfcarvalho avatar Apr 26 '25 17:04 gusfcarvalho