secrets-store-csi-driver

Secret object not renewed on change in SecretProviderClass

Open · Dyllaann opened this issue 4 years ago · 33 comments

Hi! I ran into the following issue, which I also mentioned in #321, but I don't think it's explained properly there. I'll try to describe it using a scenario below.

What steps did you take and what happened: Assume scenario 1, where an application is deployed with the following snippets: SecretProviderClass.yaml (partial)

secretObjects:
  - data:
    - key: mysecret1
      objectName: mysecret1
    secretName: secret-object
    type: Opaque

Deployment.yaml (partial)

env:
- name: MYSECRET1_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret1

The pod boots up, mounts the volume, the secret is created, and the environment variable is populated from the secret. Now, we add a second secret to the SecretProviderClass:

secretObjects:
  - data:
    - key: mysecret1
      objectName: mysecret1
    - key: mysecret2
      objectName: mysecret2
    secretName: secret-object
    type: Opaque

and reference it in the pod similarly to the first secret:

env:
- name: MYSECRET1_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret1
- name: MYSECRET2_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret2

and update the deployment.

Now, the pod will be refreshed because its spec changed. The result of this is:

  1. The pod will boot up and mount the CSI volume
  2. The secrets are both correctly mounted under /mnt/secrets-store/mysecret1 and /mnt/secrets-store/mysecret2
  3. The secret object secret-object is not updated, so it still only contains mysecret1
  4. The environment variable reference for MYSECRET2_ENV_VAR fails, because mysecret2 is not in the secret object.

What did you expect to happen: I expected the secret to change. After the deployment update and the pod restart, I expected the secret object secret-object to contain the keys mysecret1 and mysecret2.

Anything else you would like to add: The problem with this is that I don't need the reconciler feature itself. My secrets that need to be pulled from Key Vault are not updated periodically; I just need to change which secrets are mounted to each pod. If I were to enable the reconciler, there is still the problem that my pod will be stuck in an error loop until the reconciler has renewed the secret object. Besides that, I don't see the use case for enabling the reconciler with a refresh interval of 10 seconds. On the other hand, setting it to an hour could mean my pod doesn't start up for 59 minutes because it is unable to resolve the secretKeyRef.

Which provider are you using: Azure Key Vault

Environment:

  • Secrets Store CSI Driver version: (use the image tag): 0.0.16
  • Kubernetes version: (use kubectl version): 1.16.15

Dyllaann · Nov 18 '20

@Dyllaann Thank you for the detailed explanation. The behavior you're observing is currently expected.

The Secrets Store CSI Driver has two reconcilers:

  1. One that creates the initial Kubernetes secret when the pod is created. This reconciler never updates the secret if it already exists, because if the SecretProviderClass is updated and a new pod is then created referencing it, pod1 and pod2 would end up with inconsistent values.
  2. The rotation reconciler, which updates the mount and the Kubernetes secret with the latest values after each rotation interval.
  • If the pod is restarted after the rotation takes place, the pods will be using the latest Kubernetes secret, which contains the new keys added to the SecretProviderClass.
  • If the pod is restarted before rotation, the first reconciler will see that the secret already exists and will not update it with the new keys.

Without the rotation feature, if you want to update the pod and the Kubernetes secret, you can follow the instructions here - https://github.com/kubernetes-sigs/secrets-store-csi-driver/blob/master/docs/README.limitations.md#mounted-content-and-kubernetes-secret-not-updated-after-secret-is-updated-in-external-secrets-store. tl;dr: if syncing as a Kubernetes secret, delete the secret and then restart the pod. A new secret will be created based on the latest SecretProviderClass.
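
For reference, the delete-and-restart workaround for the scenario above looks roughly like this (a sketch only; secret-object is the secret name from this issue, and my-app is a placeholder deployment name):

# Delete the synced Kubernetes secret; the driver recreates it from the
# latest SecretProviderClass the next time the volume is mounted.
kubectl delete secret secret-object

# Restart the workload so its pods remount the CSI volume and the secret is
# recreated with both mysecret1 and mysecret2.
kubectl rollout restart deployment/my-app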

This split between the two reconcilers is to avoid inconsistency and conflicts from two reconcilers updating the same objects over and over again.

aramase · Nov 18 '20

Thanks for clarifying that. I can understand there will be use cases where the inconsistency will cause problems. Unfortunately, in my case I use a SecretProviderClass for each individual application (not sure if this is the intended usage, but hey, it works partially). If the SecretProviderClass backing a deployment/daemonset is changed, that means there are probably also changes to the pod itself, and all of the pods will be replaced anyway.

In a deployment with two replicas, if pod1 is being renewed using the new SecretProviderClass, I do agree pod2 will experience a short inconsistency from having been created against an older version of the secret object. However, after pod1 is successfully renewed, I expect pod2 to follow.

In any case, I understand keeping support for the current behaviour is a requirement. Would it be possible to add a boolean to the SecretProviderClass indicating whether to clear the old secret on a new volume mount when the current secret object no longer matches the secretObjects: in the SecretProviderClass?

Having to manually delete each secret is a blocker for us, as there could be the scenario where a change is made in a Helm chart that doesn't involve secrets and/or pod changes. If CI/CD deletes the secret before running helm upgrade, there will be no new secret object created because the pods don't restart. That'll only create problems further down the line.

Dyllaann · Nov 18 '20

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot · Feb 16 '21

/remove-lifecycle stale

Dyllaann · Feb 16 '21

> Having to manually delete each secret is a blocker for us, as there could be the scenario where a change is made in a Helm chart that doesn't involve secrets and/or pod changes. If CI/CD deletes the secret before running helm upgrade, there will be no new secret object created because the pods don't restart. That'll only create problems further down the line.

Ran into the same issue; the secret had to be deleted manually when a new object version was added.

dhananjaya94 · Mar 04 '21

@aramase, could you explain how the rotation reconciler works when a new object version is set in Azure Key Vault? The SecretProviderClass is applied to the cluster with the new object version. The new version of the secret is mounted in the pod but was not updated in the Kubernetes secret or in the pod environment variables. The rotation setting is the chart default (https://github.com/Azure/secrets-store-csi-driver-provider-azure/blob/04c1fae211b522a84a7818cac7b7daaef1ca9ef2/charts/csi-secrets-store-provider-azure/values.yaml#L110). We had to delete the secret and recreate the pod; only after that was the correct value set in the env.
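
For anyone in the same situation: secret rotation is opt-in and disabled by default. With the Azure provider chart it is enabled via values along these lines (a sketch; the key names are taken from the values.yaml linked above and may differ between chart versions):

# values.yaml fragment for the csi-secrets-store-provider-azure chart
# (assumed key names; verify against your chart version)
secrets-store-csi-driver:
  enableSecretRotation: true    # off by default, so synced secrets are never refreshed
  rotationPollInterval: 2m      # how often the driver polls the external store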

dhananjaya94 · Mar 05 '21

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot · Jun 03 '21

/remove-lifecycle stale

endeepak · Jun 09 '21

Not having each pod contain a consistent view of the external secret, in a way that can be updated declaratively, is a dealbreaker for this driver. We want to apply an updated SecretProviderClass and Deployment, and when new pods are created the class should ensure those new pods are loaded with 1) any changes made to the external secrets and 2) any changes made to the provider class.

At the very least, the driver should support a force-refresh of the secrets when the SecretProviderClass is modified, regardless of the normal refresh interval.

mmerickel · Jul 18 '21

Jumping in a little late. Has this been resolved? I'm hitting the same issue where the env secret is not rotated.

brucedvgw · Jul 27 '21

Would giving the k8s secret a dynamic name (e.g. appending the chart version) be a workaround? This way each deployment of a new version would result in a new secret, which would therefore require a new call to the vault.

rlaveycal · Jul 27 '21

@rlaveycal this was my stop-gap solution as well - haven't tried it yet to confirm it works but I suspect it will. In helm you can use .Release.Revision.

mmerickel · Jul 27 '21

> @rlaveycal this was my stop-gap solution as well - haven't tried it yet to confirm it works but I suspect it will. In helm you can use .Release.Revision.

This works well for me, with the downside that Helm does not clean up the secrets created by previous releases, resulting in a bunch of orphaned secrets lying around in k8s.

tyanko1 · Aug 31 '21

The orphaned k8s secrets are the responsibility of the driver, not Helm. They have the label secrets-store.csi.k8s.io/managed=true on them, but I can't see any logic that deletes them.

rlaveycal · Aug 31 '21

I have since implemented that stop-gap solution and for us it has been properly cleaning up the secrets. I will say we are creating a new SecretProviderClass and a new secret name from it (as secretObjects[0].secretName), both using {{ include "chart.fullname" . }}-{{ .Release.Revision }} - so far no orphaned secrets this way.
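
To make that workaround concrete, a minimal Helm template sketch (the apiVersion, the chart.fullname helper, the azure provider, and the empty parameters block are assumptions to be adapted to your setup):

apiVersion: secrets-store.csi.x-k8s.io/v1   # older driver releases use v1alpha1
kind: SecretProviderClass
metadata:
  name: {{ include "chart.fullname" . }}-{{ .Release.Revision }}
spec:
  provider: azure                 # provider used in this thread
  parameters: {}                  # provider-specific parameters omitted for brevity
  secretObjects:
    - secretName: {{ include "chart.fullname" . }}-{{ .Release.Revision }}
      type: Opaque
      data:
        - key: mysecret1
          objectName: mysecret1

The Deployment then references the same revision-suffixed name in its secretKeyRef (and in the CSI volume's secretProviderClass attribute), so every helm upgrade produces a fresh secret built from the latest SecretProviderClass.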

mmerickel · Aug 31 '21

@aramase can the rotation reconciler be configured with rotationPollInterval 0 to perform the secret update when the volume is mounted?

amarkevich · Oct 29 '21

@aramase any update on this? We are seeing this issue impacting many AKS users...

erjosito · Jan 12 '22

> @aramase any update on this? We are seeing this issue impacting many AKS users...

@erjosito Could you provide more details on the exact flow of events and the impact? Is the secret rotation feature enabled? If yes, what is the rotation poll interval?

aramase · Jan 12 '22

@aramase my bad, we hadn't enabled secret rotation (shame on me!). We will have a look at it. Thanks for your answer!

erjosito · Jan 12 '22

I've tried mmerickel's solution ({{ .Values.myCustomName }}-{{ .Release.Revision }}) and it works for me (partially). Now the new secret is created and used properly in the deployment ;).

However, I do get the orphaned secrets.

$ kubectl get secret -l secrets-store.csi.k8s.io/managed=true

NAME                    TYPE     DATA   AGE
svc3-params             Opaque   21     3d
svc2-params             Opaque   14     7d15h
svc1-params             Opaque   27     7d15h
svc1-params-18          Opaque   28     26h
svc1-params-19          Opaque   28     26h
svc1-params-21          Opaque   28     26h
svc1-params-23          Opaque   28     25h
svc1-params-24          Opaque   28     3h52m

Any ideas on why that is happening or how to troubleshoot this issue?

ryuzakyl · Mar 29 '22

I have the same issues as @ryuzakyl with orphaned secrets.

@mmerickel or anyone else have a solution for this? It means that every time we upgrade our release a new secret is made, and therefore a new orphaned secret is made...

markandersontrocme · May 27 '22

I did have some orphan secrets. It does clean up some of the time.

Unfortunately this repo really hasn't been seeing much support from AWS, so I switched to external-secrets and Stakater's Reloader and have been happy with it. What pushed me over the edge was that external-secrets works well with IRSA, has great template support, and the reload support in the CSI driver is global (and alpha) instead of per-secret.

The CSI driver has some nice features, like making the pod wait while secrets get mounted and the ability to not use k8s secrets at all if you want, but it just isn't getting enough love lately.

ArgoCD also has poor support for .Release.Revision in Helm, which pushed me to look for a solution that was release-agnostic as well.
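
For anyone curious about that alternative stack, a minimal sketch (assuming external-secrets' v1beta1 API and Reloader's auto annotation; the store and secret names are placeholders, not a definitive setup):

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: my-secret-store       # a SecretStore/ClusterSecretStore defined separately
    kind: SecretStore
  target:
    name: secret-object         # the plain Kubernetes secret to create and keep in sync
  data:
    - secretKey: mysecret1
      remoteRef:
        key: mysecret1

Annotating the Deployment with reloader.stakater.com/auto: "true" then makes Reloader restart the pods whenever that secret changes.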

mmerickel · May 27 '22

@aramase Are there any plans to better support this? I tried using rotation; while it works well, it also easily incurs extra charges, as I would want to set the polling interval low (1m or so).

Using the -{{ .Release.Revision }} suffix in the SecretProviderClass and Secret name works well; however, it leaves orphaned secrets. So having a mechanism to clean those up would be ideal.

Maybe we can use SecretProviderClassPodStatus to find unused secrets?
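
A rough sketch of that idea (the custom-columns field paths are assumptions based on the SecretProviderClassPodStatus status fields and should be checked against your driver version):

# Secrets the driver has synced
kubectl get secret -l secrets-store.csi.k8s.io/managed=true

# SecretProviderClass instances still referenced by running pods
kubectl get secretproviderclasspodstatuses -A \
  -o custom-columns=POD:.status.podName,SPC:.status.secretProviderClassName

# Any managed secret whose SecretProviderClass no longer shows up above is a
# candidate orphan that a cleanup job could delete.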

markandersontrocme · May 30 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Aug 28 '22

/remove-lifecycle stale

markandersontrocme · Sep 12 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Dec 11 '22

/remove-lifecycle stale

vl-shopback · Dec 14 '22

It's a shame that updating the SecretProviderClass doesn't update the secrets unless you have rotation set up.

Preen · Feb 22 '23

/remove-lifecycle stale

giordanocardillo · Apr 28 '23

Since 2020 this design choice has not been revisited. @Dyllaann made a very reasonable suggestion to add a flag to control the behavior (pod1 & pod2 inconsistency vs. external secret & K8s Secret inconsistency), but the maintainer simply ignored it.

What a shame; this would be a great driver if that "feature" were reconsidered.

marcio-ch · May 23 '23