secrets-store-csi-driver
Secret object not renewed on change in SecretProviderClass
Hi! I found the following issue, which I also listed in #321, but I think it's not explained properly. I'll try to describe it using a scenario below.
What steps did you take and what happened:
Assume scenario 1, where an application with the following snippets is deployed:

SecretProviderClass.yaml (partial)

```yaml
secretObjects:
- data:
  - key: mysecret1
    objectName: mysecret1
  secretName: secret-object
  type: Opaque
```
Deployment.yaml (partial)

```yaml
env:
- name: MYSECRET1_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret1
```
The pod boots up, mounts the volume, the secret is created, and the environment variable is linked from the secret. Now, we add a second secret to the SecretProviderClass:

```yaml
secretObjects:
- data:
  - key: mysecret1
    objectName: mysecret1
  - key: mysecret2
    objectName: mysecret2
  secretName: secret-object
  type: Opaque
```
and mount it in the pod similarly to the first secret:

```yaml
env:
- name: MYSECRET1_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret1
- name: MYSECRET2_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret2
```

and update the deployment.
Now the pod will be refreshed because its spec changed. The result of this is:
- The pod boots up and mounts the CSI volume.
- Both secrets are correctly mounted under `/mnt/secrets-store/mysecret1` and `/mnt/secrets-store/mysecret2`.
- The secret object `secret-object` is not renewed, so it still only contains `mysecret1`.
- The environment variable mount for `MYSECRET2_ENV_VAR` fails, because `mysecret2` is not in the object.
What did you expect to happen:
I expected the secret to change. After the deployment update, the pod restarted and I expected the secret object `secret-object` to contain the keys `mysecret1` and `mysecret2`.
Anything else you would like to add:
The problem is that I don't need the reconciler feature itself. My secrets that need to be pulled from Key Vault are not updated periodically; I just need to change which secrets are mounted into each pod.
If I were to enable the reconciler, there is still the problem that my pod will be crash-looping until the reconciler renews the secret object. Besides that, I don't see the use case for enabling the reconciler with a refresh interval of 10 seconds. On the other hand, setting it to an hour could mean my pod doesn't start for up to 59 minutes because it is unable to resolve the `secretKeyRef`.
Which provider are you using: Azure Key Vault
Environment:
- Secrets Store CSI Driver version (use the image tag): 0.0.16
- Kubernetes version (use `kubectl version`): 1.16.15
@Dyllaann Thank you for the detailed explanation. The behavior you're observing is currently expected.
The Secrets Store CSI Driver has 2 reconcilers:
- One that creates the initial Kubernetes secret when the pod is created. This reconciler never updates the secret if it already exists, because if the `SecretProviderClass` is updated and then a new pod is created referencing that `SecretProviderClass`, the values would be inconsistent between pod1 and pod2.
- The rotation reconciler, which updates the mount and the Kubernetes secret with the latest values after each rotation interval.
  - If the pod is restarted after rotation takes place, it will use the latest Kubernetes secret, which contains the new keys added to the `SecretProviderClass`.
  - If the pod is restarted before rotation, the first reconciler will see that the secret exists and not update it with the new keys.

Without the rotation feature, if you want to update the pod and the Kubernetes secret, you can follow the instructions here: https://github.com/kubernetes-sigs/secrets-store-csi-driver/blob/master/docs/README.limitations.md#mounted-content-and-kubernetes-secret-not-updated-after-secret-is-updated-in-external-secrets-store

tl;dr: If syncing as a Kubernetes secret, delete the secret and then restart the pod. A new secret will be created based on the latest `SecretProviderClass`.

This split between reconcilers exists to avoid inconsistency and conflicts from two reconcilers updating the same objects over and over again.
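The create-if-missing vs. always-sync behavior of the two reconcilers can be modeled in a few lines of Python. This is purely an illustrative sketch of the semantics described above, not actual driver code; all names are made up:

```python
# Minimal model of the two reconciler behaviors (illustrative only).
def create_only_reconcile(cluster_secrets, name, desired_data):
    """Pod-creation reconciler: create the secret if missing, never update it."""
    if name not in cluster_secrets:
        cluster_secrets[name] = dict(desired_data)

def rotation_reconcile(cluster_secrets, name, desired_data):
    """Rotation reconciler: always sync the secret to the latest desired data."""
    cluster_secrets[name] = dict(desired_data)

secrets = {}
create_only_reconcile(secrets, "secret-object", {"mysecret1": "v1"})

# SecretProviderClass is updated to add mysecret2; the pod restarts,
# but the create-only reconciler leaves the existing secret untouched:
create_only_reconcile(secrets, "secret-object", {"mysecret1": "v1", "mysecret2": "v2"})
assert "mysecret2" not in secrets["secret-object"]  # why the env var mount fails

# Only the rotation reconciler picks up the new key:
rotation_reconcile(secrets, "secret-object", {"mysecret1": "v1", "mysecret2": "v2"})
assert "mysecret2" in secrets["secret-object"]
```

This is why restarting the pod before a rotation interval elapses (or with rotation disabled) never surfaces the new key in the synced secret.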
Thanks for clarifying that. I can understand there will be use cases where the inconsistency would cause problems. Unfortunately, in my case I use a SecretProviderClass for each individual application (not sure if this is the intended usage, but hey, it partially works). If the SecretProviderClass backing a deployment/daemonset is changed, that means there probably are also changes to the pod itself, and all pods will be replaced anyway.
If there is a deployment with two replicas and pod1 is being renewed using the new SecretProviderClass, I do agree pod2 will experience a short inconsistency from having been created against an older version of the secret object. However, after pod1 is successfully renewed, I expect pod2 to follow.
In any case, I understand keeping support for the current behaviour is a requirement. Would it be possible to add a boolean to the SecretProviderClass indicating whether to clear the old secret upon a new volume mount, when the current secret object no longer matches the `secretObjects:` from the SecretProviderClass?
Having to manually delete each secret is a blocker for us, as there could be the scenario where a change is made in a Helm chart that doesn't involve secrets and/or pod changes. If CI/CD deletes the secret before running helm upgrade, there will be no new secret object created because the pods don't restart. That'll only create problems further down the line.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now, please do so with `/close`.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
> Having to manually delete each secret is a blocker for us, as there could be the scenario where a change is made in a Helm chart that doesn't involve secrets and/or pod changes. If CI/CD deletes the secret before running helm upgrade, there will be no new secret object created because the pods don't restart. That'll only create problems further down the line.

Ran into the same issue; the secret had to be deleted manually when a new object version was added.
@aramase, could you explain how the rotation reconciler works when a new object version is set in Azure Key Vault? The SecretProviderClass is applied to the cluster with the new object version. The new version of the secret is mounted to the pod, but it was not updated in the Kubernetes secret or in the pod environment variables. The reconciler value is the default: https://github.com/Azure/secrets-store-csi-driver-provider-azure/blob/04c1fae211b522a84a7818cac7b7daaef1ca9ef2/charts/csi-secrets-store-provider-azure/values.yaml#L110. I had to delete the secret and recreate the pod; only after that was the correct value set in the env.
/remove-lifecycle stale
Not having each pod contain a consistent view of the external secret, in a way that can be updated declaratively, is a dealbreaker for this driver. We want to apply an updated `SecretProviderClass` and `Deployment`, and when new pods are created the class should ensure those pods are loaded with 1) any changes made to the external secrets and 2) any changes made to the provider class.
At the very least, the driver should support a force-refresh of the secrets when the `SecretProviderClass` is modified, regardless of the normal refresh interval.
Jumping in a little late. Has this been resolved? Hitting the same issue that the ENV secret is not rotated.
Would giving the k8s secret a dynamic name (e.g. appending the chart version) be a work-around? This way, each deployment of a new version would result in a new secret, which therefore forces a new call to the vault.
@rlaveycal this was my stop-gap solution as well - haven't tried it yet to confirm it works, but I suspect it will. In Helm you can use `.Release.Revision`.
> @rlaveycal this was my stop-gap solution as well - haven't tried it yet to confirm it works, but I suspect it will. In Helm you can use `.Release.Revision`.

This works well for me, with the downside that Helm does not clean up the secrets created by previous releases, resulting in a bunch of orphaned secrets lying around in k8s.
The orphan k8s secrets are the responsibility of the driver, not Helm. They have the label `secrets-store.csi.k8s.io/managed=true` on them, but I can't see any logic that deletes them.
I have since implemented that stop-gap solution, and for us it has been properly cleaning up the secrets. I will say we are creating a new SecretProviderClass and a new secret name from it as `secretObjects[0].secretName`, both using `{{ include "chart.fullname" . }}-{{ .Release.Revision }}` - so far no orphaned secrets this way.
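For reference, a sketch of what that workaround can look like in a Helm template. This is illustrative only: the `chart.fullname` helper is the one from the standard chart scaffold, the `apiVersion` may differ by driver version, and the object names are made up:

```yaml
# Illustrative Helm template sketch of the revisioned-name workaround.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: {{ include "chart.fullname" . }}-{{ .Release.Revision }}
spec:
  provider: azure
  secretObjects:
  - secretName: {{ include "chart.fullname" . }}-{{ .Release.Revision }}
    type: Opaque
    data:
    - key: mysecret1
      objectName: mysecret1
```

The deployment's `secretKeyRef.name` (and the volume's `secretProviderClass`) must use the same templated name, so each release revision points at a freshly created secret.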
@aramase can the rotation reconciler be configured with a `rotationPollInterval` of 0, to perform the secret update when the volume is mounted?
@aramase any update on this? We are seeing this issue impacting many AKS users...
> @aramase any update on this? We are seeing this issue impacting many AKS users...
@erjosito Could you provide more details on the exact flow of events and the impact? Is the secret rotation feature enabled? If yes, what is the rotation poll interval?
@aramase my bad, we hadn't enabled secret rotation (shame on me!). We will have a look at it. Thanks for your answer!
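For anyone else landing here: with the Azure provider Helm chart, rotation is enabled via values along these lines. This is a sketch; the exact key names and defaults depend on the chart version, so check the linked values.yaml:

```yaml
# Sketch: enabling rotation via the Azure provider chart's values
# (key names may vary by chart version; verify against values.yaml).
secrets-store-csi-driver:
  enableSecretRotation: true
  rotationPollInterval: 2m
```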
I've tried mmerickel's solution (`{{ .Values.myCustomName }}-{{ .Release.Revision }}`) and it works for me (partially). The new secret is created and used properly in the deployment ;).
However, I do get the orphaned secrets.
```
$ kubectl get secret -l secrets-store.csi.k8s.io/managed=true
NAME             TYPE     DATA   AGE
svc3-params      Opaque   21     3d
svc2-params      Opaque   14     7d15h
svc1-params      Opaque   27     7d15h
svc1-params-18   Opaque   28     26h
svc1-params-19   Opaque   28     26h
svc1-params-21   Opaque   28     26h
svc1-params-23   Opaque   28     25h
svc1-params-24   Opaque   28     3h52m
```
Any ideas on why that is happening, or how to troubleshoot this issue?
I have the same issues as @ryuzakyl with orphaned secrets.
@mmerickel or anyone else have a solution for this? It means every time we upgrade our release a new secret is made, and therefore a new orphaned secret is made...
I did have some orphan secrets; it does clean up some of the time.
Unfortunately, this repo really hasn't been seeing much support from AWS, so I switched to external-secrets and stakater's reloader and have been happy with it. What pushed me over the edge was that external-secrets works well with IRSA, has great template support, and the reload support in the CSI driver is global (and alpha) instead of per secret.
The CSI driver has some nice features, like making the pod wait while secrets get mounted and the ability to not use k8s secrets at all if you want, but it just isn't getting enough love lately.
ArgoCD also has poor support for `.Release.Revision` in Helm, which pushed me to look for a solution that was release-agnostic as well.
@aramase Are there any plans to better support this? I tried using rotation; while it works well, it also easily incurs extra charges, as I would want to set the polling interval low (1m or so).
Using `-{{ .Release.Revision }}` in the SecretProviderClass and secret name works well, but it leaves orphaned secrets behind. So having a mechanism to clean those up would be ideal.
Maybe we could use `SecretProviderClassPodStatus` to find unused secrets?
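That cleanup idea boils down to a set difference between the secrets carrying the managed label and the ones still referenced by live mounts. A minimal Python sketch (the secret names are made up, and in a real implementation both sets would be populated from the Kubernetes API, e.g. from `SecretProviderClassPodStatus` objects):

```python
# Illustrative sketch: a managed secret that no SecretProviderClassPodStatus
# still references is an orphan and could be garbage-collected.

# Secrets labeled secrets-store.csi.k8s.io/managed=true (hypothetical names):
managed_secrets = {"svc1-params-18", "svc1-params-19", "svc1-params-24"}

# Secret names still referenced by running pods (in this sketch, as reported
# via SecretProviderClassPodStatus objects):
referenced = {"svc1-params-24"}

orphans = managed_secrets - referenced
assert orphans == {"svc1-params-18", "svc1-params-19"}
```

A controller (or a CI/CD post-upgrade hook) could then delete each secret in `orphans`.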
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/remove-lifecycle stale
It's a shame that updating the SecretProviderClass doesn't update the secrets unless you have a rotation setup.
/remove-lifecycle stale
This design choice has not been reviewed since 2020. @Dyllaann made a very reasonable suggestion to add a flag controlling the behavior (pod1/pod2 inconsistency vs. external-secret/K8s-secret inconsistency), but the maintainer simply ignored it.
What a shame; this would be a great driver if that "feature" were reviewed.