secrets-store-csi-driver
Don't rely on pod creation to sync to Kubernetes secrets
Describe the solution you'd like
Currently, when you choose to sync to Kubernetes secrets, pulling secrets/certs/keys from Key Vault always happens at pod creation, and you also need to mount the SecretProviderClass volume in a specific pod to make it work.
If we have issues with Key Vault access (managed identity is not working, things are accidentally deleted in Key Vault, or the Key Vault API is unavailable), things start to fail as soon as a pod gets restarted or recreated. It's even worse for CronJobs: every new pod created under the CronJob tries to retrieve secrets/certs/keys from Key Vault, so while Key Vault is down you can no longer create ANY new pod in production.
I hope this library can simply sync secrets/certs/keys to Kubernetes secrets at deploy time, without relying on any pod to mount them, so I don't have to worry about Key Vault availability after deployment.
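For reference, here is a rough sketch of what the current flow requires (all names and the Azure parameters below are placeholders): the SecretProviderClass declares secretObjects for the sync, but the mirrored Kubernetes secret only appears once some pod actually mounts the CSI volume.

```yaml
# Sketch only: the Secret "app-secret" is created/synced only when a pod
# mounting this SecretProviderClass starts.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-kv-secrets
spec:
  provider: azure
  secretObjects:                      # mirror mounted content as a K8s Secret
    - secretName: app-secret
      type: Opaque
      data:
        - objectName: db-password
          key: db-password
  parameters:
    keyvaultName: my-keyvault         # placeholder
    tenantId: "<tenant-id>"           # placeholder
    objects: |
      array:
        - |
          objectName: db-password
          objectType: secret
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: my-app:latest            # placeholder
      volumeMounts:
        - name: secrets-store
          mountPath: /mnt/secrets
          readOnly: true
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: app-kv-secrets
```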
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
- Secrets Store CSI Driver version: (use the image tag):
- Kubernetes version: (use `kubectl version`):
@djj0809 Thank you for opening the issue! The scope of this project is to primarily mount the external secrets-store contents in the pod using the Kubernetes CSI standard. The sync as K8s secret is an optional feature to mirror those contents as Kubernetes secrets in addition to the mount. I think skipping the pod mount is a diversion from the actual scope of this driver implementation.
@aramase one potential application of such use case is configuring ingress resource. Any thoughts on that?
@tdihp That makes sense. Consumption by an ingress resource was one of the driving factors for optionally supporting mirroring the mounted content as Kubernetes Secrets. However, if we skip the mount completely and only create a Kubernetes Secret, that would be a diversion from the actual scope of the driver project. The CSI driver uses the CSI implementation as the core to mount secret contents for safe consumption within pods.
I think this is a reasonable suggestion given the fact that the main use case for the secrets-store-csi-driver is the integration with some kind of vault.
There are a lot of resources that rely on native Kubernetes secrets. Let's say I have 100+ resources depending on a native Kubernetes secret. Right now, that means I need to create a SecretProviderClass and then add relatively useless volumeMounts and volumes to those 100+ resources (useless in the sense that the resources will not use the mount itself, only the reference to the Kubernetes secret). If I do not add that huge amount of boilerplate for some resources, there is a risk that no resource has mounted the volume yet and the secret itself is unavailable.
If that feature were implemented, I would instead only need to create the SecretProviderClass and reference the Kubernetes secret in the resources that need it (see the sketch below). This would mean a lot less overhead and cleaner YAML.
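To illustrate with hypothetical pod spec excerpts: today each of those 100+ workloads needs the CSI volume boilerplate purely to trigger the sync, whereas with this feature a plain secret reference would be enough.

```yaml
# Today: CSI volume boilerplate added only to force the sync (sketch).
spec:
  containers:
    - name: app
      image: my-app:latest
      envFrom:
        - secretRef:
            name: app-secret          # what the workload actually uses
      volumeMounts:
        - name: secrets-store         # unused by the app itself
          mountPath: /mnt/secrets
          readOnly: true
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: app-kv-secrets
---
# With the requested feature: just the secret reference (sketch).
spec:
  containers:
    - name: app
      image: my-app:latest
      envFrom:
        - secretRef:
            name: app-secret
```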
I think this could be a big improvement for all the vault integrations that depend on this project, unless there is already some way to get around this with the current setup that I missed.
I agree with jemag on this -- not being able to sync secrets from a vault into a Kubernetes secret without a pod mount effectively means managing secrets in two ways for most systems, because pretty much every Helm chart and 3rd-party component manages its pods via an Operator or other mechanism, and you can't control how they access secrets. They just expect them to be present.
That leaves administrators having to choose between not using a vault implementation, managing secrets two ways, or using a hacky process of periodically priming the system with a pod to get things synced as a secret. We settled on the latter for now, but it's definitely tricky to manage: the CSI configuration and that pod configuration have to be kept in sync, and if someone misses something, it's not always obvious (see the sketch below).
Given that the CSI mechanism wakes up when a SecretProviderClass is created or updated, and it knows which secrets are being requested to be synced, it seems like it's the best place to create a transient pod to trigger the synchronization.
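For anyone stuck with the same workaround, a sketch of the kind of "primer" deployment this ends up looking like (image and names are only examples): its sole purpose is to keep one pod mounting the SecretProviderClass so the mirrored secret exists and keeps rotating.

```yaml
# Sketch of a "primer" deployment whose only job is to mount the
# SecretProviderClass so the mirrored Kubernetes Secret gets created.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secret-primer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: secret-primer
  template:
    metadata:
      labels:
        app: secret-primer
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # any no-op image works
          volumeMounts:
            - name: secrets-store
              mountPath: /mnt/secrets
              readOnly: true
      volumes:
        - name: secrets-store
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: app-kv-secrets   # must be kept in sync
```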
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
+1 on this, will provide a detailed use case shortly.
Another use case is image pull secrets; there is a chicken and egg problem if creating the secret to pull the pod's image requires the pod to be running.
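To make the circularity concrete, a hypothetical sketch: the pull secret is expected to be mirrored by the driver, but the mirroring only happens once this pod mounts the volume, and the pod can't start because its image can't be pulled without that secret.

```yaml
# Sketch: registry-creds should come from the vault via the driver, but
# the sync only runs when this pod mounts the volume, and the image
# can't be pulled until registry-creds exists.
apiVersion: v1
kind: Pod
metadata:
  name: private-app
spec:
  imagePullSecrets:
    - name: registry-creds                     # expected to be synced
  containers:
    - name: app
      image: private.example.com/app:latest    # placeholder
      volumeMounts:
        - name: secrets-store
          mountPath: /mnt/secrets
          readOnly: true
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: registry-creds-spc   # placeholder
```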
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
Another use case: we are trying to create a PV and PVC using the Azure flexvolume blobfuse driver, which needs a secret (type: azure/blobfuse) to be available in the cluster to access the storage account. This secret needs to be created by picking up the storage account access key from Key Vault, but there is no need to mount the secret in any pod.
Note: there is another issue in that the secrets-store-csi-driver does not support creating secrets of the "azure/blobfuse" type.
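To illustrate (treat the field names as a sketch based on the blobfuse flexvolume docs, not something verified here): the secret is consumed by the PersistentVolume definition itself, so no application pod ever mounts it.

```yaml
# Sketch: ideally this secret would be synced straight from Key Vault
# without any pod mount being involved.
apiVersion: v1
kind: Secret
metadata:
  name: storage-account-creds
type: azure/blobfuse
stringData:
  accountname: mystorageaccount        # placeholder
  accountkey: "<key from Key Vault>"   # placeholder
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: blobfuse-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  flexVolume:
    driver: azure/blobfuse
    secretRef:
      name: storage-account-creds      # referenced here, not in a pod spec
    options:
      container: mycontainer           # placeholder
```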
Another quick piece of feedback: I also find it very cumbersome that a pod has to be started in order for a secret to be created. I would even say it is counterintuitive when looking at this project for the first time, although I understand the initial scope of the driver was different.
As a user, I was hoping for a solution that easily bridges GCP Secret Manager and Kubernetes secrets without affecting my existing workload definitions.
/lifecycle stale
/remove-lifecycle stale
I opened a related issue to this on the Azure provider driver. We're not seeing the behavior described above where cronjob pods check for new values before each run; they're still using the cached values because the secrets still exist (the completed pods are not cleaned up yet).
But because the pods are completed/succeeded, the rotation interval does not update the values. In the end, this means jobs start failing because the secret values are wrong.
I think the lines linked below need to be changed to also rotate secrets for completed or failed job pods. It's possible that secrets are updated between runs while those pods still exist and new mounts will use the new values, and it's possible that the jobs/pods failed because of bad secret values.
Maybe changing this behavior would also address the initial ask in this issue. The secrets would still be created at pod creation rather than at deploy time, but they would be kept up to date for future runs. Then, if vault access fails, the most recent secret values should still be good to go.
https://github.com/kubernetes-sigs/secrets-store-csi-driver/blob/0f4ff7d43e591dbc7d3b87ed1d7602a5f15f1059/pkg/rotation/reconciler.go#L267-L270
I'm tagging @aramase because he made the initial commit to exclude those from rotation.
@djj0809: from the above, is it possible you're pruning your completed/failed pods? When those are removed, the secrets are removed, prompting a refresh on the next run. Check the successfulJobsHistoryLimit and failedJobsHistoryLimit values.
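For anyone checking this on their own CronJobs, these are the fields in question (values here are just an example): with low history limits the completed pods get pruned, the mirrored secret goes away with them, and the next run pulls fresh values.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  schedule: "0 * * * *"
  successfulJobsHistoryLimit: 1    # how many completed jobs (and their pods) to keep
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: job
              image: my-job:latest   # placeholder
```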
/lifecycle stale
/remove-lifecycle stale
Any update regarding this issue?
We are also looking for a solution to this issue, especially agreeing with @jemag.
I think that for some of the use cases described in the comments, External Secrets Operator (https://external-secrets.io) could be a reasonable alternative. AFAIK it can simply sync an external secret to a native k8s secret.
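For reference, a minimal ExternalSecret sketch (store and key names are placeholders; based on the external-secrets.io v1beta1 API): it produces a native Kubernetes secret with no pod or volume mount involved.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secret
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: my-vault-store             # placeholder ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: app-secret                 # the native K8s Secret to create
  data:
    - secretKey: db-password
      remoteRef:
        key: db-password             # key in the external store
```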
Although External Secrets Operator can solve this particular problem, it would be nice if we could avoid having two controllers solving similar problems.
Another alternative, which I believe most of you have seen: https://akv2k8s.io/ (https://github.com/SparebankenVest/azure-key-vault-to-kubernetes)
There's another use case in Knative, and likely in other similar tools that do not support custom volume types: https://github.com/knative/serving/issues/11069. Syncing the secret only, and then referencing it 'natively', would enable any such use case.
Are there any potential issues with using a gcr.io/google_containers/pause pod to define the mount and then having other pods reference the secret?
/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
Marking this as frozen so it doesn't go stale. We're exploring the option of decoupling the sync-as-Kubernetes-secret feature from the CSI driver mount.
@aramase may I know if there is any progress regarding this? Thanks🙇
==================
Found some updates in the meeting notes. If there is an ETA for the feature, that would be great.