test-infra
Migrate from Kubernetes External Secrets to ~~External Secrets Operator~~ CSI Driver
What would you like to be added:
- Switch Kubernetes External Secrets defined under https://github.com/kubernetes/test-infra/tree/master/config/prow/cluster to External Secrets Operator, by following https://github.com/external-secrets/kubernetes-external-secrets/issues/864
- Update instructions at https://github.com/kubernetes/test-infra/blob/master/prow/prow_secrets.md
- Announce at https://github.com/kubernetes/test-infra/blob/master/prow/ANNOUNCEMENTS.md
- Announce at [email protected]
Why is this needed:
As announced at https://github.com/external-secrets/kubernetes-external-secrets/issues/864, Kubernetes External Secrets is in maintenance mode right now, and the new recommendation is to migrate to External Secrets Operator.
There hasn't been any announced plan of turning down Kubernetes External Secrets, so we might be fine for a while, until it's either incompatible with upcoming Kubernetes versions, or newer features/bug fixes are only available in External Secrets Operator.
/sig testing
Any thoughts on SealedSecret as an alternative? Seems more gitops friendly
> Any thoughts on SealedSecret as an alternative? Seems more gitops friendly
I can see that https://github.com/bitnami-labs/sealed-secrets is similar to KES (Kubernetes External Secrets) in terms of generating Kubernetes secrets from more secure custom resources, but that is not the only purpose of KES.
KES was originally introduced to solve these problems:
- Kubernetes secrets were manually applied to the cluster by `kubectl apply` from dev machine(s)
- The secret might get lost if someone accidentally updates/deletes the value of the secret, or even if the cluster is accidentally deleted
Because KES syncs secrets from major secret manager providers into the Kubernetes cluster, recovering a Kubernetes secret is as simple as re-applying the ExternalSecret CR to the cluster, for example https://github.com/kubernetes/test-infra/blob/d075174d2b9bcbe5aac9391ff306426963d2a37d/config/prow/cluster/kubernetes_external_secrets.yaml#L4
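For illustration, the kind of ExternalSecret CR that KES reconciles into a Kubernetes Secret looks roughly like this (a sketch; the secret names and project ID below are hypothetical):

```yaml
# KES watches ExternalSecret CRs and creates/updates a matching Kubernetes
# Secret from the configured backend (here, GCP Secret Manager).
apiVersion: kubernetes-client.io/v1
kind: ExternalSecret
metadata:
  name: my-prow-secret            # hypothetical name
spec:
  backendType: gcpSecretsManager
  projectId: my-prow-project      # hypothetical GCP project
  data:
    - key: my-gsm-secret-name     # name of the secret in Secret Manager
      name: token                 # key in the resulting Kubernetes Secret
      version: latest
```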
In short, SealedSecret is probably not the best replacement for KES
My thinking was that external secrets is really only marginally different from a developer doing `kubectl apply`. Now they are just doing `gcloud secrets create`, which seems just as opaque as apply? With sealed secrets the entire state lives in git.
Agree with you that both need a manual operation, either `kubectl apply` or `gcloud secrets create`; the gitops side is pretty similar too, one being a SealedSecret CR and the other an ExternalSecret CR, and both can live in git. However, SealedSecret cannot solve the problem of a user accidentally modifying the secret at the source (the last-applied configuration in k8s helps somewhat here, but cannot recover from `kubectl delete SealedSecret`, or the cluster itself being accidentally deleted). Using KES reduces that risk because GCP Secret Manager version-controls secrets, so:
- if someone accidentally changed the value in GCP, the secret values can still be recovered
- if the cluster was accidentally deleted, secrets can still be recovered by applying the git source-controlled KES CR
I feel like you could say the same about SealedSecret though...
- if someone accidentally changed the value in ~~GCP~~ K8s, the secret values can still be recovered (from git)
- if the cluster was accidentally deleted, secrets can still be recovered by applying the git source-controlled ~~KES~~ SealedSecret CR
Except for "cluster deleted": I guess you would need to keep the sealed-secret keys somewhere (to decrypt if the cluster is deleted), so at some point you need to bootstrap...
Anyhow I have no strong agenda either way, just wanted to throw the idea out there
Thank you @howardjohn, this is a really great discussion!
I think I had misunderstood SealedSecret to a certain extent; with your explanation it's much clearer now. So SealedSecret (a minimal CR sketch follows this list):
- Stores secrets as "plain text" in the `SealedSecret` CR
- The private key for decrypting the secrets is only available in the k8s cluster
- A secret can only be created by a user running `kubeseal`, which uses public keys from the k8s cluster
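For reference, a `SealedSecret` CR carries only ciphertext, which is why it can sit safely in git; a minimal sketch (the name and ciphertext below are placeholders):

```yaml
# Only the controller's private key, held in-cluster, can decrypt the
# values under encryptedData back into a normal Kubernetes Secret.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: mysecret
  namespace: default
spec:
  encryptedData:
    # ciphertext produced by kubeseal (truncated placeholder)
    token: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...
```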
This sounds pretty good, and I can see that other than the cluster-being-deleted scenario this is also pretty reliable. One thing not super clear from the documentation: when a user runs `kubeseal <mysecret.json >mysealedsecret.json` as in https://github.com/bitnami-labs/sealed-secrets#usage, does kubeseal need to fetch the public key from the cluster? Do you happen to know whether that's true, @howardjohn?
@chaodaiG I think you can fetch the pubkey once and store it in git. Then the dev experience to add or update a secret would be `kubeseal mysecret --key pubkey.crt > mysealedsecret.json`. Then a postsubmit job `kubectl apply`s it to the cluster; the dev never needs access to the cluster.
But one concern would be that it says the key expires after 30d... so that may not work. I don't have much practical experience with sealed secrets, so I'm not 100% sure.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
https://kubernetes.slack.com/archives/C09QZ4DQB/p1654433983124889 is one of the reasons why this should be prioritized. TL;DR: syncing build cluster tokens into prow is now a crucial piece of prow working with build clusters; KES flakiness would break this and cause prow to stop working with the build cluster.
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The build cluster token sync failure happened again: https://kubernetes.slack.com/archives/C7J9RP96G/p1667877096344719. This is not good.
/assign
uh-oh. thanks @chaodaiG
I don't want to derail / delay efforts going on in #27932, but has something like https://secrets-store-csi-driver.sigs.k8s.io/ been considered? We could use that with the GCP provider today and there's support for AWS, Azure, and Vault providers if we need to change.
Using the CSI driver + Google Secrets Manager (provider) would allow us to leverage Workload Identity for IAM secret access. I believe we'd also have better insight into access / auditing.
I know GCP costs are a concern; the pricing page indicates that it would be $0.06 per secret per location and $0.03 per 10,000 access operations (and $0.05 per rotation). I don't think the costs would be astronomical, but it would be worth a closer inspection if we decide to pivot towards this solution.
I'm happy to demo / help move forward if we want to go that direction, however I understand the urgency and value the progress already made.
Edit: It looks like External Secrets Operator also allows us to use Secret Manager + WID if we'd like: https://external-secrets.io/v0.6.1/provider/google-secrets-manager/. I think it would come down to whichever solution is easier to maintain and more active (future-proof-ish?).
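(For context, the ESO flavor of that setup is a SecretStore authenticating to Secret Manager via Workload Identity; a rough sketch, with all project/cluster/SA names hypothetical:)

```yaml
# ESO SecretStore backed by GCP Secret Manager, authenticating through the
# Workload Identity binding of the referenced Kubernetes service account.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: gcp-secret-store
spec:
  provider:
    gcpsm:
      projectID: my-prow-project        # hypothetical project
      auth:
        workloadIdentity:
          clusterLocation: us-central1  # hypothetical location
          clusterName: prow             # hypothetical cluster name
          serviceAccountRef:
            name: external-secrets-sa   # hypothetical WI-bound KSA
```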
hi @jimangel, it's not a derail at all. IIRC the CSI driver for GCP was in its very early release cycle when we decided to adopt Kubernetes External Secrets. The proposal to transition from Kubernetes External Secrets to External Secrets Operator was pretty much a lazy action based on the recommendation from Kubernetes External Secrets.
In terms of cost, we don't have that many secrets nor a lot of access operations, so I wouldn't be too worried about it.
I would be glad to take another look at the CSI driver for GCP since it's ready now; I'll do a quick evaluation myself from an operational and maintenance perspective, and will get your thoughts if any questions come up.
Had an extensive and wonderful offline discussion with @jimangel, and here is what we agreed on:
- External Secrets Operator works as a central proxy service: it uses a dedicated k8s cluster SA that is Workload Identity-bound to a GCP SA, and this GCP SA is given GCP Secret Manager permissions to all secrets used in the k8s cluster. These secrets are synced one way into the k8s cluster, and all pods in the cluster are allowed to use any of these secrets as long as they are in the same namespace.
- The CSI driver works by using the authentication of the pods that need to mount the secrets. For GCP this means the Workload Identity-bound cluster SA on the pod is used for authenticating with GCP Secret Manager (a minimal sketch follows this list).
- Technically speaking, the CSI driver is more secure than External Secrets Operator, as a prowjob pod will only be able to use the secrets its SA is allowed to (we don't yet have fine-grained separation of different teams using different SAs, so this is more future-proofing).
- Other than security boundaries, one benefit of the CSI driver is that it avoids a GCP SA from the Prow service cluster being given GCP Secret Manager permissions in other projects; as a result, migrating or recovering would be much easier (no IAM changes required in users' projects).
- One "downside" is that, in terms of authentication, it only supports Workload Identity for GCP, so jobs that are not using Workload Identity will not be able to use this feature.
- The other "downside"/WAI is that a pod will fail to start when a secret is not available. This is expected for a prowjob, and IMO is even better than failing due to stale secrets that were synced 7 days ago. For Prow services we will need to make sure that all the kubeconfig secrets are stored in the GCP project Prow is in, to avoid a user-provided kubeconfig secret being deleted in GCP and causing Prow downtime.
With all that being said, I'm convinced that the CSI driver is better suited for our use case. Kudos to @jimangel, thank you so much for the discussion, I feel I learned a lot!
@BenTheElder @spiffxp @dims @ameukam @cjwagner , WDYT?
Awesome write up @chaodaiG! Agreed, it was fun chatting.
One "downside", is that In terms of authentication it only supports workload identity for GCP, so jobs that are not using workload identity will not be able to use this feature
There are alternatives for authentication outlined here: https://github.com/GoogleCloudPlatform/secrets-store-csi-driver-provider-gcp/blob/main/docs/authentication.md but the general consensus is to use WI if at all possible.
@chaodaiG @jimangel Nice! +100
Sounds like a nice improvement to me!
@chaodaiG @jimangel Nice idea!
Let's try it.
@jimangel So if the secret is mounted as a volume in the pod, how is this isolated from the other pods running on the same node?

> So if the secret is mounted as a volume in the pod, how is this isolated from the other pods running on the same node?
I believe the threat model is the same as before (or more secure). Access today is segmented by namespace (k8s "built-in" secrets). With the CSI driver, access is only permitted when all conditions are met:
- A namespace-scoped SecretProviderClass (CRD) defining access exists (this directs the mount to the appropriate GCP project/secret).
- GCP IAM bindings grant a `Service Account / Workload Identity` in GCP access to the specific secret resource(s).
NOTE: Any workload/job/pod in a shared k8s namespace could use the same service account to access the permitted secret(s)/SecretProviderClass. That should be no different from any pod, in the same namespace, accessing the same secret today (see the sketch after this note).
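As a sketch of that gating: the consuming pod has to sit in the SecretProviderClass's namespace and run as the WI-bound service account that holds the IAM binding (all names hypothetical, reusing the class from the earlier sketch):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
  namespace: default                    # must match the SecretProviderClass namespace
spec:
  serviceAccountName: prow-secrets-sa   # hypothetical WI-bound KSA with IAM access
  containers:
    - name: main
      image: registry.k8s.io/pause:3.9
      volumeMounts:
        - name: secrets
          mountPath: /var/secrets
          readOnly: true
  volumes:
    - name: secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: prow-secrets   # hypothetical class name
```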
As far as what access pods on the same node have (isolation)... if any pod/actor can mount/escape a pod to access node-level storage layers, you are as screwed as you'd be if you were using Kubernetes "built-in" secrets. 😅
Let me know if I misunderstood what you're asking @ameukam!
Edit: There are a couple "security considerations" called out in the repo itself.
Hey all! Checking in here, what would be the next steps @chaodaiG? Should we try a small-scale demo or is there somewhere to test?
@cjwagner could you please take a look
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/help-wanted
This is open for contribution if anyone's willing to do so. (We do keep seeing infrequent errors or flakes that require KES to be restarted, so while it's not urgent, it'd be helpful!)
@michelle192837 Assuming this needs to be deployed on a Google-owned GKE cluster, one action would be to create an SA with Workload Identity so we can use it to retrieve the secrets from Secret Manager. I think it can only be done by EngProd.
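On the k8s side, the Workload Identity piece of that bootstrap is just a KSA annotated with the GCP SA (plus the IAM binding on the GCP side); a sketch, with every name hypothetical:

```yaml
# KSA that the pods run as; the annotation links it to a GCP SA that would
# hold roles/secretmanager.secretAccessor on the relevant secrets.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prow-secrets-sa          # hypothetical KSA
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: prow-secrets@my-prow-project.iam.gserviceaccount.com
```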