sealed-secrets
Add support for running the controller with replicas>1
We need to ensure that the controller doesn't do the wrong thing with replicas>1, since that can happen during transient events like node/partition recovery, misconfigured upgrades, etc. Replicas>1 is a desirable feature for some HA configurations.
Note that this is just about avoiding breakage. Horizontal scalability (active/active or sharding) is not included in this issue.
Note also this could either be via a lock (as is typically done with other controllers), or via careful auditing of atomic operations and race recovery (sealed-secrets-controller only has very simple side-effects).
We are having the operator deployed with one replica in kube-system
and we would like to leverage Cluster Autoscaler to evict pods from nodes. Would you advise allowing downtime for the operator (e.g. by setting minAvailable to 0 in a PDB), or using more than one replica? We are leaning towards the latter, but we're not sure whether more than one replica is advisable for production workloads. I guess we could live with a few seconds of operator downtime until the new pod is created on another node.
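For reference, the PDB variant of that trade-off might look like the sketch below. The selector labels and namespace are assumptions; match them to however your controller Deployment is labeled.

```yaml
# Sketch of a PodDisruptionBudget that permits eviction of the only replica,
# so Cluster Autoscaler can drain the node (accepting brief downtime).
# Labels below are assumptions; align them with your controller Deployment.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sealed-secrets-controller
  namespace: kube-system
spec:
  minAvailable: 0
  selector:
    matchLabels:
      name: sealed-secrets-controller
```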
In theory it should just work if you have > 1 replicas, since most of the operations are at least in principle idempotent. However, we haven't audited the code for race conditions yet, so you might encounter some bugs.
Having a brief downtime is not usually a big issue, since the unsealing of secrets is an async operation anyway.
Thanks for the feedback! One problem that I can see with downtime in our case is if a deployment takes place (we create all resources through a Helm chart, like Deployment and SealedSecrets), the new pods that will come up may read the old secrets, as the operator won't be able to decrypt the new SealedSecret object at that point.
@gtseres that's an interesting problem. I've been thinking about it too. It's worth noting that even if you don't have controller downtime, there's an inherent race condition there - there's nothing to stop the Deployment controller beating sealed-secrets to rolling out some or all of the new pods before the secrets are updated. I'm not saying that means the downtime is OK - clearly it makes things worse in that situation - but I think it points toward a broader architectural question.
I can't think of a way to be totally confident about this other than a complex script that watches a SealedSecret for events before yielding to the next step of a deployment pipeline.
Tools like https://github.com/pusher/wave can help make sure that when the secret eventually gets updated, any dependent deployment gets updated too.
I wonder if sealed-secrets should provide such functionality natively.
I'm still in the haven't-actually-tried-it-yet camp unfortunately, but in case this sparks some ideas, I'd love for sealed-secrets to work well with spinnaker, and specifically its ability to version secrets. https://www.spinnaker.io/reference/artifacts/in-kubernetes-v2/#versioned-kubernetes-objects and https://www.spinnaker.io/reference/providers/kubernetes-v2/#resource-management-policies have more info.
Trying to work out how this could work though. If we teach spinnaker to version sealed secrets in exactly the same way it handles secrets... would that work? We'd define sealed secrets in the normal way (without a version number) and ask spinnaker to deploy it. Spinnaker would append a version number, incrementing it as necessary. So far so good.
The trick is how to deal with binding the sealed secret to e.g. a deployment. The deployment doesn't actually reference the sealed secret... But if sealed-secrets did the versioning, so that for sealed secret foo there was secret foo-v001, etc., spinnaker might not need to change at all and things would 'just work.'
Putting a hash of the body of a dependency resource into an annotation of the deployment resource is a common trick some people use with helm or kubecfg. You can do the same trick with sealed secrets.
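The annotation trick looks roughly like the Helm template fragment below. The template path and annotation key are examples; point them at your actual SealedSecret manifest.

```yaml
# Sketch of the checksum-annotation trick in a Helm Deployment template:
# hashing the sealed secret manifest into a pod-template annotation means
# any change to the sealed secret changes the pod template, triggering a
# rollout. The template path is hypothetical; adjust it to your chart.
spec:
  template:
    metadata:
      annotations:
        checksum/sealed-secret: {{ include (print $.Template.BasePath "/sealedsecret.yaml") . | sha256sum }}
```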
If I understand you correctly, this is a way to get the pods in a deployment to cycle when the sealed secret changes. Is that it?
Versioning adds another big bonus on top of that -- the ability to roll back the deployment to use the old (sealed) secret in case the new one doesn't work.
Yes, that's what I meant.
If Spinnaker versioning works by adding a suffix to resource names, then you need to use the namespace-wide annotation, since otherwise the sealed-secrets controller will refuse to unseal a sealed secret that doesn't match the name it was sealed with.
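Concretely, a sealed secret created with the relaxed scope carries an annotation like the one in this sketch (the name, namespace and ciphertext below are placeholders):

```yaml
# Sketch of a SealedSecret sealed with namespace-wide scope. The annotation
# tells the controller to unseal it even if the resource is renamed within
# the namespace (e.g. when a version suffix like -v001 is appended).
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: foo-v001
  namespace: default
  annotations:
    sealedsecrets.bitnami.com/namespace-wide: "true"
spec:
  encryptedData:
    password: AgB5...   # placeholder; real ciphertext comes from kubeseal
```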
Perhaps sealed-secrets is a bit too strict in enforcing this check by default; I guess that most people set security policies at the namespace level
(indeed resource-level RBAC is still impractical and probably not something that resonates well with the current security model, see https://github.com/kubernetes/kubernetes/issues/56582 and https://github.com/kubernetes/kubernetes/issues/44703)
Maybe that check is still OK? If the sealed secret yaml file that we check in to git has no suffix, but sealed-secret adds one to the kubernetes resource it creates, then the secret and sealed secret will still match...
I assume Spinnaker (or whoever is in charge of doing rollbacks) would manipulate the names of the resources you pass to it, which if you use sealed secrets are the SealedSecrets resources. The secrets are a cluster-side derived resource, like deployment vs pods.
I guess I'll have to learn more about Spinnaker before I can understand your use case.
Kubernetes is in charge of rollbacks. By that point spinnaker is out of the picture. All spinnaker does is take a deployment file that refers to a secret named foo (because I commit my deployment yaml to git that way) and append the suffix to the actual kubernetes resource it deploys.
I'm not sure how spinnaker can know what version of the secret / what suffix to use though. I guess it would need to know that the secret came from a sealed secret and ask sealed-secret somehow?
In the end it may be simpler to leave it all up to spinnaker...
When a field’s referenced type and value match an incoming artifact’s type and name, the field’s value is replaced with the artifact’s reference
Yeah, it seems it uses heuristics to determine that e.g. a field in a deployment spec is actually a secret reference, and thus when it allocates a new version number for the secret resource it will use that new name in the new revision of the deployment.
Then yes, you can do a rollback of a deployment with kubernetes, because a previous revision will just point to the previous version of the secret, which is still around.
If I understood the scenario correctly, the problem I see here is that you should be able to tell Spinnaker that a SealedSecret resource named "foo" should be treated as if it were a Secret called "foo", and whenever a new version of that sealed secret is allocated, that new identifier should be injected into any reference to a secret named "foo".
Yeah, as I think about it more I don't see any way around making spinnaker sealed-secret-aware.
(my personal habit is to manage rollbacks outside of k8s, i.e. literally pushing a previous snapshot of the manifests, usually literally reverting a git commit, with the same tooling I'd use for a roll forward; that said, I think sealed-secrets should be friendly to other workflows, hence I'm all ears)
Should Spinnaker be sealed-secret aware? Could you create an issue there and see if they can add native support or suggest a way to configure Spinnaker to deal with this?
Yeah, I've started asking a bit on spinnaker's slack channel. There's also https://github.com/spinnaker/spinnaker/issues/4042.
I need to take the plunge and exercise sealed-secrets with my own two hands so I can talk more intelligently about it, and help make good choices about how to integrate the two.
Thanks for working through this.
btw, if you use the (still experimental and opt-in) key-rotation feature (#137), then having two replicas does get in the way because each will periodically create a new secret and the other instance won't pick it up until the container restarts.
When you have thousands of secrets that need to be decrypted, the inability to scale becomes quite a problem.
Has any work been done to run multiple sealed-secrets instances, with the work potentially split? (i.e. the task list partitioned by replica)
@drewboswell that's interesting; can you please share your measurements? How many secrets, how much time to converge, which version, etc.? (that would help with prioritization)
When you have thousands of secrets that need to be decrypted the inability to scale becomes quite a problem. Has any work been done to run multiple sealed-secret instances, with potential work split? (i.e. task list partitioned by replica)
Have you ever found a solution?
We are in the same boat
@gajus depending on what you're doing, you could probably partition sealed-secret controllers / secrets by namespaces, give one controller namespaces a,b,c and another h,i,j: https://github.com/bitnami-labs/sealed-secrets/tree/main#how-to-use-one-controller-for-a-subset-of-namespaces
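As a sketch of that partitioning, following the README section linked above, each controller instance gets its own subset via the --additional-namespaces flag (the namespace names below are examples; how strictly each instance is confined also depends on how its RBAC is scoped, so check the linked docs):

```yaml
# Deployment fragment for one of several controller instances, each
# watching a disjoint subset of namespaces. A second instance would be
# deployed the same way with e.g. --additional-namespaces=h,i,j.
containers:
  - name: sealed-secrets-controller
    image: docker.io/bitnami/sealed-secrets-controller:latest
    args:
      - --additional-namespaces=a,b,c   # this instance handles a, b and c
```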
(We haven't done that, I'm here because I'm about to file a bug asking for a PDB in the helm chart.)
We use Kubernetes cluster to host production and review environments. Every review environment (a mirror of production) is deployed to its own namespace. We sometimes have hundreds of them. When multiple review environments get updated at once it requires re-sealing a number of secrets that belong to them. Things become very slow at that point. I cannot think of how dividing by namespace would work here since each sealed-secrets instance would need to be configured separately.
At the cost of running lots of them, I could imagine using one controller per namespace. Then each controller only has to manage sealing/unsealing for its own namespace. As long as your sealing script knows where the controller is for the namespace it wants to poke, things should work...
Again, I'm not saying this is perfect, just that I suspect it'd be a viable workaround (with some costs as each pod has a cost).
Was about to give this a try, but it looks like the Helm chart does not allow setting SEALED_SECRETS_CONTROLLER_NAMESPACE?
Maybe it is the additionalNamespaces setting, although the name confusingly implies that these are additional namespaces. Will give this a try.
Will need to explore this later, but additionalNamespaces appears not to be the setting I am after. Haven't yet figured out how to restrict a Helm release to certain namespaces.
Looks like you could use the args get-out-of-jail-free card if necessary.
But, looking at https://github.com/bitnami-labs/sealed-secrets/#how-to-use-one-controller-for-a-subset-of-namespaces I suspect just --additional-namespaces=$current_namespace should work:
https://github.com/bitnami-labs/sealed-secrets/blob/52e2e55db1c2f4ab0820d4837381ef8e460c4c05/pkg/controller/main.go#L229
Made no difference when I tried it today, i.e. it was still unsealing secrets in other namespaces, but I will give it a second try tomorrow. Today I only had sporadic availability for testing, so I may have made false observations.