cert-controller
cert-controller copied to clipboard
CA and Server certificate potentially get updated before ValidatingWebhookConfiguration
Is it not a problem that the ValidatingWebhookConfiguration and the Secret are updated independently from each other? I think this can lead to the condition where the CA is already renewed but the ValidatingWebhookConfiguration still have the old CA and thus calls to the webhook would fail?
I did not really had problems with this. I only looked into the code and thought that this might become a problem. Or do I miss something?
Yeah this could definitely happen. But the secret's usually only generated if it's missing or when it expires (which is every N years) and so I doubt this is a significant problem. A few seconds of downtime a year (or even a month) is still many 9s of reliability.
Also, I'm not sure how this can be avoided. The webhook configs need the public keys, and the pods need the private keys, and there's no way to update both of those atomically. But most things in K8s can just be retried (yay, declarative APIs!) so rare failures shouldn't have much impact.
Does this sound reasonable?
I'm not sure how this can be avoided.
I think this should be possible by creating a new CA some time ahead before the existing CA expires but keeping the old CA around until it expires. This way you would have two trusted CAs for a certain period of time. I don't have tested if this works with the ValidatingWebhookConfiguration but it definitely should.
I understand that this makes the logic a bit more complex. But especially on big clusters with Gatekeeper where almost every API call also goes to the validation webhook, it's quite likely that one will run into this problem. Even if it is only for a couple of seconds I think it would be nice if this could be avoided completely.
Hmm, I thought the client configs in the webhook configs could only specify one acceptable cert at a time, and the service can only provide one cert at a time. To prevent any downtime at all, I would have thought that the webhook config would have to accept an array of caBundles - is this not correct? Or can a single caBundle include multiple acceptable certs?
On Wed, Oct 14, 2020 at 11:11 AM David Schneider [email protected] wrote:
I'm not sure how this can be avoided.
I think this should be possible by creating a new CA some time ahead before the existing CA expires but keeping the old CA around until it expires. This way you would have two trusted CAs for a certain period of time. I don't have tested if this works with the ValidatingWebhookConfiguration but it definitely should.
I understand that this makes the logic a bit more complex. But especially on big clusters with Gatekeeper where almost every API call also goes to the validation webhook, it's quite likely that one will run into this problem. Even if it is only for a couple of seconds I think it would be nice if this could be avoided completely.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/cert-controller/issues/13#issuecomment-708468354, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE43PZEMVBESDHN5SBLBZLLSKW5TDANCNFSM4SQWFTLQ .
Or can a single caBundle include multiple acceptable certs?
That's what I would expect. But haven't tested it.
https://github.com/kubernetes/api/blob/master/admissionregistration/v1/types.go#L526
The name caBundle implies to me that it can contain multiple CAs.
Oh, I assumed it just meant that it included the root and intermediate certs as well, as per https://www.namecheap.com/support/knowledgebase/article.aspx/986/69/what-is-ca-bundle .
But if you test it and discover that it can contain multiple leaf certs then we could consider this.
On Wed, Oct 14, 2020 at 11:43 AM David Schneider [email protected] wrote:
https://github.com/kubernetes/api/blob/master/admissionregistration/v1/types.go#L526
The name caBundle implies to me that it can contain multiple CAs.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/cert-controller/issues/13#issuecomment-708488575, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE43PZAXKVBEAHKFAF3XL23SKXBKBANCNFSM4SQWFTLQ .
+1 to everything in this discussion.
I was also under the impression that only one root cert could be provided at any given time. If it's possible to just create a super bundle by concatenating the certs, zero-downtime rotation would be ideal.
Looking deeper into it, concatenation may work.
Note that the CA certs are self-signed, so there should be no intermediate certs.
We could test this by:
- Launching a copy of HNC or Gatekeeper
- Saving the secret contents
- Deleting the secret
- Saving the contents of the newly created secret.
- Concatenating the two CA PEMs (they will likely need to be base64 decoded from the secret before we can concatenate)
- Setting up two HNC /GK pods, one mounted with one copy of each secret
- Creating a VWH config with the concatenated CA bundle
- Point the VWH at one pod, see if it fails, point it at the second pod, see if it fails
Not a trivial amount of work, but worthwhile. I think the biggest benefit here is that each pod could maintain its own CA/key pair. This would avoid the need for all cert controllers across all versions to agree on their validity tests / generated cert contents and would remove cert generation as a SPOF.
We are already seeing a split WRT generated cert contents from #9
I tested this and it does work. I created two CA certificates (ca1 and ca2). Then I created one certificate (tls) and signed it with ca1 and another (tls-new) and signed it with ca2. Then I started the a webhook with the tls certificate and then switched to the tls-new certificate and both did work. My validating webhook configuraiton does look like this:
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
name: validating-webhook-configuration
webhooks:
- clientConfig:
caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJmakNDQVNTZ0F3SUJBZ0lSQVBtSytySFJmZU9kWXpEenFzRnVjbjR3Q2dZSUtvWkl6ajBFQXdJd0RqRU0KTUFvR0ExVUVBeE1EWTJFeE1CNFhEVEl3TVRBeE5ERTJORGd3TjFvWERUSXhNVEF4TkRFMk5EZ3dOMW93RGpFTQpNQW9HQTFVRUF4TURZMkV4TUZrd0V3WUhLb1pJemowQ0FRWUlLb1pJemowREFRY0RRZ0FFYlFHaTN5YWN1Zml6Ck9LMmErLzVZWDZSMWMxa1ZJMklvLzg2eisvVmFnWWZHRDlxZUJmR3BQd3pvQWphWTQzRC9tQUs2dHFtM2llbTEKejdYOEZaRzMyYU5qTUdFd0RnWURWUjBQQVFIL0JBUURBZ0VHTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRApWUjBPQkJZRUZOQVlrY0lBS3A0aHkzQzVMaVV3UURSNUk4VUtNQjhHQTFVZEl3UVlNQmFBRk5BWWtjSUFLcDRoCnkzQzVMaVV3UURSNUk4VUtNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJUURuUVFtUmxkbHB0TDdTdmZjeHQxSFMKa2RMeW1GTlhhNkNYcFh2bHJsdUgwZ0lnYjBLN2JuYWI0Y2pyMUhHY2dDbWc2QTQ2Y3V3SFpZV3hWVjJkMTlacwp1bkU9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0KLS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJmVENDQVNTZ0F3SUJBZ0lSQUpOZmVlUTJSYThRazIyM2dkNTNlaDh3Q2dZSUtvWkl6ajBFQXdJd0RqRU0KTUFvR0ExVUVBeE1EWTJFeU1CNFhEVEl3TVRBeE5ERTJORGd4TmxvWERUSXhNVEF4TkRFMk5EZ3hObG93RGpFTQpNQW9HQTFVRUF4TURZMkV5TUZrd0V3WUhLb1pJemowQ0FRWUlLb1pJemowREFRY0RRZ0FFQkFkOXpzU1FzVm4wCnFva21uZ054SnpGSjhEZDFRSC9uaWJ6V2UybkNmTzYyRmFwNFEzRWdDTjhHNTk1SlNNdTNLQ29vK2ZIZVlnVk4KK1J1d3Y4b21PcU5qTUdFd0RnWURWUjBQQVFIL0JBUURBZ0VHTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRApWUjBPQkJZRUZNaEpqQzJXeWwyZnRZNWpLWHZKejdSdlBhY3JNQjhHQTFVZEl3UVlNQmFBRk1oSmpDMld5bDJmCnRZNWpLWHZKejdSdlBhY3JNQW9HQ0NxR1NNNDlCQU1DQTBjQU1FUUNJRmp3QVdYcC9DdjZhdjY3M0FuS2p6dFQKbmxWcjlubTVMZ1NrWXBYUVZ6TEdBaUEzRndXY0hTVUNES2FiTnN6cW1KZmVRN21ucVR4NDFqOUQ5ZXg0MVRUNQordz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
url: https://192.168.39.1:8443/validate
failurePolicy: Fail
name: vcm.kb.io
rules:
- apiGroups:
- ""
apiVersions:
- v1
operations:
- CREATE
- UPDATE
resources:
- configmaps
Neat! Are you able to contribute a patch?
On Wed, Oct 14, 2020 at 3:27 PM David Schneider [email protected] wrote:
I tested this and it does work. I created two CA certificates (ca1 and ca2). Then I created one certificate (tls) and signed it with ca1 and another (tls-new) and signed it with ca2. Then I started the a webhook with the tls certificate and then switched to the tls-new certificate and both did work. My validating webhook configuraiton does look like this:
---apiVersion: admissionregistration.k8s.io/v1beta1kind: ValidatingWebhookConfigurationmetadata: name: validating-webhook-configurationwebhooks:
- clientConfig: caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJmakNDQVNTZ0F3SUJBZ0lSQVBtSytySFJmZU9kWXpEenFzRnVjbjR3Q2dZSUtvWkl6ajBFQXdJd0RqRU0KTUFvR0ExVUVBeE1EWTJFeE1CNFhEVEl3TVRBeE5ERTJORGd3TjFvWERUSXhNVEF4TkRFMk5EZ3dOMW93RGpFTQpNQW9HQTFVRUF4TURZMkV4TUZrd0V3WUhLb1pJemowQ0FRWUlLb1pJemowREFRY0RRZ0FFYlFHaTN5YWN1Zml6Ck9LMmErLzVZWDZSMWMxa1ZJMklvLzg2eisvVmFnWWZHRDlxZUJmR3BQd3pvQWphWTQzRC9tQUs2dHFtM2llbTEKejdYOEZaRzMyYU5qTUdFd0RnWURWUjBQQVFIL0JBUURBZ0VHTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRApWUjBPQkJZRUZOQVlrY0lBS3A0aHkzQzVMaVV3UURSNUk4VUtNQjhHQTFVZEl3UVlNQmFBRk5BWWtjSUFLcDRoCnkzQzVMaVV3UURSNUk4VUtNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJUURuUVFtUmxkbHB0TDdTdmZjeHQxSFMKa2RMeW1GTlhhNkNYcFh2bHJsdUgwZ0lnYjBLN2JuYWI0Y2pyMUhHY2dDbWc2QTQ2Y3V3SFpZV3hWVjJkMTlacwp1bkU9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0KLS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJmVENDQVNTZ0F3SUJBZ0lSQUpOZmVlUTJSYThRazIyM2dkNTNlaDh3Q2dZSUtvWkl6ajBFQXdJd0RqRU0KTUFvR0ExVUVBeE1EWTJFeU1CNFhEVEl3TVRBeE5ERTJORGd4TmxvWERUSXhNVEF4TkRFMk5EZ3hObG93RGpFTQpNQW9HQTFVRUF4TURZMkV5TUZrd0V3WUhLb1pJemowQ0FRWUlLb1pJemowREFRY0RRZ0FFQkFkOXpzU1FzVm4wCnFva21uZ054SnpGSjhEZDFRSC9uaWJ6V2UybkNmTzYyRmFwNFEzRWdDTjhHNTk1SlNNdTNLQ29vK2ZIZVlnVk4KK1J1d3Y4b21PcU5qTUdFd0RnWURWUjBQQVFIL0JBUURBZ0VHTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRApWUjBPQkJZRUZNaEpqQzJXeWwyZnRZNWpLWHZKejdSdlBhY3JNQjhHQTFVZEl3UVlNQmFBRk1oSmpDMld5bDJmCnRZNWpLWHZKejdSdlBhY3JNQW9HQ0NxR1NNNDlCQU1DQTBjQU1FUUNJRmp3QVdYcC9DdjZhdjY3M0FuS2p6dFQKbmxWcjlubTVMZ1NrWXBYUVZ6TEdBaUEzRndXY0hTVUNES2FiTnN6cW1KZmVRN21ucVR4NDFqOUQ5ZXg0MVRUNQordz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K url: https://192.168.39.1:8443/validate failurePolicy: Fail name: vcm.kb.io rules:
- apiGroups:
- "" apiVersions:
- v1 operations:
- CREATE
- UPDATE resources:
- configmaps
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/cert-controller/issues/13#issuecomment-708612641, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE43PZDLUOLQ5F3DPMSROXDSKX3SBANCNFSM4SQWFTLQ .
Currently I don't have time to contribute a patch.
Ok, we'll leave this open in case someone has the chance to come fix it!
On Sun, Oct 18, 2020 at 3:20 PM David Schneider [email protected] wrote:
Currently I don't have time to contribute a patch.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/open-policy-agent/cert-controller/issues/13#issuecomment-711398971, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE43PZEANJ6BIIM36WZKN53SLM5YJANCNFSM4SQWFTLQ .
EDIT: I was wrong, it totally works!
I tried to test this myself.
-
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io -ojson my-webhook | jq .webhooks[0].clientConfig.caBundle -r | base64 -d > /tmp/old-ca.pem
- Rotate the certificate, updating the secret and webhook
-
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io -ojson my-webhook | jq .webhooks[0].clientConfig.caBundle -r | base64 -d > /tmp/new-ca.pem
-
cat /tmp/old-ca.pem /tmp/new-ca.pem | base64 -w0
- Edit the
ValidatingWebhoookConfiguration
to use the concatenated result.
I now get:
tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "x509: invalid signature: parent certificate cannot sign this kind of certificate" while trying to verify candidate authority certificate
This lines up with @adrianludwin's initial hunch that caBundle
is a certificate chain rather than a list of separate leaf certificates.
@dvob do you have any more details on how you got it to work?
Oops! I'm not sure what I was doing wrong yesterday, but I tested again today. Appending the old and new CA's in the webhook CA bundle seems to work. Requests to a webhook using either the old or the new certificate work!