bottlerocket-update-operator Potentially odd TLS certificate setup in v0.2.2

Image I'm using: public.ecr.aws/bottlerocket/bottlerocket-update-operator:v0.2.2

Issue or Feature Request: The way the TLS certificate is configured for the APIServer and agents seems unusual to me. However, this may be a misunderstanding or lack of knowledge on my part.

The bottlerocket-update-operator.yaml deployment manifest creates a Certificate resource to be self-signed by cert-manager. This certificate is a CA but ALSO has DNS names. Generally speaking, CA certs don't have DNS names and are only used to sign and verify child certificates. This has the advantage that the CA private key can be held separately from the applications relying on certificates signed by the CA, so if one of them is compromised the attacker cannot sign new certificates. Keeping the CA separate from the leaf certs also makes rotation easier: you can give the CA a long lifetime and the leaf cert a short one, then rotate the leaf cert without needing every user of the CA to reload the CA in tandem.

As far as I can tell, the Bottlerocket updater agent only uses the TLS certificate (tls.crt) to verify the certificate of the APIServer and ignores the extra ca.crt file injected by cert-manager. This means the certificate the agent loads always has to be the root CA, otherwise communication between the agent and APIServer fails with: error trying to connect: error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1883: (unable to get local issuer certificate)'''"}. Also, the APIServer and the agent don't reload the certificate if it changes. As a result, when cert-manager rotates the certificate brupop-apiserver-certificate (which by default will be after 60 days), a cluster admin would need to bounce all the agent and APIServer pods, otherwise communication will fail, including with the Kubernetes API.

I think ideally you would create a longer-lived (say 10 years) CA without any DNS names, then sign a dedicated cert for the APIServer with a short TTL of maybe a day. The APIServer would automatically reload the cert if it changed (or have a sidecar to trigger a SIGHUP). The agent could then use the CA to verify the connection to the APIServer and wouldn't need to worry about the server certificate rotating (for a while).
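
To make that concrete, here is a rough sketch of how it could look with cert-manager, using the usual self-signed bootstrap followed by a CA issuer. This is only an illustration under assumptions: the resource names selfsigned-issuer, brupop-root-ca and brupop-ca-issuer, both secret names, and the exact durations are placeholders I made up, not what the shipped manifest uses. The namespace and DNS name are taken from the default service URL.

```yaml
# Hypothetical sketch, not the shipped brupop manifest.
# 1. Bootstrap issuer, used only to self-sign the root CA.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer            # placeholder name
  namespace: brupop-bottlerocket-aws
spec:
  selfSigned: {}
---
# 2. Long-lived root CA: isCA is true and there are no dnsNames.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: brupop-root-ca               # placeholder name
  namespace: brupop-bottlerocket-aws
spec:
  isCA: true
  commonName: brupop-root-ca
  secretName: brupop-root-ca-secret  # placeholder secret name
  duration: 87600h                   # roughly 10 years
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
    group: cert-manager.io
---
# 3. CA issuer that signs leaf certificates with the root CA key.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: brupop-ca-issuer             # placeholder name
  namespace: brupop-bottlerocket-aws
spec:
  ca:
    secretName: brupop-root-ca-secret
---
# 4. Short-lived, non-CA server certificate carrying the DNS name.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: brupop-apiserver-certificate
  namespace: brupop-bottlerocket-aws
spec:
  isCA: false
  secretName: brupop-apiserver-tls   # placeholder secret name
  duration: 24h
  renewBefore: 8h
  dnsNames:
    - brupop-apiserver.brupop-bottlerocket-aws.svc.cluster.local
  usages:
    - server auth
  issuerRef:
    name: brupop-ca-issuer
    kind: Issuer
    group: cert-manager.io
```

With a layout like this, the ca.crt key in the server's secret would hold the root CA while tls.crt holds the leaf, so the two files would no longer be identical and the agent would need to trust ca.crt.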

Also, in bottlerocket-update-operator.yaml there appears to be an unused Issuer called my-ca-issuer. But there might be a reason for this?

Aug 09 '22 10:08 sedan07

Hi @sedan07,

Thanks for reaching out. Happy to hear that you are using our new release version. I will do my best to address your concerns.

> As far as I can tell the Bottlerocket updater agent only uses the TLS certificate to verify the certificate of the APIServer and ignores the extra ca.crt file injected in by cert-manager. This means the Client cert always has to be the root CA otherwise communication between the agent and APIServer fails with: error trying to connect: error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1883: (unable to get local issuer certificate)'''"}.

Yes, the client certificate is always the root certificate. We use the reqwest client builder, and the API it provides for connecting to a server with a self-signed certificate is add_root_certificate, which requires the root certificate. Checking the certificate with openssl x509 -in $certificate_file -text -noout shows the same Issuer and Subject, which also confirms that this is the root certificate:

 Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=my-selfsigned-ca
        Validity
            Not Before: Aug  6 00:03:59 2022 GMT
            Not After : Nov  4 00:03:59 2022 GMT
        Subject: CN=my-selfsigned-ca

As for the extra ca.crt, its content is actually the same as the content of tls.crt (both are generated by cert-manager), so I passed tls.crt to set_certificate_chain_file.

> Also the APIServer and the Agent don't reload the certificate if it changes, as a result when cert-manager rotates the certificate brupop-apiserver-certificate (which by default will be after 60 days) a cluster admin would need to bounce all the agent and APIServer pods otherwise communication will fail, including with the Kubernetes API.

This might be the part that I was missing. I have opened an issue to track it: #233

> The bottlerocket-update-operator.yaml deployment manifests creates a Certificate resource to be self-signed by cert-manager, this certificate is a CA but ALSO has DNS names. Generally speaking CA certs don't have DNS names and are only used to verify and certify child certificates. ... I think ideally you would create a longer lived (say 10 years) CA without any DNS Names.

The issue I was facing when I did not provide DNS names was a hostname mismatch:

 'error sending request for url (https://brupop-apiserver.brupop-bottlerocket-aws.svc.cluster.local/bottlerocket-node-resource): error trying to connect: error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1883: (hostname mismatch)'''"}

> Also, in bottlerocket-update-operator.yaml there also appears to be an unused Issuer called my-ca-issuer. But there might be a reason for this?

Yes, this follows the cert-manager documentation on bootstrapping a CA issuer. Using a self-signed certificate without bootstrapping works only with OpenSSL 1.1.1 but not with 3.0.x. I think this might be due to X.509 changes in 3.0.x.

This is the first time I have worked with TLS certificates. Let me know if any of the above doesn't follow general industry-standard practice.

Aug 09 '22 19:08 somnusfish

Hi @somnusfish,

Firstly, thanks for your contributions to the update operator. It's a valuable tool for us and it has successfully rolled out a number of upgrades to our clusters (including prod) :-).

> Yes, the client certificate is always the root certificate. We use reqwest client builder, the api it provided to us to connect to a server with self-signed certificate is add_root_certificate, which requires to provide the root certificate

> As for the extra ca.crt, the content in ca.crt it is actually the same as the contents in tls.crt (this is the generate by cert-manager), so I picked tls.crt as the set_certificate_chain_file.

I would suggest it might make more sense to switch to using the ca.crt file with the reqwest add_root_certificate method. Firstly, this would enable the use of a separate, dedicated root CA and a non-CA leaf certificate for the APIServer. It would still work with a self-signed certificate as in the existing setup because, as you rightly point out, the ca.crt and tls.crt files are the same for self-signed certs. Secondly, it would help with certificate rotation, because the APIServer's certificate could have a short lifetime without needing all the agents to reload the certificate as soon as it rotates.

For the APIServer it might also be worth updating the SSLAcceptor code to use the ca.crt file for the call to set_certificate_chain_file and then calling the set_certificate method with the tls.crt file. This way clients connecting to the APIServer will be served both tls.crt and ca.crt (unless self-signed). This matters for longer certificate chains:

For instance, public CAs typically create a long-term root (self-signed) certificate and then sign a few active/operational intermediate (subordinate) certificates, which are used to sign requests from customers. They can lock their root CA key away as they don't need it for day-to-day operations; that job is delegated to the intermediates. However, a user's OS or browser doesn't necessarily know about all these intermediates, but it does know about the root ones. So when connecting to a web server over HTTPS, the server sends its own (leaf) certificate along with any intermediates. The browser can then tie up the whole chain, following it from the leaf certificate through any intermediates and finally to a root. If it trusts the root then all is good, otherwise it rejects the connection.

Root CA -> Intermediate CA -> Web Server Certificate (Not a CA)

Real example: the certificate chain served for https://www.amazontrust.com (from openssl s_client -connect www.amazontrust.com:443):

Amazon Root CA 1 -> Amazon -> www.amazontrust.com

This is probably overkill for securing brupop APIServer communication for most users. But switching those parameters would enable it to be deployed into an environment with stricter security controls, where perhaps cert-manager isn't used and an internal certificate authority backed by an HSM (Hardware Security Module) has to be used instead.

Aug 10 '22 14:08 sedan07

> Generally speaking CA certs don't have DNS names and are only used to verify and certify child certificates

We don't have this quite right yet: we have successfully defined an agreement between the API server's HTTPS client and the server itself on what the trusted CA is.

But we should consider generating different certs within the chain that the agent and server can use individually to complete the chain of trust.

For now, this isn't too big a deal, since a bad actor would need to gain cluster-admin-like privileges across namespaces to get hold of the CA, but it is weird that we use the same certificate as both root CA and server cert.

Re-opening so we can continue to harden this.

Nov 02 '22 00:11 jpmcb
