management-api-for-apache-cassandra icon indicating copy to clipboard operation
management-api-for-apache-cassandra copied to clipboard

SSL Context Hot Reloading doesn't work as expected

Open lincolnlog4 opened this issue 1 year ago • 5 comments

SSLContext reload does not seem to be updated when cert files are changed. It seems that even though a log message is displayed that changes have been detected in the files, the actual SSLContext for the server remains unchanged.

Steps to reproduce.

  1. Create a short lived cert that will expire in a few minutes to use as a cert for the server
  2. Update cert to a longer lived cert
  3. Verify the Detected change in the SSL/TLS certificates, reloading log occurs
  4. Execute a local curl command e.g. curl --cert tls.crt --key tls.key --cacert ca.crt https://localhost:8080/api/v0/lifecycle/pid with updated certs
  5. Output of above command will be SSL certificate problem: certificate has expired even though the certs should have been updated to the long-lived cert

Restarting the management api process (and therefore restarting Cassandra) will result in the process picking up the new cert, however, it would be great if the restart was not necessary

lincolnlog4 avatar Jun 24 '24 19:06 lincolnlog4

I think the io.grpc.util.AdvancedTlsX509KeyManager could also be used and built directly into the SslContext rather than the manual watcher that is set up now. Something like:

AdvancedTlsX509KeyManager keyManager = new AdvancedTlsX509KeyManager();
keyManager.updateIdentityCredentialsFromFile(tlsCert, tlsKey, 10, MINUTES, executor);
SslContextBuilder.forServer(keyManager)
            .trustManager(tlsCaCert)
            .clientAuth(ClientAuth.REQUIRE)
            .ciphers(null, IdentityCipherSuiteFilter.INSTANCE)
            .build();

lincolnlog4 avatar Jun 26 '24 12:06 lincolnlog4

I see @burmanm implemented the original hot reloading, any thoughts?

lincolnlog4 avatar Jun 26 '24 13:06 lincolnlog4

Hey, I haven't had time to investigate this yet. I think the message "SSL certificate problem: certificate has expired" is displayed by the curl even if the server would have correct certificates - if the client certificates have expired (or the image's system certificates), so alone it doesn't tell us much sadly which certificates are old.

The gRPC's same code wouldn't make any difference, because it would simply end up with the same situation (it's essentially doing the same thing). If there's a bug, it would be in not using the new SSLContext, since clearly it would still somehow load the old ones. Also, the construction of the SSLContext should be correct since you got it to work the first time as it's the same code. Thus, the possible bug would have to be in the Netty pipeline using the old SSLContext object. This is something that could be logged and verified though.

burmanm avatar Jun 27 '24 06:06 burmanm

In regards to the curl command, the provided client certs were not expired as they were updated to the long-lived certs and this was verified using openssl. The server reports an error in the SSLHandshake (no mention of expired certs) as the client connection initiates a close of the connection when it has determined the server cert is expired.

I believe the gRPC code and the existing logic actually have slightly different behaviors. The gRPC code woud use the KeyManager of the original SSLContext to monitor/update the server's certificates so that when they are updated it will be reflected in the original SSLContext that was used to start the server. The existing hot reload code attempts to create a new SSLContext and replace the existing one, but it is not possible to set a new SSLContext unless the server is restarted.

Here's a relevant stack overflow for reloading SSLContexts: https://stackoverflow.com/questions/51411594/reload-netty-servers-ssl-context-for-grpc

lincolnlog4 avatar Jun 28 '24 14:06 lincolnlog4

I don't think that's needed in this case, it seems overkill. Since we use initChannel, it should reload on every new connection. Thus, it seems the only bug is that the context to the SSLContext was incorrect one and we need to keep updating it instead of simply refreshing the SSLContext.

burmanm avatar Jul 02 '24 12:07 burmanm