security icon indicating copy to clipboard operation
security copied to clipboard

[BUG] Exception when hot-reloading Let's Encrypt certs issued by alternate intermediate CAs

Open perios101 opened this issue 1 year ago • 13 comments

What is the bug? As part of the process using the REST API to hot reload transport and http certificates, we receive a 500 error.

The old certificate (and all previous ones to date) has been issued by Let's Encrypt's R3 CA (which seems to have been alive and well as DR CA) and the new one by R11 CA. Same scenario is happening on an environment where intermediate CA changed from R11 to R10. Both are covered by parent ISRG Root X1 CA (along with the old R3).

Restarted individual pods automatically pick up new certificate as valid without issue however hot-reloading existing instances throws the same error over and over despite both CAs are Let's Encrypt issuers covered by root CA.

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Configure and use OpenSearch using Let's Encrypt certificate
  2. Renew certificate (using different intermediate CA to the original one
  3. Hot-reload certificates using REST API
  4. See error

What is the expected behavior? Certificate validation should succeed as both certificates were issued via valid CA - pinning to same CA as the old one is not considered good practice, especially in DR scenarios (yes, Issuers have those too).

What is your host/environment?

  • OS: Amazon Linux (containerised for Kubernetes)
  • Version: OpenSearch 2.14.1
  • Plugins:

Do you have any screenshots? Exception: java.io.IOException: OpenSearchSecurityException[Error while initializing transport SSL layer from PEM: java.lang.Exception: New Certs do not have valid Issuer DN, Subject DN or SAN.]; nested: Exception[New Certs do not have valid Issuer DN, Subject DN or SAN.];

Do you have any additional context?

  • From digging into the Security plugin code, it appears logic HERE is the culprit.
  • We have the same issue across three different environments (affects also OpenSearch Dashboards - using same security plugin).
  • Workaround (restarting pods) exists but it's not scalable / fit as a long-term solution.

perios101 avatar Jul 01 '24 16:07 perios101

Thanks for filing the issue, I see a previous issue https://github.com/opensearch-project/security/issues/2360 where this was reported and being worked on. @shubhmkr-amazon are you still working on this one?

shikharj05 avatar Jul 02 '24 07:07 shikharj05

Hi @shikharj05, Yes, I am still working on this.

shubhmkr-amazon avatar Jul 02 '24 09:07 shubhmkr-amazon

[Triage] HI @perios101 thank you for filing this issue. Looks like a similar bug is still being worked on. I am going to assign this to @willyborankin since he has expressed interest in taking a look at this in the time being.

stephen-crawford avatar Jul 08 '24 15:07 stephen-crawford

We encountered the same issue after Let's Encrypt's replaced its intermediate certs to R10/ R11. We still have the same workaround method: restart all nodes to pick up the new cert, which seems unscalable for a large number of nodes and could impact clusters without proper replication configuration.

tan3-netapp avatar Aug 19 '24 04:08 tan3-netapp

Hi @shubhmkr-amazon and @willyborankin, Could I get to know your progress on this issue? I could give you a hand if necessary.

tan3-netapp avatar Aug 20 '24 04:08 tan3-netapp

Hi @tan3-netapp, Earlier, I made the changes but got stuck in the unit tests due to self-signed certificates used in testing. I started working on this PR again and will try to raise it by August 29th. In case @willyborankin has made some progress, he can take the lead on this PR.

Thanks

shubhmkr-amazon avatar Aug 20 '24 07:08 shubhmkr-amazon

Hi @tan3-netapp, the issue has been resolved in this PR: https://github.com/opensearch-project/security/pull/4624. It has been merged into both the main and 2.x branches and will be included in the 2.17 release.

willyborankin avatar Aug 20 '24 14:08 willyborankin

Hi @willyborankin, it looks like PR: https://github.com/opensearch-project/security/pull/4624 is solving a different problem. In the mentioned issue, the error is coming because of a change in the intermediate CA. Here, during the hot-reload, the issuer DN of the in-memory certificate will fail to match the new certificate in case the ICA has changed.

shubhmkr-amazon avatar Aug 20 '24 15:08 shubhmkr-amazon

Hi @shubhmkr-amazon, I would appreciate it if I could have a look at your working PR, just out of curiosity.

tan3-netapp avatar Aug 21 '24 03:08 tan3-netapp

Hi @shubhmkr-amazon / @willyborankin , are either of you still working on a potential fix for this issue?

It is still causing us a fair bit of pain, so I was thinking of adding two new bool configs that could be set to false to skip the DN verification (https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/ssl/DefaultSecurityKeyStore.java#L656), for example:

  • plugins.security.ssl.transport.enforce_cert_reload_dn_verification
  • plugins.security.ssl.http.enforce_cert_reload_dn_verification

How does that sound as a potential approach?

parislarkins avatar Sep 06 '24 02:09 parislarkins

Hi @parislarkins, the approach you suggested of bypassing the issuerDN check is not ideal. In this case, a self-signed certificate can also be renewed, which is not trusted. I am working on a fix that will perform issuer DN validation by building a chain of trust with an in-memory trust store(However, there will be one limitation with this approach: what if the root certificate itself gets rotated?).

shubhmkr-amazon avatar Sep 09 '24 14:09 shubhmkr-amazon

Hi @shubhmkr-amazon, thanks for getting back to me!

While the point you make about self-signed certificates is true, I'm unsure why this is a particular concern given that there is no such validation when OpenSearch is restarted.

My understanding is that rotating certs (either by a full restart or hot-reload) requires updating the certs on disk and then being a super admin or having the ability to restart OpenSearch. If the concern is that a malicious actor could trigger a hot-reload to change to a self-signed cert, then I think it likely that they would have the ability to restart OpenSearch as well, given they must have access to the OpenSearch server in order to update the cert on disk.

Would you be able to clarify what the concern is with being able to hot-reload to a potentially self-signed cert (on an opt-in basis), given this is allowed by restarting OpenSearch?

Thanks!

parislarkins avatar Sep 10 '24 00:09 parislarkins

I've raised https://github.com/opensearch-project/security/pull/4752 as a potential fix for this issue, to allow disabling the Distinguished Names validation when performing a hot-reload. In our case we do not think the validation is providing us additional security value, so are happy to disable it. There may be other use cases where this is not appropriate though, so the fix suggested by @shubhmkr-amazon may be the best option there, even if it doesn't work for root CA rotation.

parislarkins avatar Sep 24 '24 03:09 parislarkins

@cwperks this bug needs reopening since the PR above was reverted in https://github.com/opensearch-project/security/pull/4840

perios101 avatar Oct 28 '24 08:10 perios101

@perios101 it has now been re-merged by #4841

parislarkins avatar Oct 29 '24 02:10 parislarkins