envoy
envoy copied to clipboard
SNI-based cert selection during TLS handshake
Envoy support selecting certs by selecting filter chain based on SNI. But it is possible that we access different services via one filter chain, which requires SNI-based cert selection in one single filter chain during handshake.
Risk Level: Low Testing: unit tests Docs Changes: ssl Release Notes: yes Fixes #21739
Signed-off-by: Luyao Zhong [email protected]
@ggreenway Do we need to add release notes for this?
I've been busy and haven't had time to review this yet; thanks for your patience.
I've been busy and haven't had time to review this yet; thanks for your patience.
It's ok. :) Thanks for your response.
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).
/retest
Retrying Azure Pipelines: Retried failed jobs in: envoy-presubmit
@ggreenway would you be able to take a look today?
- CN is not used if SANs are present
This requires adding public method to access internal field server_names_map_ in SslContextImpl and add interface to its base class. It might not be worth o add a method just for testing.
- Wildcard in SANs is matched properly
SslSocketTest, CertWildcardSANWithSni
- Multiple certs with multiple SANs; matching works correctly
- Wildcard does not match if an exact match is present
SslSocketTest, MultiCertMultipleSANWithSni
- Wildcard only matches 1 level, eg *.example.com does not match a.b.example.com
We will fallback to first cert even there is no match, then it depends on how peer validates this cert. I have not dig the validation criteria, but it seems another separate feature "sni-based validation" if not supported in envoy
- Config fails to load with conflicting SANs in certs, both exact and wildcard
SslContextImplTest, AtMostOneRsaCert1
SslContextImplTest, AtMostOneRsaCert2
SslContextImplTest, AtMostOneEcdsaCert
- CN is not used if SANs are present
This requires adding public method to access internal field server_names_map_ in SslContextImpl and add interface to its base class. It might not be worth o add a method just for testing.
You can test it by looking for the behavior: set a SAN of a.example.com and a CN of b.example.com, see what cert is selected when you request b.example.com. Alternatively, create two certs, one with SAN a.example.com and CN b.example.com, and another cert with SAN b.example.com. Validate that it does NOT fail to load the config.
- Wildcard in SANs is matched properly
SslSocketTest, CertWildcardSANWithSni
In many of the matching tests, please add more certificates, and ensure the first/default one won't match. Otherwise the test would pass without any of the new matching logic because the first/default cert is presented when there is no other match.
- Multiple certs with multiple SANs; matching works correctly
- Wildcard does not match if an exact match is present
SslSocketTest, MultiCertMultipleSANWithSni
- Wildcard only matches 1 level, eg *.example.com does not match a.b.example.com
We will fallback to first cert even there is no match, then it depends on how peer validates this cert. I have not dig the validation criteria, but it seems another separate feature "sni-based validation" if not supported in envoy
No, I'm asking for testing of which certificate envoy selects to present to the client. Make sure that *.example.com is NOT used for an SNI value of a.b.example.com. Validate that the default/first cert is used in that case.
- Config fails to load with conflicting SANs in certs, both exact and wildcard
SslContextImplTest, AtMostOneRsaCert1 SslContextImplTest, AtMostOneRsaCert2 SslContextImplTest, AtMostOneEcdsaCert
What happens if the client is not EC-capable, and there is a matching SNI for an EC cert, but not RSA? Please add a test for this case (I think the correct behavior is to return the first/default cert).
- CN is not used if SANs are present
You can test it by looking for the behavior: set a SAN of a.example.com and a CN of b.example.com, see what cert is selected when you request b.example.com. Alternatively, create two certs, one with SAN a.example.com and CN b.example.com, and another cert with SAN b.example.com. Validate that it does NOT fail to load the config.
done
In many of the matching tests, please add more certificates, and ensure the first/default one won't match. Otherwise the test would pass without any of the new matching logic because the first/default cert is presented when there is no other match.
done
No, I'm asking for testing of which certificate envoy selects to present to the client. Make sure that *.example.com is NOT used for an SNI value of a.b.example.com. Validate that the default/first cert is used in that case.
done. The validation succeeds with default cert since Envoy does not define the criteria that how to validate cert SAN based on SNI .
What happens if the client is not EC-capable, and there is a matching SNI for an EC cert, but not RSA? Please add a test for this case (I think the correct behavior is to return the first/default cert).
it still uses this EC cert, since there is only EC(default cert) for this SNI in this case. I met error in testing, need to dig. :(. I think previous selection criteria also has the case that a EC-cert is selected for a not-EC-capable client, do you know any existing tests for that?
I think previous selection criteria also has the case that a EC-cert is selected for a not-EC-capable client, do you know any existing tests for that?
I think, but am not certain, that with the old selection algorithm, if both an ec and rsa cert were configured, and the client was not ec-capable, that the rsa cert would always be selected. It was a much simpler algorithm, because there were only 2 certs. But maybe there was a case involving bad ocsp stapling that the ec cert could be used? Is that the case you're thinking of?
But maybe there was a case involving bad ocsp stapling that the ec cert could be used? Is that the case you're thinking of?
Yeah I think there was case where given two certs that the client could consume, we would skip the normally preferred one if it would have resulted in immediately rejecting the connection because of OCSP config and a stale/bad staple to give the other cert a chance to succeed. It’s definitely a corner case.
I think previous selection criteria also has the case that a EC-cert is selected for a not-EC-capable client, do you know any existing tests for that?
I think, but am not certain, that with the old selection algorithm, if both an ec and rsa cert were configured, and the client was not ec-capable, that the rsa cert would always be selected. It was a much simpler algorithm, because there were only 2 certs. But maybe there was a case involving bad ocsp stapling that the ec cert could be used? Is that the case you're thinking of?
With the old selection algorithm, it was possible that the client was not ec-capable but only ec cert configured. So what I meant is that for old selection algorithm, there seems no tests for the case that you mentioned: EC-cert is selected for a not-EC-capable client, thus I was asking is there any tests I missed.
I add a test CertWithNotECCapable for this, to test verification fail since client is not EC-capable, while the stats seems not set properly I left a TODO comment there.
@ggreenway kindly ping.
@LuyaoZhong How does this relate to TLS bumping?
Here you are enhancing the TLS transport socket capability to be able to select a certificate based on SNI. But it's still a pre-configured set of certs serving a pre-known set of SNIs. On the other hand, if I understand correctly, the TLS bumping is trying to solve the case where the server domains are not known in advance and the new bumping filter work with cert provider to dynamically produce a mimic certificate for any SNI.
What am I missing? Please help fill the gap in my knowledge.
@LuyaoZhong How does this relate to TLS bumping?
Here you are enhancing the TLS transport socket capability to be able to select a certificate based on SNI. But it's still a pre-configured set of certs serving a pre-known set of SNIs. On the other hand, if I understand correctly, the TLS bumping is trying to solve the case where the server domains are not known in advance and the new bumping filter work with cert provider to dynamically produce a mimic certificate for any SNI.
What am I missing? Please help fill the gap in my knowledge.
@aejeet This feature just modifies the cert selection criteria during handshake, It does NOT require certs have to be pre-configured. In TLS bumping, new bumping filter works with cert provider to dynamically generate cert, all mimic certs will be updated to downstream transport socket, then we need to select a mimic cert for downstream based on SNI.
@ggreenway there are 6 commits incorrectly signed off , because I use the github "sign-off and commit suggestion" in a wrong way. :(
I need to force push, will do that after next round review.
Do I need to rebase the branch according to https://github.com/envoyproxy/envoy/pull/22036/checks?check_run_id=8571102157 to fix DCO failure?
Yes, unfortunately you will need to rebase to fix DCO. I suggest waiting until the PR is approved to do it, as rebase causes GitHub to loose comment anchors.
Note this format error fix, and the merge conflict needs to be resolved with a fresh merge of main.
Note this format error fix, and the merge conflict needs to be resolved with a fresh merge of main.
https://dev.azure.com/cncf/envoy/_build/results?buildId=118779&view=logs&j=55bc2ab9-36c1-5ab8-58a6-8c258728eeff&t=33b57bf9-ef13-58c8-bc2b-f804aeb16d39&l=110
/wait
Note this format error fix, and the merge conflict needs to be resolved with a fresh merge of main.
https://dev.azure.com/cncf/envoy/_build/results?buildId=118779&view=logs&j=55bc2ab9-36c1-5ab8-58a6-8c258728eeff&t=33b57bf9-ef13-58c8-bc2b-f804aeb16d39&l=110
/wait
@wrowe Thanks. Done
Apologies; I've had a combination of family issues and sickness. I plan to review this next week.
/wait-any
@ggreenway could you take a look again, any other comments?
Retrying Azure Pipelines: Check envoy-presubmit didn't fail.
/retest
Retrying Azure Pipelines: Check envoy-presubmit isn't fully completed, but will still attempt retrying. Retried failed jobs in: envoy-presubmit
/retest
Retrying Azure Pipelines: Retried failed jobs in: envoy-presubmit