TLS bumping: decrypting communications between internal and external services
TLS Bumping in Envoy Design Doc
2022.10.31
PoC: https://github.com/envoyproxy/envoy/pull/23192 (README and configurations are in the tls_bumping subdir)
2022.07.13: four work items were worked out.
- Certificate Provider framework https://github.com/envoyproxy/envoy/issues/19308 https://github.com/envoyproxy/envoy/pull/19582
- SNI-based cert selection in tls transport socket https://github.com/envoyproxy/envoy/issues/21739 https://github.com/envoyproxy/envoy/pull/22036
- A new network filter - BumpingFilter https://github.com/envoyproxy/envoy/issues/22581 https://github.com/envoyproxy/envoy/pull/22582
- Certificate Provider instance - LocalMimicCertProvider https://github.com/envoyproxy/envoy/pull/23063
2022.04.24 update
Mimicking certs based only on the SNI is probably not enough: we need the server's real certificate so that we can copy the subject, subject alternative name, extensions, RSA key strength, and more. The original proposal set up a client-first secure connection; to meet the above requirements we need a server-first secure connection.
Therefore, we expect a workflow like this (a rough configuration sketch follows the list):
- the downstream needs to access an external website such as "google.com"; the traffic is routed to Envoy
- Envoy receives the CLIENT_HELLO but does not perform the handshake with the downstream until step 5
- Envoy connects to "google.com" (the upstream) and gets the real server certificate
- Envoy copies the subject, subject alternative name, extensions, etc. from the real server certificate and generates a mimic certificate
- Envoy performs the TLS handshake with the downstream using the mimic certificate
- the traffic is decrypted and goes through the Envoy network filters, especially HCM; there are many HTTP filters, and users can also extend HTTP filters easily with WASM to plug in security functions
- the traffic is re-encrypted and sent to the upstream.
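As a rough illustration only (not the PoC's actual configuration, which lives in the tls_bumping subdir), a bumping listener assembled from the work items above might be sketched like this; the filter name "envoy.filters.network.bumping" and all omitted typed_config contents are assumptions:
# Hypothetical sketch; names and omitted fields below are assumptions, not the PoC's actual API.
static_resources:
  listeners:
  - name: egress_bumping_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 443 }
    listener_filters:
    # tls_inspector reads the SNI from the CLIENT_HELLO without completing the handshake.
    - name: envoy.filters.listener.tls_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    filter_chains:
    - filters:
      # Assumed bumping network filter: connects to the upstream first, fetches the real
      # server certificate, and asks a certificate provider to mint a mimic certificate
      # before the downstream handshake is resumed.
      - name: envoy.filters.network.bumping        # assumed name; see the PoC for the real one
        # typed_config omitted: PoC-specific
      # Once the downstream handshake succeeds with the mimic cert, decrypted traffic flows
      # through HCM so HTTP filters (including WASM filters) can apply security functions.
      - name: envoy.filters.network.http_connection_manager
        # typed_config omitted for brevity
      transport_socket:
        name: envoy.transport_sockets.tls
        # DownstreamTlsContext that serves the mimic certificate; fields are discussed below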
Original Proposal
Title: decrypting communications between internal and external services
Description:
When Envoy works as a sidecar or egress gateway in a service mesh, Istio takes responsibility for certificate generation and pushes the configuration to Envoy via xDS. But when Envoy works as a typical forward proxy, the internal services on the edge might access many different external websites such as Google or Bing, and Envoy doesn't provide the ability to terminate this kind of TLS traffic. For this scenario, we propose a method that lets Envoy generate certs dynamically and perform the TLS handshake. Then, if the client trusts the root CA that the certs are signed by, it can access external services under the control of Envoy.
Changes (straw man)
- introduce an API to enable this feature and configure the CA cert and key for signing
- get the SNI from the TLS inspector (we need the SNI to generate certs; just utilize the TLS inspector, probably no changes)
- generate certs according to the SNI
- set the certs on the SSL object and then do the handshake
Any comments are welcome.
/cc @lizan @asraa @ggreenway
Can you please elaborate on the desired traffic flow (client, envoy's position, server, which connections are TLS vs plaintext)?
I am curious what kind of cert is needed for the google/bing access.
If the upstream is google/bing, envoy doesn't terminate TLS but initiates TLS.
The straw man flow confuses me: is the cert applied on the downstream connection or the upstream connection?
@ggreenway @lambdai The desired traffic flow is like this: <downstream/internal service> ---- TLS (mimic cert generated by Envoy) ---- <Envoy> ---- TLS ---- <upstream/external service>
I mean Envoy needs to terminate the downstream TLS first, then we can apply many filters to control internal services accessing the external network, and after that Envoy initiates TLS to the upstream. The mimic cert will be applied to the downstream connection. There is no change to the upstream connection. I'm not sure "terminate" is the proper word; if not, please correct me.
Thanks for your comments.
Ok, I think I understand now. Let me paraphrase to make sure I understand: you'd like for envoy to have a CA cert/key, trusted by the downstream client, and for envoy to dynamically generate a TLS cert signed by the CA cert/key for whatever name is in the SNI of a connection?
@ggreenway Yes, exactly. Does it make sense to you?
I wrote some PoC code for dynamically generating certs, and I tested the downstream TLS handshake using the mimic cert.
Regarding the API change: Envoy currently requires certs (static or SDS) to be set in the config YAML file, and the code path doesn't take the case I mentioned into consideration. To support this feature I need a proper API to indicate that we will do the TLS handshake using a dynamic cert. I would like your help on this new API definition. I'm thinking about adding "tls_root_certificates" to CommonTlsContext, and it would only be valid when the CommonTlsContext is part of a DownstreamTlsContext:
[extensions.transport_sockets.tls.v3.CommonTlsContext]
{
  "tls_params": "{...}",
  "tls_certificates": [],
  "tls_root_certificates": [],
  "tls_certificate_sds_secret_configs": [],
  "validation_context": "{...}",
  "validation_context_sds_secret_config": "{...}",
  "combined_validation_context": "{...}",
  "alpn_protocols": [],
  "custom_handshaker": "{...}"
}
[extensions.transport_sockets.tls.v3.TlsRootCertificate]
{
  "root_ca_cert": "{...}",
  "root_ca_key": "{...}"
}
Do you think it is reasonable?
I think a more general approach would be to implement this as a listener filter. It could either run after tls_inspector (which reads the SNI value) or re-implement that part. It can then generate the needed cert, and we can add an API for a listener filter to signal to the TLS transport_socket which certificate to use.
There have been other feature requests to support extremely large numbers of fixed/pre-generated certs and to choose the correct one at runtime, and this implementation could support that use case as well.
Does that sound workable to you?
Generating certs in a listener filter sounds workable. But an API for a listener filter might not be enough: the old DownstreamTlsContext still requires the user to set TLS certificates, so we can't avoid touching DownstreamTlsContext or its sub-APIs.
I think we could add a FilterState from the listener filter which contains the cert/key to use, and have SslSocket check for its presence and set the cert on the SSL* (not the SSL_CTX*).
Yes, it's the SSL* (not the SSL_CTX).
Let me list several questions and answers to make the design clear:
- Where to generate certs?
After deliberation, I think tls_inspector is not a good place for generating certs, because we don't want to dynamically generate certs for every SNI; we want tls_inspector to detect the SNI first and then dispatch to a different filter chain according to the SNI. This is more flexible, since we can have a different cert config policy for each filter chain: static, SDS, or dynamic (see the sketch after this list). In my PoC I generate the certs in SslSocket::setTransportSocketCallbacks [1].
[1] https://github.com/envoyproxy/envoy/blob/3da250c6759ed9d2698e4e626fe1146cf696c316/source/extensions/transport_sockets/tls/ssl_socket.cc#L65
- Why can't we avoid touching the DownstreamTlsContext API?
[2] shows Envoy requiring the user to set TLS certificates, otherwise it exits during bootstrap. I went through some code; an easy way is to introduce an API to indicate it has the capability to provide certificates [3].
[2] https://github.com/envoyproxy/envoy/blob/9cc74781d818aaa58b9cca9602fe8dc62181d27b/source/extensions/transport_sockets/tls/context_config_impl.cc#L411
[3] https://github.com/envoyproxy/envoy/blob/9cc74781d818aaa58b9cca9602fe8dc62181d27b/source/extensions/transport_sockets/tls/context_config_impl.cc#L408
- Where to set the CA cert/key?
Since we have to modify DownstreamTlsContext (see the 2nd question), I prefer it to be per transport socket rather than per listener. What do you think?
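To make the SNI-based dispatch concrete, the filter chains might look like the sketch below (host names and file names are placeholders; the catch-all chain's TLS config is left out because that dynamic-cert API is exactly what is being designed here):
# Sketch: tls_inspector detects the SNI, then the listener dispatches to the filter chain
# with the matching certificate policy (static here, dynamic for everything else).
filter_chains:
# Known host name: serve a statically configured certificate.
- filter_chain_match:
    server_names: ["internal.example.com"]
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
      common_tls_context:
        tls_certificates:
        - certificate_chain: { filename: "internal.pem" }
          private_key: { filename: "internal.key" }
# Catch-all chain: generate a mimic certificate dynamically, signed by the configured root CA.
- filter_chain_match: {}
  transport_socket:
    name: envoy.transport_sockets.tls
    # typed_config omitted: the dynamic-cert fields are the API being proposed in this issue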
- Where to generate certs? After deliberation, I think tls_inspector is not a good place for generating certs, because we don't want to dynamically generate certs for every SNI; we want tls_inspector to detect the SNI first and then dispatch to a different filter chain according to the SNI. This is more flexible, since we can have a different cert config policy for each filter chain: static, SDS, or dynamic.
Yeah, this all makes sense; having the generating part in the transport socket sounds reasonable to me. We might need a cache to store generated certs so they aren't regenerated for every connection.
Perhaps SDS should act as that counted cache. RDS/ECDS/EDS maintain an N:1 mapping (N subscriptions, 1 config), and it would not be surprising to introduce that to SDS.
@LuyaoZhong My understanding is that your PoC generates a CSR; if this functionality can be moved to SDS, some SDS servers could be leveraged.
@lizan @lambdai Thanks for your comments. A cache sounds good. SDS could be one option to cache the dynamic certs; we are supposed to support both a local cache and SDS, right? If so, I want to start with a local cache and introduce SDS later. Does that make sense to you?
I investigated the API, the related classes, and the workflow, and completed the first version of the code; see https://github.com/envoyproxy/envoy/pull/19137.
In this version of the code, we have done the following:
- introduce an API to set root CA cert/key
common_tls_context:
tls_root_ca_certificate:
cert: {"filename": "root-ca.pem"}
private_key: {"filename": "root-ca.key"}
- implement a local cache to store generated cert pairs
- generate/reuse dynamic certificate pairs in the TLS transport socket and set them on the SSL*
  a. if there are no corresponding cached certs, create certs signed by the root CA, then store the generated certs in the local cache
  b. if there are corresponding cached certs, reuse them according to the host name
I'll split the patch, polish the code, and reword the original proposal description after the design details settle down. Could you help review the design items I listed above? What are your suggestions?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This is not stale.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
@lizan @ggreenway could you help label this as "no stalebot"? The previous design and PoC mimic the certificate and do the TLS handshake with the downstream before connecting to the upstream. Is it possible to support the following workflow in Envoy?
- the downstream needs to access an external website like "https://www.google.com"
- Envoy receives the CLIENT_HELLO but does not perform the handshake with the downstream until step 5
- Envoy connects to "https://www.google.com" (the upstream) and gets the real server certificate
- Envoy copies the subject, subject alternative name, extensions, etc. from the real server certificate and generates a mimic certificate
- Envoy performs the TLS handshake with the downstream using the mimic certificate
- the traffic is decrypted and goes through the Envoy network filters.
You could achieve this by writing a listener filter. I think you could put it in the listener filters after the existing tls inspector. At that point the SNI is known, so you could do whatever asynchronous work you need to do, and when it's finished, allow the filter chain to continue.
- the downstream needs to access an external website like "https://www.google.com"
Reading the below, I think the domain name is captured by Envoy via SNI? I want to add that a plaintext HTTP host is also supported.
- Envoy receives the CLIENT_HELLO but does not perform the handshake with the downstream until step 5
- Envoy connects to "https://www.google.com" (the upstream) and gets the real server certificate
AFAIK there are not many Envoy components you could use in the context of a listener filter. But it's definitely possible.
- Envoy copies the subject, subject alternative name, extensions, etc. from the real server certificate and generates a mimic certificate
- Envoy performs the TLS handshake with the downstream using the mimic certificate
With #19582, right? It seems a TLS transport socket needs to be generated on demand along with a mimic cert provider. That could be something new, or a continuation of the work on creating an on-demand network filter (which includes a transport socket).
- the traffic is decrypted and goes through the Envoy network filters.
@ggreenway @lambdai Will it cause two connections to the upstream for one request? We need to connect to the upstream and get the server certificate back first, and after the traffic is decrypted and goes through the network filters, we connect to the upstream again to transfer the traffic data. Is it possible to reuse the first connection?
Besides, #19582 is an extension of the current TLS transport socket; if the mimicking is implemented inside a listener filter, it seems I need to implement a TLS transport socket inside the listener filter, otherwise I cannot reuse #19582. Does it make sense to integrate a transport socket inside a listener filter?
@lambdai could you provide more details about "there are not many Envoy components in a listener filter"? How big is the gap?
re: connect upstream and get server certificate
Could this job be achieved as part of the cert provider bootstrap (or another extension)? If so, you only need to reference the new component in the listener filter and register a resume path to drive the listener filter once the cert is fetched.
add @liverbirdkte
@lambdai @ggreenway
re:
connect upstream and get server certificate
Could this job be achieved as part of the cert provider bootstrap (or another extension)? If so, you only need to reference the new component in the listener filter and register a resume path to drive the listener filter once the cert is fetched.
That sounds like moving the implementation from the listener filter to the cert provider, but the cert provider has fewer components available than the listener filter and is not ready for now. Which design do you prefer: cert provider + listener filter, or a stand-alone listener filter?
We have to connect to the upstream in the downstream subsystem of Envoy. Besides, we need to store that socket somewhere and use it when transferring data to the upstream; otherwise, AFAIK, Envoy will try to create a new connection to the upstream. Is there any risk in implementing this?
@ggreenway @lambdai
HCM is designed as a terminal filter, which brings a lot of limitations in our case. We want this feature to work with HCM: after the traffic is decrypted, we make the data go through HCM, and users can plug in many security functions by extending HTTP filters, so the traffic to external services is monitored.
The current HCM sets up the connection with the upstream in the HTTP router filter after receiving the HTTP headers. If we implement another listener filter, network filter, or any other component to get the server cert, how to make HCM reuse that connection is the problem we need to address. It doesn't seem easy; do you have any suggestions?
We came up with another idea. What about making HCM implement both ReadFilter and WriteFilter and work as a non-terminal filter? We would need to decouple the request and response processing: onData (ReadFilter) corresponds to the request path, and onWrite (WriteFilter) corresponds to the response path. HCM would not connect to the upstream; to get the server cert and send and receive data from the upstream, we would need a terminal filter like tcp_proxy at the end of the network filter chain.
Does that make sense?
@ggreenway @lambdai ping
I don't understand what you're trying to accomplish. Are you trying to make sure that downstream connections re-use the same upstream connection?
Changing HCM to a non-terminal filter does not seem like a viable approach.
Sorry, I don't fully understand your intention. I sincerely think you need a "better" (in terms of reuse and consuming RDS) HTTP async client to fetch the cert.
The HCM as a network filter is overkill, because you would need to feed the data into HCM and drain it.
Since I don't know how good the current HTTP async client is, you can always use the current HTTP async client with an internal cluster.
That internal cluster contains the internal address. Meanwhile, you can create an internal listener "listening" on that address, and that listener contains your desired HCM, which can consume RDS and use any upstream cluster type.
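A rough sketch of that wiring, with placeholder names (and assuming the internal listener bootstrap extension is enabled), might look like:
# Sketch: an internal cluster whose endpoint is an internal listener; the HTTP async client
# targets the cluster, and the internal listener's HCM consumes RDS to reach real upstreams.
static_resources:
  clusters:
  - name: cert_fetch_internal                  # placeholder; referenced by the HTTP async client
    load_assignment:
      cluster_name: cert_fetch_internal
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              envoy_internal_address:
                server_listener_name: cert_fetch_listener
  listeners:
  - name: cert_fetch_listener
    internal_listener: {}
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: cert_fetch
          rds:
            route_config_name: cert_fetch_routes   # placeholder; this HCM can consume RDS
            config_source: { ads: {} }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router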