envoy
envoy copied to clipboard
Improve client side TLS session reuse
We are running envoy as host proxy to enforce mTLS. We find that Envoy could be hammer by TLS handshake. We enabled max_session_keys in UpstreamTlsContext and see signicant CPU utilization drop in experiment.
However, in production the max_session_keys is not very pratical, because the session key pool is shared by all the endpoint in the cluster and there is no hint to select the available session.
For example, if the cluster has two endpoints, 10.0.0.1:80 and 10.0.0.2:80. When 10.0.0.1:80 is selected by the load balancer, ssl socket may pick the saved session that are created with 10.0.0.2:80. See https://github.com/envoyproxy/envoy/pull/4791
In the worse secanrio, if the two endpoints are selected in turn, the session reuse rate is 0. In realworld, the session reuse rate drop to 1/N where N is the endpoints in the cluster.
My proposal is to add hint when selecting the ssl session. A typical hint is the upstream endpoint address.
The hint can be stored in the transport socket options which is already attached to the SSL_SESSION,
and
replace the per ClientContext
std::deque<bssl::UniquePtr<SSL_SESSION>> session_keys_
by
std::unordered_map<SessionHint, std::deque<bssl::UniquePtr<SSL_SESSION>>> session_key_map_
- when a
SSL_SESSIONis created withSSLfind the hint of theSSLand cache the session in thesession_key_map_ - when a SSL is created via
ClientContextImpl::newSsl(const Network::TransportSocketOptionsConstSharedPtr&)use the transport socket option to find hint, set session to theSSLif there is available aSSL_SESSION
The alternative method is spread the session among TLS servers, but this method not only expand the compromise size, it also introduce shared storage as a depedencies. The extra dependency does not work when Envoy is close to the lowest layer.
Dumb question: why aren't session keys stored per socket? why are they shared between multiple SSL sockets?
Dumb question: why aren't session keys stored per socket? why are they shared between multiple SSL sockets?
SSL socket stands for a TLS connection. Without reusing the stored SSL_SESSION, a new SSL_SESSION must be
built via the expensive asymmetric cryptography.
The session_key refers to the result of the asymmetric cryptography method. At the client side, the session_key is shared by new SSL socket(s) and wish that the server could restore the session for the following ssl connections.
Do I answer your question?
@lambdai Yes, I understand the session keys allow skipping the handshake. I'm just surprised the keys are not associated with an endpoint already, and asking why it was done this way.
The most recent session reuse efforts was done by @PiotrSikora in https://github.com/envoyproxy/envoy/blame/main/source/common/tls/context_impl.cc#L774
Let me quickly add a PR to explain my proposed interface change
I prefer TransportSocketOptions because we can populate tag value out of TLS transport socket for example, hash proxy protocol values in the tag (or not).
the tls socket config should config whether to override the external tag, ignore the external tag, or add endpoint address only if external tag is not provided
@kyessenov Do you think https://github.com/envoyproxy/envoy/pull/33517 is a good start?
I have some early code proving that there is no memory access issue and much better session use.
@lambdai Yes, I understand the session keys allow skipping the handshake. I'm just surprised the keys are not associated with an endpoint already, and asking why it was done this way.
This is similar (or the same) as a very old issue: #5073. The only reason this hasn't been done, is because nobody put in the time to do it.
@ggreenway Thank you for the pointer and review! I add my comment to the interface ish PR. Please let me know if my comments make sense.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.