Avoid long stalls during TLS handshakes

Open nyh opened this issue 8 months ago • 3 comments

In one ScyllaDB workload using an HTTPS server, we noticed that each connection establishment causes a roughly-30ms stall.

While the fact that each TLS handshake takes 30ms is sad (it means that each shard can only do about 30 of those per second...), what is much more troubling for a Seastar application is that these handshakes happen without preemption points, so each one causes a 30ms stall and potentially huge latencies for other requests running on this shard.

This issue isn't about making handshakes faster (which we should do) or reducing their number (which we've been doing - https://github.com/scylladb/seastar/issues/2154 is one attempt at that). This issue is about avoiding the stall during the handshake if we can't avoid the handshake.

I can think of different ways to avoid these stalls, in decreasing order of desirability but increasing ease of implementation:

  1. Modify the TLS implementation to use Seastar futures and incorporate preemption checks. This is probably not a realistic solution without massive modifications to OpenSSL - unless OpenSSL comes with hooks to do that, which I'm guessing it doesn't.
  2. A simpler version of 1, probably still requiring modifications to OpenSSL but much fewer, is to run these TLS handshakes in a seastar::thread and add preemption points in the right places.
  3. An approach that could work without modifications to OpenSSL is to run it in a different Linux thread. This will be ugly, but in some setups we've already been reserving separate cores for networking, so maybe it makes sense to do the same for TLS requests. Or, even if we run these TLS threads on the same cores as ordinary Seastar (the horror!), we'll still get stalls (when the Seastar thread isn't running) but probably not 30ms stalls.
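
To make option 3 concrete, here is a minimal sketch using only the C++ standard library, not Seastar. Everything in it is an assumption for illustration: `blocking_handshake` is a hypothetical stand-in that burns CPU for a given time instead of calling a real TLS library, and the polling loop stands in for a reactor that keeps running other tasks. A real implementation would need a proper way to hand the result back to the Seastar shard rather than polling a `std::future`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <future>
#include <thread>

// Hypothetical stand-in for a blocking, CPU-bound TLS handshake.
// It burns CPU for roughly `work` and returns a fake, non-zero session id.
inline std::uint64_t blocking_handshake(std::chrono::milliseconds work) {
    assert(work.count() >= 0);
    auto deadline = std::chrono::steady_clock::now() + work;
    std::uint64_t x = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    }
    return x | 1;  // low bit set, so the result is never zero
}

// Runs the handshake on a helper thread while the calling loop keeps going.
// Returns how many loop iterations ran while the handshake was in flight;
// each iteration stands in for one unrelated reactor task.
inline std::uint64_t handshake_without_stalling(std::chrono::milliseconds work,
                                                std::uint64_t& session_out) {
    std::future<std::uint64_t> done =
        std::async(std::launch::async, blocking_handshake, work);
    std::uint64_t other_tasks_run = 0;
    while (done.wait_for(std::chrono::milliseconds(0)) !=
           std::future_status::ready) {
        ++other_tasks_run;          // pretend to service another request
        std::this_thread::yield();  // let the helper thread run if cores are shared
    }
    session_out = done.get();
    return other_tasks_run;
}
```

The point of the sketch is only that the calling thread is never blocked for the full handshake duration; if the helper shares a core with it, the caller still loses time to the kernel scheduler, as noted in option 3.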

Another thing we should do, which I'll tack onto this issue but which perhaps should be split into a different issue, is to add metrics that will be useful for analyzing these slow TLS handshake problems. Perhaps count the number of handshakes or the number of various cryptographic calculations, and perhaps also measure the amount of time that each handshake takes (if there is no preemption, it's easy to calculate this time).
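
As a rough illustration of the timing metric, the sketch below wraps a handshake call and records its count, total, and maximum duration. It is plain standard C++ rather than Seastar's metrics API, and the names `handshake_stats` and `timed_handshake` are hypothetical. It relies on the observation above: because the handshake runs without preemption, wall-clock time around the call is the handshake time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical per-shard counters; names are illustrative only.
struct handshake_stats {
    std::uint64_t count = 0;
    std::chrono::nanoseconds total{0};
    std::chrono::nanoseconds max{0};

    void record(std::chrono::nanoseconds d) {
        assert(d.count() >= 0);
        ++count;
        total += d;
        if (d > max) {
            max = d;
        }
    }

    std::chrono::nanoseconds mean() const {
        if (count == 0) {
            return std::chrono::nanoseconds{0};
        }
        return total / static_cast<std::int64_t>(count);
    }
};

// Times one handshake call and records it. Assumes `do_handshake` runs to
// completion without preemption and returns a value (not void).
template <typename Func>
auto timed_handshake(handshake_stats& stats, Func&& do_handshake) {
    auto start = std::chrono::steady_clock::now();
    auto result = std::forward<Func>(do_handshake)();
    stats.record(std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start));
    return result;
}
```

Keeping one `handshake_stats` per shard would avoid any cross-core synchronization; exporting the maximum alongside the mean matters here because the concern is the worst-case stall, not the average.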

CC @elcallio @avikivity

nyh, Mar 30 '25 16:03

1.) We use GnuTLS, not OpenSSL. Though @avikivity might want to change this.

2.) Handshake being slow is so weird. It is a CPU-intensive, but not huge, process. We might want to look into which versions of the library we use, and how it is compiled. At least some other project issues mention some instances of it being slow due to distro config/build. So at least a proper profile might be useful.

elcallio, Mar 31 '25 08:03

> 2.) Handshake being slow is so weird. It is a CPU-intensive, but not huge, process. We might want to look into which versions of the library we use, and how it is compiled. At least some other project issues mention some instances of it being slow due to distro config/build. So at least a proper profile might be useful.

This reminds me that I have an ask somewhere to configure our PGO/LTO training to run with encryption (and compression) enabled. Today it doesn't.

mykaul, Mar 31 '25 08:03

In ScyllaDB, maybe we should modify the perf_alternator benchmark to have options to 1. test request speed without reusing the connection, and 2. use https instead of http for the connection. This can be useful for understanding whether the 30ms connection creation time is real and perhaps how it's composed, and also for the PGO training that @mykaul mentioned.

nyh, Mar 31 '25 09:03

Ref https://github.com/scylladb/scylladb/issues/17719 Proposed solution: https://github.com/scylladb/scylladb/issues/17719#issuecomment-3356860974

vladzcloudius, Oct 06 '25 16:10