seastar
Avoid long stalls during TLS handshakes
In one ScyllaDB workload using an HTTPS server, we noticed that each connection establishment causes a roughly-30ms stall.
While it is sad that each TLS handshake takes 30ms (it means that each shard can only do about 30 of those per second...), what is much more troubling for a Seastar application is that these handshakes happen without preemption points, so each one causes a 30ms stall and potentially huge latencies for other requests running on the same shard.
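To make the arithmetic above concrete, here is a minimal sketch. The function names are hypothetical, and it assumes a handshake occupies the shard for its full duration with no preemption, as described above.

```cpp
#include <cassert>

// Upper bound on handshakes per second for one shard, assuming each
// handshake occupies the shard for `handshake_ms` with no preemption.
inline double max_handshakes_per_second(double handshake_ms) {
    assert(handshake_ms > 0.0);
    return 1000.0 / handshake_ms;
}

// Fraction of a shard's time spent stalled inside handshakes at a given
// rate of new connections per second (capped at 1.0, i.e. saturated).
inline double stalled_fraction(double handshake_ms, double connections_per_second) {
    assert(handshake_ms > 0.0 && connections_per_second >= 0.0);
    const double fraction = handshake_ms * connections_per_second / 1000.0;
    return fraction > 1.0 ? 1.0 : fraction;
}
```

With a 30ms handshake the ceiling is roughly 33 handshakes per second per shard, and even 10 new connections per second would keep the shard stalled for about 30% of the time.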
This issue isn't about making handshakes faster (which we should do) or reducing their number (which we've been doing - https://github.com/scylladb/seastar/issues/2154 is one attempt at that). This issue is about avoiding the stall during the handshake if we can't avoid the handshake.
I can think of different ways to avoid these stalls, in decreasing order of desirability but increasing ease of implementation:
- Modify the TLS implementation to use Seastar futures and incorporate preemption checks. This is probably not a realistic solution without massive modifications to OpenSSL - unless OpenSSL comes with hooks to do that, which I'm guessing it doesn't.
- A simpler version of the first option, probably still requiring modifications to OpenSSL but far fewer, is to run these TLS handshakes in a seastar::thread and add preemption points in the right places.
- An approach that could work without modifications to OpenSSL is to run the handshake in a different Linux thread. This will be ugly, but in some setups we've already been reserving separate cores for networking, so maybe it makes sense to do the same for TLS requests. Or, even if we run these TLS threads on the same cores as ordinary Seastar (the horror!), we'll still get stalls (when the Seastar thread isn't running) but probably not 30ms stalls.
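The idea of preemption points, which the first two options rely on, can be illustrated with a small standard-C++ sketch. It does not use Seastar or any TLS library: `resumable_handshake`, `run_some` and `drive` are hypothetical names, the "work" is placeholder arithmetic, and a step budget stands in for a real preemption check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a long, CPU-bound handshake computation that
// has been split into small steps so it can be paused between them.
class resumable_handshake {
public:
    explicit resumable_handshake(std::uint64_t total_steps)
        : _remaining(total_steps) {}

    // Run at most `budget` steps, then return to the caller so that other
    // tasks get a chance to run. Returns true once all steps are done.
    bool run_some(std::uint64_t budget) {
        assert(budget > 0);
        while (_remaining > 0 && budget > 0) {
            do_one_step();
            --_remaining;
            --budget;  // the loop condition acts as the preemption check
        }
        return _remaining == 0;
    }

    std::uint64_t checksum() const { return _acc; }

private:
    // Placeholder arithmetic, not real cryptography.
    void do_one_step() {
        _acc = _acc * 6364136223846793005ULL + 1442695040888963407ULL;
    }

    std::uint64_t _remaining;
    std::uint64_t _acc = 1;
};

// Toy scheduler: alternates handshake slices with other pending work and
// returns how many times other work ran before the handshake finished.
inline std::uint64_t drive(resumable_handshake& hs, std::uint64_t budget) {
    std::uint64_t other_work_runs = 0;
    while (!hs.run_some(budget)) {
        ++other_work_runs;  // a real scheduler would run queued tasks here
    }
    return other_work_runs;
}
```

With 1000 steps and a budget of 100, other work gets nine chances to run before the handshake completes, instead of waiting for the whole computation. The hard part in practice is the one the list above points at: the TLS library has to expose points at which its computation can be paused.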
Another thing we should do, which I'll tack onto this issue but which perhaps should be split into a separate issue, is to add metrics that will be useful for analyzing these slow TLS handshake problems. Perhaps count the number of handshakes, or the number of various cryptographic calculations, and perhaps also measure the time that each handshake takes (if there is no preemption, it's easy to calculate this time).
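A rough sketch of what such per-shard counters could look like, in plain standard C++. `handshake_metrics` and `timed_handshake` are hypothetical names, not existing Seastar APIs, and the timing relies on the assumption stated above that the handshake runs without preemption, so wall-clock time around the call equals the handshake's own time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical per-shard counters; no locking, since each shard would
// own and update its own instance.
struct handshake_metrics {
    std::uint64_t count = 0;
    std::chrono::nanoseconds total{0};
    std::chrono::nanoseconds max{0};

    void record(std::chrono::nanoseconds d) {
        ++count;
        total += d;
        if (d > max) {
            max = d;
        }
    }

    std::chrono::nanoseconds average() const {
        if (count == 0) {
            return std::chrono::nanoseconds{0};
        }
        return total / static_cast<std::chrono::nanoseconds::rep>(count);
    }
};

// Times one handshake call and records it. `handshake` is any callable
// returning a non-void value (for example a status code).
template <typename Func>
auto timed_handshake(handshake_metrics& m, Func&& handshake) {
    const auto start = std::chrono::steady_clock::now();
    auto result = handshake();
    const auto elapsed = std::chrono::steady_clock::now() - start;
    m.record(std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    return result;
}
```

Exporting the count, total and maximum would be enough to tell whether the 30ms figure is typical or an outlier. If preemption points are added later, the elapsed time would also include time spent in other tasks and would need a different measurement.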
CC @elcallio @avikivity
1.) We use GnuTLS, not OpenSSL, though @avikivity might want to change this.
2.) The handshake being slow is so weird. It is a CPU-intensive, but not huge, process. We might want to look into which version of the library we use and how it is compiled. At least some other projects' issues mention instances of it being slow due to distro config/build. So at least a proper profile might be useful.
This reminds me that I have somewhere an ask to configure our PGO/LTO training to run with encryption (and compression) enabled. It isn't today.
In ScyllaDB, maybe we could modify the perf_alternator benchmark to have options to 1. test request speed without reusing the connection, and 2. use https instead of http for the connection. This can be useful to understand whether the 30ms connection creation time is real and perhaps how it's composed, and also for the PGO that @mykaul mentioned.
Ref https://github.com/scylladb/scylladb/issues/17719 Proposed solution: https://github.com/scylladb/scylladb/issues/17719#issuecomment-3356860974