rally
rally copied to clipboard
[SSL: TOO_MANY_KEY_UPDATES] too many key updates During Bulk Indexing
Rally version (get with esrally --version): esrally 2.6.1.dev0 (git revision: 20f7600)
Invoked command:
esrally race --pipeline=benchmark-only --target-hosts="10.10.27.206:9200,10.10.28.186:9200,10.10.35.82:9200,10.10.38.34:9200" --track-repository="internal" --track="solutions/logs" --challenge="logging-indexing" --track-params=indexing-logging-with-snapshots-experiment-0-track-params.json --client-options=indexing-logging-with-snapshots-experiment-0-client-options.json --load-driver-hosts=$load_driver_hosts
OS version: Ubuntu 18.04.4 LTS
Description of the problem including expected versus actual behavior:
OpenSSL is returning an intermittent [SSL: TOO_MANY_KEY_UPDATES] too many key updates (_ssl.c:2635) error during bulk indexing which stalls connection to the target Elasticsearch node for 60s. This can negatively impact benchmark results. The error is entirely unexpected.
Steps to reproduce:
The behavior was logged 237 times in an indexing benchmark spanning multiple days. No attempt outside of a running benchmark has been made to reproduce it.
rally.log
2022-07-14 21:35:22,647 -not-actor-/PID:27448 esrally.rally INFO OS [uname_result(system='Linux', node='rally-0', release='5.3.0-1023-aws', version='#25~18.04.1-Ubuntu SMP Fri Jun 5 15:18:30 UTC 2020', machine='x86_64', processor='x86_64')]
2022-07-14 21:35:22,647 -not-actor-/PID:27448 esrally.rally INFO Python [namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-38', hexversion=50859504, name='cpython', version=sys.version_info(major=3, minor=8, micro=13, releaselevel='final', serial=0))]
2022-07-14 21:35:22,652 -not-actor-/PID:27448 esrally.rally INFO Rally version [2.6.1.dev0 (git revision: 20f7600)]
2022-07-14 21:35:22,652 -not-actor-/PID:27448 esrally.utils.net INFO Connecting directly to the Internet (no proxy support
...
2022-07-20 14:03:29,526 ActorAddr-(T|:46637)/PID:25724 elasticsearch WARNING POST https://10.10.28.186:9200/_bulk [status:N/A request:0.007s]
Traceback (most recent call last):
File "/home/esbench/.local/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
async with self.session.request(
File "/home/esbench/.local/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
self._resp = await self._coro
File "/home/esbench/.local/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
await resp.start(conn)
File "/home/esbench/.local/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
message, payload = await protocol.read() # type: ignore[union-attr]
File "/home/esbench/.local/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
await self._waiter
aiohttp.client_exceptions.ClientOSError: [Errno 1] [SSL: TOO_MANY_KEY_UPDATES] too many key updates (_ssl.c:2635)
2022-07-20 14:03:29,527 ActorAddr-(T|:46637)/PID:25724 elasticsearch WARNING Connection <AIOHttpConnection: https://10.10.28.186:9200> has failed for 1 times in a row, putting on 60 second timeout.
CPython here is receiving the error from libssl. Though the error was observed in Rally in the elasticsearch-py async client, it is difficult to tell if this behavior is the result of something Rally, ES async client, aiohttp, or OpenSSL itself is doing. It comes from OpenSSL TLSv1.3 KeyUpdate. KeyUpdate happens when the sender is updating its sending crypto keys, further described in RFC 8446, section 4.6.3. It reads like a protection mechanism and it feels like something isn't quite right with TLSv1.3 usage someplace. We might be able to uncover more with packet captures and process traces.
Intriguing! This appears to be an OpenSSL bug that was fixed in 1.1.1b: https://github.com/openssl/openssl/pull/8299. I don't know what version was used here but it looks like 18.04 only includes 1.1.1 (without any letter ie. without that fix): https://packages.ubuntu.com/bionic/openssl
Thank you @pquentin, that could be it! The OpenSSL version:
$ openssl version
OpenSSL 1.1.1 11 Sep 2018
I'm afraid there isn't much we can do on our side to work around this issue. We could eg. close the connection every X requests but that feels like a hack when the real fix for long running benchmarks is to use a slightly newer version of OpenSSL. We can however warn when we detect OpenSSL<1.1.1b.
Intriguing! This appears to be an OpenSSL bug that was fixed in 1.1.1b: https://github.com/openssl/openssl/pull/8299. I don't know what version was used here but it looks like 18.04 only includes 1.1.1 (without any letter ie. without that fix): https://packages.ubuntu.com/bionic/openssl
Ubuntu backports changes from public builds to their own build number. This is explained more eloquently here