redis-py
RedisCluster becomes unrecoverable if all nodes time out
Version: 5.1.2
Platform: Ubuntu 22.04
Description:
RedisCluster becomes unrecoverable and crashes if all the nodes time out at the same time. This is particularly likely if your RedisCluster has only 1 node.
The crash that happens is:
Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/opentelemetry/trace/__init__.py", line 573, in use_span
    yield span
  File "/app/lib/python3.11/site-packages/opentelemetry/sdk/trace/__init__.py", line 1046, in start_as_current_span
    yield span
  File "/app/lib/python3.11/site-packages/opentelemetry/instrumentation/redis/__init__.py", line 263, in _async_traced_execute_command
    response = await func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 721, in execute_command
    await self.initialize()
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 419, in initialize
    await self.nodes_manager.initialize()
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 1347, in initialize
    raise RedisClusterException(
redis.exceptions.RedisClusterException: Redis Cluster cannot be connected. Please provide at least one reachable node: None
I think this is caused by the line below, where the node that timed out is removed on the expectation that we will connect to another node and recover the cluster instances from there: https://github.com/redis/redis-py/blob/07fc339b4a4088c1ff052527685ebdde43dfc4be/redis/asyncio/cluster.py#L806
This bug seems similar to, but distinct from, https://github.com/redis/redis-py/issues/3130
It also seems related to https://github.com/redis/redis-py/issues/2472
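To make the suspected failure mode concrete, here is a toy model of it. This is only a sketch of my reading of the behaviour, not the redis-py code: `ToyNodesManager`, `ToyClusterException`, `on_timeout`, and `simulate_outage_and_recovery` are all made-up names, and the assumption being illustrated is that a node which times out is dropped from the set of addresses the client can use to rediscover the cluster.

```python
class ToyClusterException(Exception):
    """Stand-in for redis.exceptions.RedisClusterException (hypothetical)."""


class ToyNodesManager:
    """Heavily simplified, hypothetical model; not the redis-py implementation."""

    def __init__(self, startup_nodes):
        # Addresses the client may use to (re)discover the cluster topology.
        self.startup_nodes = dict.fromkeys(startup_nodes)

    def on_timeout(self, node):
        # Assumed behaviour: a node that timed out is dropped, on the
        # expectation that some other node can still serve the topology.
        self.startup_nodes.pop(node, None)

    def initialize(self, reachable):
        # Try every remaining startup node; fail if none can be contacted.
        for node in self.startup_nodes:
            if node in reachable:
                return node
        raise ToyClusterException("no reachable startup node left")


def simulate_outage_and_recovery(nodes):
    manager = ToyNodesManager(nodes)
    for node in nodes:
        # During the outage every node times out at the same time.
        manager.on_timeout(node)
    try:
        # The network is back and every node is reachable again...
        manager.initialize(reachable=set(nodes))
    except ToyClusterException:
        # ...but the manager no longer knows any address to try.
        return "unrecoverable"
    return "recovered"
```

In this model a single-node cluster needs only one timeout to end up with an empty address list, which would match the "at least one reachable node: None" message in the traceback.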
I'm having the same issue here.
A simple way to reproduce it is to connect to a Redis cluster over the internet (AWS ElastiCache, for example), turn your Wi-Fi/Ethernet off, and then turn it back on. The error doesn't stop: RedisClusterException is raised in an infinite loop.
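Until this is fixed, one possible workaround is to throw the broken client away and build a new one from the original address. Below is a rough, untested-against-a-real-cluster sketch of that idea. `RebuildingClient` and its parameters are hypothetical names of mine, not part of redis-py. With redis-py I would expect the factory to be something like `lambda: RedisCluster.from_url("redis://my-host:6379")` (using `redis.asyncio.cluster.RedisCluster`) and `rebuild_on` to be `RedisClusterException`, but treat that as an assumption.

```python
import asyncio


class RebuildingClient:
    """Hypothetical wrapper that rebuilds the underlying client after a failure."""

    def __init__(self, factory, rebuild_on, max_attempts=3, delay=0.0):
        self._factory = factory        # zero-argument callable returning a fresh client
        self._rebuild_on = rebuild_on  # exception type(s) that trigger a rebuild
        self._max_attempts = max_attempts
        self._delay = delay            # seconds to wait before the next attempt
        self._client = factory()

    async def call(self, method, *args, **kwargs):
        last_error = None
        for _ in range(self._max_attempts):
            try:
                return await getattr(self._client, method)(*args, **kwargs)
            except self._rebuild_on as exc:
                last_error = exc
                # Swap in a fresh client; the factory still holds the original
                # seed address, so the new client can rediscover the nodes.
                old, self._client = self._client, self._factory()
                # Assumption: async clients expose aclose(); skip if absent.
                close = getattr(old, "aclose", None)
                if close is not None:
                    await close()
                await asyncio.sleep(self._delay)
        raise last_error
```

Usage would then look like `await client.call("get", "some-key")`. The point of the design is that the seed address lives in the factory rather than in the client's own node list, so it cannot be lost when nodes are dropped.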