
RedisCluster becomes unrecoverable if all nodes time out

Open · kuza55 opened this issue 9 months ago · 1 comment

Version: 5.1.2

Platform: Ubuntu 22.04

Description:

RedisCluster becomes unrecoverable and crashes if all of its nodes time out at the same time. This is particularly likely if the cluster has only one node.

The crash that happens is:

Traceback (most recent call last):
  File "/app/lib/python3.11/site-packages/opentelemetry/trace/__init__.py", line 573, in use_span
    yield span
  File "/app/lib/python3.11/site-packages/opentelemetry/sdk/trace/__init__.py", line 1046, in start_as_current_span
    yield span
  File "/app/lib/python3.11/site-packages/opentelemetry/instrumentation/redis/__init__.py", line 263, in _async_traced_execute_command
    response = await func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 721, in execute_command
    await self.initialize()
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 419, in initialize
    await self.nodes_manager.initialize()
  File "/app/lib/python3.11/site-packages/redis/asyncio/cluster.py", line 1347, in initialize
    raise RedisClusterException(
redis.exceptions.RedisClusterException: Redis Cluster cannot be connected. Please provide at least one reachable node: None

I think this is caused by this line, which removes the timed-out node on the expectation that the client will connect to another node and rediscover the cluster from there: https://github.com/redis/redis-py/blob/07fc339b4a4088c1ff052527685ebdde43dfc4be/redis/asyncio/cluster.py#L806
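To make the suspected mechanism concrete, here is a minimal toy model in plain Python (no Redis server needed). `ToyNodesManager`, `ToyCluster` and the `reachable` parameter are hypothetical stand-ins, not redis-py's API. The model only assumes what the linked line is described as doing: a node whose command times out is dropped from the startup (seed) nodes, and the client then reinitializes from whatever seeds remain.

```python
# Toy model of the failure mode described above. Every name here is
# hypothetical; this is NOT redis-py's real NodesManager/RedisCluster.


class RedisClusterException(Exception):
    """Stand-in for redis.exceptions.RedisClusterException."""


class ToyNodesManager:
    def __init__(self, startup_nodes):
        # Seed nodes ("host:port") that initialize() is allowed to try.
        self.startup_nodes = {name: name for name in startup_nodes}

    def initialize(self, reachable):
        """Return the first seed the network can reach, or raise."""
        exception = None
        for name in list(self.startup_nodes):
            if name in reachable:
                return name  # topology could be (re)discovered via this seed
            exception = TimeoutError(f"{name} timed out")
        # With no seeds left the loop never runs, so `exception` stays None,
        # consistent with the "...reachable node: None" message in the report.
        raise RedisClusterException(
            "Redis Cluster cannot be connected. "
            f"Please provide at least one reachable node: {exception}"
        )


class ToyCluster:
    def __init__(self, startup_nodes):
        self.nodes_manager = ToyNodesManager(startup_nodes)
        self.target = None  # None means "must (re)initialize first"

    def execute_command(self, reachable):
        """`reachable` is the set of nodes the network can reach right now."""
        if self.target is None:
            self.target = self.nodes_manager.initialize(reachable)
        if self.target not in reachable:
            # Assumed behaviour of the linked line: forget the timed-out node
            # as a seed, force re-initialization, and re-raise.
            self.nodes_manager.startup_nodes.pop(self.target, None)
            self.target = None
            raise TimeoutError("command timed out")
        return "OK"


if __name__ == "__main__":
    node = "node-a:6379"
    cluster = ToyCluster([node])
    print(cluster.execute_command({node}))  # network up: OK
    try:
        cluster.execute_command(set())  # network down: the only node times out
    except TimeoutError as exc:
        print("TimeoutError:", exc)
    for _ in range(3):  # network back up, but there are no seeds left to try
        try:
            cluster.execute_command({node})
        except RedisClusterException as exc:
            print("RedisClusterException:", exc)
```

In this model, restoring the network does not help: the client only rediscovers the cluster through `startup_nodes`, and the timeout path has already emptied that dict, so every later call fails the same way.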

This bug seems similar to, but distinct from, https://github.com/redis/redis-py/issues/3130

Also seems related to https://github.com/redis/redis-py/issues/2472

kuza55, May 02 '24 15:05

I'm having the same issue here.

A simple way to reproduce it is to connect to a Redis cluster over the internet (AWS ElastiCache, for example), turn your Wi-Fi/Ethernet off, and then turn it back on. The error never stops: RedisClusterException is raised in an infinite loop.

julianogv, May 08 '24 21:05