redis-py icon indicating copy to clipboard operation
redis-py copied to clipboard

Failover handling improvements for RedisCluster and Async RedisCluster

Open barshaul opened this issue 1 year ago • 1 comments

Pull Request check-list

Please make sure to review and check all of these items:

  • [x] Does $ tox pass with this change (including linting)?
  • [ ] Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)?
  • [x] Is the new or changed code fully tested?
  • [x] Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • [ ] Is there an example added to the examples folder (if applicable)?
  • [x] Was the change added to CHANGES file?

NOTE: these things are not required to open a PR and can be done afterwards / while the PR is open.

Description of change

Failover handling improvements for RedisCluster and Async RedisCluster

  1. Handle cluster connection errors in the retry object -Removed handling of timeouts/connection errors within the cluster loop to reduce retries duplication with Retry objects and enable backoff strategies. Each cluster node will share the cluster's Retry object for retry and backoff policy.
  2. Fixed "cannot pickle '_thread.lock' object" bug- When we use NodesManager's initialize function to refresh the cluster topology after a failover, if the chosen startup node was the failed node, we received the above exception when trying to copy the connection kwargs (traceback at the end), and this error was raised to the user since it was not expected. It caused the initialize loop to get stuck on the failed node and not continue. Removed the copy of kwargs as it no longer needed since we call str_if_bytes on the conncetion's responses. We now throw exceptions in the initialize() loop only if there are no more startup nodes available. Closing #2354 #2297
  3. Removing the failed target node from the seed nodes when connection/timeout errors occur: the failed target node will return to the startup nodes if discovered on the cluster slots, but we'll skip it in this initialize iteration.
  4. Backwards comparability for connection_error_retry_attempts - connection_error_retry_attempts can still be used to pass the number of retries to execute on connection/timeout error. If connection_error_retry_attempts = 0 and retry object isn't being passed, no retries will be done on connecton/timeout errors for the cluster.
Traceback (most recent call last):   File "/home/ubuntu/redis-py/redis/cluster.py", line 1424, in initialize

    copy_kwargs = copy.deepcopy(kwargs)

  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy

    y = copier(x, memo)

...
  File "/usr/lib/python3.10/copy.py", line 146, in deepcopy

    y = copier(x, memo)

  File "/usr/lib/python3.10/copy.py", line 231, in _deepcopy_dict

    y[deepcopy(key, memo)] = deepcopy(value, memo)

  File "/usr/lib/python3.10/copy.py", line 161, in deepcopy

    rv = reductor(4)

TypeError: cannot pickle '_thread.lock' object

barshaul avatar Sep 10 '22 17:09 barshaul

Codecov Report

Base: 92.05% // Head: 92.12% // Increases project coverage by +0.07% :tada:

Coverage data is based on head (c306b37) compared to base (ab922db). Patch coverage: 97.96% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2377      +/-   ##
==========================================
+ Coverage   92.05%   92.12%   +0.07%     
==========================================
  Files         110      110              
  Lines       28705    28882     +177     
==========================================
+ Hits        26423    26607     +184     
+ Misses       2282     2275       -7     
Impacted Files Coverage Δ
redis/asyncio/connection.py 86.34% <83.33%> (-0.05%) :arrow_down:
redis/cluster.py 90.70% <94.11%> (+0.85%) :arrow_up:
tests/test_cluster.py 96.86% <97.18%> (-0.22%) :arrow_down:
redis/__init__.py 90.90% <100.00%> (+0.43%) :arrow_up:
redis/asyncio/__init__.py 100.00% <100.00%> (ø)
redis/asyncio/client.py 92.46% <100.00%> (+0.23%) :arrow_up:
redis/asyncio/cluster.py 91.21% <100.00%> (+0.92%) :arrow_up:
redis/backoff.py 70.58% <100.00%> (+11.01%) :arrow_up:
redis/client.py 89.15% <100.00%> (+0.08%) :arrow_up:
redis/connection.py 86.40% <100.00%> (+0.09%) :arrow_up:
... and 9 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov-commenter avatar Sep 11 '22 06:09 codecov-commenter

@chayim @dvora-h finished round

barshaul avatar Nov 02 '22 15:11 barshaul

@dvora-h let's do one more review to make sure we're good to go.

chayim avatar Nov 08 '22 07:11 chayim

I think this is ready

dvora-h avatar Nov 08 '22 10:11 dvora-h

I met the same problem with TypeError: cannot pickle '_thread.lock' object, Is this PR missed in 4.4.0-rc3?

latelan avatar Nov 09 '22 08:11 latelan

I met the same problem with TypeError: cannot pickle '_thread.lock' object, Is this PR missed in 4.4.0-rc3?

This PR does not exist in 4.4.0-rc3. We're looking at 4.4.0-rc4 for this...

chayim avatar Nov 09 '22 08:11 chayim