Try to stabilize the failover call in the slot migration test
CI reported that the replica returns the following error when performing CLUSTER FAILOVER: -ERR Master is down or failed, please use CLUSTER FAILOVER FORCE
This may be because the primary is marked as failed, or because the cluster link was disconnected while the primary was paused. In this PR, we add extra waits to wait_for_role: if the role is replica, we wait for both the replication link and the cluster link to be ok before proceeding.
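To illustrate the kind of readiness check described above, here is a minimal Python sketch. It is not the PR's actual Tcl change to wait_for_role. The helper names (parse_info, primary_link_ok, replica_ready, wait_for) are hypothetical, and the sketch assumes the replica's INFO replication and CLUSTER NODES output have already been fetched as strings. It treats the replica as ready only when role is slave, master_link_status is up, and, in CLUSTER NODES, the replica's primary is not flagged fail and its link-state is connected. Our assumption is that these roughly correspond to the conditions under which a non-forced CLUSTER FAILOVER is rejected with the error above.

```python
import time
from typing import Callable, Dict, List, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse `key:value` lines from INFO-style output into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blank lines and "# Section" headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def primary_link_ok(cluster_nodes: str) -> bool:
    """True if, from this node's view, its primary is connected and not flagged fail.

    Assumes the CLUSTER NODES line layout:
    <id> <addr> <flags> <primary-id> <ping-sent> <pong-recv> <epoch> <link-state> ...
    """
    rows: List[List[str]] = [
        r for r in (line.split() for line in cluster_nodes.splitlines()) if len(r) >= 8
    ]
    me: Optional[List[str]] = next(
        (r for r in rows if "myself" in r[2].split(",")), None
    )
    if me is None or me[3] == "-":
        return False  # no "myself" line, or this node is not a replica
    primary = next((r for r in rows if r[0] == me[3]), None)
    if primary is None:
        return False  # primary not known to this node
    flags = primary[2].split(",")
    return "fail" not in flags and primary[7] == "connected"


def replica_ready(info_replication: str, cluster_nodes: str) -> bool:
    """Replication link is up AND the cluster-bus link to the primary is healthy."""
    repl = parse_info(info_replication)
    return (
        repl.get("role") == "slave"
        and repl.get("master_link_status") == "up"
        and primary_link_ok(cluster_nodes)
    )


def wait_for(condition: Callable[[], bool], retries: int = 100, delay_s: float = 0.1) -> bool:
    """Poll `condition` until it holds or retries run out (a wait_for_condition-style loop)."""
    for _ in range(retries):
        if condition():
            return True
        time.sleep(delay_s)
    return False
```

In a test, one would poll something like wait_for(lambda: replica_ready(fetch_info(), fetch_nodes())) before issuing CLUSTER FAILOVER, where fetch_info and fetch_nodes are hypothetical client calls. Checking both links, rather than only the role, is the point: a node can already report role slave while its cluster-bus link to the primary is still down.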
CI link: https://github.com/valkey-io/valkey/actions/runs/11028380661/job/30628495083?pr=1073#step:5:3785
[exception]: Executing test client: ERR Master is down or failed, please use CLUSTER FAILOVER FORCE.
ERR Master is down or failed, please use CLUSTER FAILOVER FORCE
while executing
"[Rn $n] {*}$args"
(procedure "R" line 2)
invoked from within
"R 1 cluster failover"
("uplevel" body line 12)
invoked from within
"uplevel 1 $code"
(procedure "test" line 58)
invoked from within
"test "Migration target is auto-updated after failover in target shard" {
# Trigger an auto-failover from R1 to R4
fail_server 1
Codecov Report
All modified and coverable lines are covered by tests ✅
Project coverage is 70.65%. Comparing base (bf8183d) to head (5321e03). Report is 82 commits behind head on unstable.
@@             Coverage Diff              @@
##           unstable    #1078      +/-   ##
============================================
+ Coverage     70.61%   70.65%   +0.03%
============================================
  Files           114      114
  Lines         61695    63150    +1455
============================================
+ Hits          43568    44617    +1049
- Misses        18127    18533     +406
@PingXie Do you have time to take a look? (Or just approve it, since the failure in this test is hard to pin down.) This test used to fail frequently, but the failures seem to have disappeared recently. I wonder whether some other PR implicitly fixed it. Either way, this change should be harmless and can help stabilize the test, and I don't want to waste the effort.