
Try to stabilize the failover call in the slot migration test

Open enjoy-binbin opened this issue 1 year ago • 2 comments

CI reports that the replica returns the following error when performing CLUSTER FAILOVER: -ERR Master is down or failed, please use CLUSTER FAILOVER FORCE

This may be because the primary is in the fail state, or because the cluster connection was disconnected while the primary was paused. In this PR, we add some extra waits to wait_for_role: if the role is replica, we also wait for the replication link and the cluster link to be ok.
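As a rough illustration of the idea (not the actual change, which lives in the Tcl test suite), the extra wait can be sketched in Python as a polling loop. Everything here is an assumption made for the example: `FakeNode`, `links_ok`, and the retry counts are hypothetical stand-ins, and the link status is simulated rather than read from a real node.

```python
import time
from typing import Callable


def wait_for_condition(check: Callable[[], bool], tries: int = 50, delay: float = 0.01) -> bool:
    """Poll check() up to `tries` times, sleeping `delay` seconds between attempts."""
    for _ in range(tries):
        if check():
            return True
        time.sleep(delay)
    return False


class FakeNode:
    """Hypothetical stand-in for a cluster node; links come up after a few polls."""

    def __init__(self, role: str, polls_until_ready: int = 3):
        self.role = role
        self._polls_left = polls_until_ready

    def links_ok(self) -> bool:
        # A real check would read the replication link and cluster link
        # status from the node; here readiness is simulated with a countdown.
        if self._polls_left > 0:
            self._polls_left -= 1
            return False
        return True


def wait_for_role(node: FakeNode, role: str) -> None:
    """Wait for the node to report `role`; for replicas, also wait for its links."""
    if not wait_for_condition(lambda: node.role == role):
        raise TimeoutError(f"node did not assume role {role!r} in time")
    # Only replicas get the extra wait, mirroring the description above.
    if role == "replica" and not wait_for_condition(node.links_ok):
        raise TimeoutError("replica links did not become ok in time")


node = FakeNode("replica")
wait_for_role(node, "replica")  # returns only once the simulated links are ok
```

The point of the sketch is the ordering: the failover command is issued only after the replica's links are confirmed healthy, so the replica should no longer see its primary as down when CLUSTER FAILOVER runs.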

enjoy-binbin, Sep 26 '24 11:09

CI link: https://github.com/valkey-io/valkey/actions/runs/11028380661/job/30628495083?pr=1073#step:5:3785

[exception]: Executing test client: ERR Master is down or failed, please use CLUSTER FAILOVER FORCE.
ERR Master is down or failed, please use CLUSTER FAILOVER FORCE
    while executing
"[Rn $n] {*}$args"
    (procedure "R" line 2)
    invoked from within
"R 1 cluster failover"
    ("uplevel" body line 12)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 58)
    invoked from within
"test "Migration target is auto-updated after failover in target shard" {
        # Trigger an auto-failover from R1 to R4
        fail_server 1

enjoy-binbin, Sep 26 '24 11:09

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.65%. Comparing base (bf8183d) to head (5321e03). Report is 82 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1078      +/-   ##
============================================
+ Coverage     70.61%   70.65%   +0.03%     
============================================
  Files           114      114              
  Lines         61695    63150    +1455     
============================================
+ Hits          43568    44617    +1049     
- Misses        18127    18533     +406     

see 92 files with indirect coverage changes

codecov[bot], Sep 26 '24 11:09

@PingXie Do you have time to check this out? (Or simply approve it, since the test is not easy to figure out.) We used to have frequent failures in this test, but they seem to have gone away recently. I wonder if some other PR has implicitly fixed it? In any case, this change should be harmless and can help stabilize the test, and I don't want the effort to go to waste.

enjoy-binbin, Oct 25 '24 11:10