Optimize cluster election when nodes initiate elections at the same epoch
If multiple primary nodes go down at the same time, their replicas will initiate elections at roughly the same time, and there is a certain probability that they will initiate them in the same epoch.
In the current election mechanism, only one replica can eventually get enough votes in a given epoch; the other replicas fail to win due to the insufficient majority, their elections time out, and they must wait for the retry, which results in a long failover time.
If another node has already won the election in our failover epoch, we can assume that our election has failed and retry as soon as possible.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 70.73%. Comparing base (e972d56) to head (6291ed6).
:warning: Report is 643 commits behind head on unstable.
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1009      +/-   ##
============================================
+ Coverage     70.70%   70.73%    +0.02%
============================================
  Files           114      114
  Lines         63147    63151        +4
============================================
+ Hits          44648    44669       +21
+ Misses        18499    18482       -17

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/cluster_legacy.c | 86.43% <100.00%> (+0.20%) | :arrow_up: |
@madolson @zuiderkwast do you guys want to take a look at this?
The PR title says "Optimize ..." but it is more than an optimization. Actually a bug fix? Please improve the title. :)
I think it is indeed more of an optimization, making the election fail ASAP and retry ASAP. But it could also be considered a bug fix, maybe: Fix replica not able to initiate election in time when epoch fails?
The macos job failed. It's probably not related to this PR. It's a fascinating crash log though:
.+^+.
.+#########+.
.+########+########+. Valkey 255.255.255 (125c71fe/0) 64 bit
.+########+' '+########+.
.########+' .+. '+########. Running in standalone mode
|####+' .+#######+. '+####| Port: 21995
I/O error reading reply
|###| .+###############+. |###| PID: 30605
while executing
|###| |#####*'' ''*#####| |###|
"$r set [expr rand()] [expr rand()]"
|###| |####' .-. '####| |###|
(procedure "gen_write_load" line 8)
|###| |###( (@@@) )###| |###| https://valkey.io/
invoked from within
|###| |####. '-' .####| |###|
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
|###| |#####*. .*#####| |###|
(file "tests/helpers/gen_write_load.tcl" line 24)I/O error reading reply
|###| '+#####| |#####+' |###|
while executing
|####+. +##| |#+' .+####|
"$r set [expr rand()] [expr rand()]"
'#######+ |##| .+########'
(procedure "gen_write_load" line 8)
'+###| |##| .+########+'
invoked from within
'| |####+########+'
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
+#########+'
(file "tests/helpers/gen_write_load.tcl" line 24)
'+v+'
I/O error reading reply
30605:M 09 Nov 2024 14:12:32.606 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
while executing
30605:M 09 Nov 2024 14:12:32.607 * Server initialized
"$r set [expr rand()] [expr rand()]"
30605:M 09 Nov 2024 14:12:32.607 * Ready to accept connections tcp
(procedure "gen_write_load" line 8)
30605:M 09 Nov 2024 14:12:32.607 * Ready to accept connections unix
invoked from within
"gen_write_load [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
(file "tests/helpers/gen_write_load.tcl" line 24)
30605:M 09 Nov 2024 14:12:32.894 - Accepted 127.0.0.1:64082
30605:M 09 Nov 2024 14:12:32.894 - Client closed connection id=3 addr=127.0.0.1:64082 laddr=127.0.0.1:21995 fd=14 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=16890 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=34176 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1
I'm trying to fix the test in #1288.