valkey icon indicating copy to clipboard operation
valkey copied to clipboard

Improve "SENTINEL FAILOVER" by using the "FAILOVER" command

Open gmbnomis opened this issue 1 year ago • 11 comments

This PR implements #1291 and consists of the following commits:

Two commits to fix problems with the sentinel tests:

Commit: Wait for all Sentinels to be connected before starting tests

Up to now the sentinel test initialization verified that all sentinels detect each other. However, detection does not imply connection, which led to intermittent failures in the coordinated failover tests (no leader elected since disconnected sentinels do not take part in a vote).

Fix this by waiting until no sentinel reports being "disconnected".

Commit: sentinel-tests: Clean up config after config set tests

Three commits with the actual implementation:

Commit: Add option for coordinated failover to Sentinel (coordinated failover without leader election) Commit: Allow Sentinel to recover from a stuck FAILOVER Commit: SENTINEL FAILOVER COORDINATED actually does a leader election (adds leader election before failover starts)

gmbnomis avatar Nov 12 '24 07:11 gmbnomis

Codecov Report

Attention: Patch coverage is 0% with 98 lines in your changes missing coverage. Please review.

Project coverage is 71.38%. Comparing base (1941d28) to head (384002f). Report is 23 commits behind head on unstable.

Files with missing lines Patch % Lines
src/sentinel.c 0.00% 98 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1292      +/-   ##
============================================
- Coverage     71.51%   71.38%   -0.14%     
============================================
  Files           122      122              
  Lines         66439    66531      +92     
============================================
- Hits          47517    47494      -23     
- Misses        18922    19037     +115     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/sentinel.c 0.00% <0.00%> (ø)

... and 12 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Nov 12 '24 07:11 codecov[bot]

@enjoy-binbin I don't want to be pushy, but this PR is over a month old now. Could you have a look at the PR and/or #1291 when you find some time?

gmbnomis avatar Dec 17 '24 16:12 gmbnomis

sorry, i dont have enough time to review this, i guess i will be free in the next month. @hwware wen, do you think you have the time to review this in a short time? since you know the sentinel better than me.

enjoy-binbin avatar Dec 18 '24 02:12 enjoy-binbin

Sure, let me review it. @enjoy-binbin

hwware avatar Dec 19 '24 19:12 hwware

Reference i

@gmbnomis Sorry, let me review it ASAP.

hwware avatar Dec 19 '24 19:12 hwware

Could you please provide a CI result which include sentinel test suites, because there is no sentinel test in the commit CI? Thanks

hwware avatar Dec 20 '24 18:12 hwware

Could you please provide a CI result which include sentinel test suites, because there is no sentinel test in the commit CI? Thanks

Here is a manual run of the Daily job test-ubuntu-jemalloc with sentinel tests enabled. Is this what you are looking for? If not, please let me know what I should run.

gmbnomis avatar Dec 20 '24 23:12 gmbnomis

@hwware Thank you for your review! Given that this PR has been inactive for an additional month, could you please provide guidance on how we can move forward with getting it merged?

gmbnomis avatar Feb 15 '25 18:02 gmbnomis

Sorry, I am busying on other stuffs recently. Let me check this PR and https://github.com/valkey-io/valkey/pull/1964 together. I think they are great helpful for sentinel enhancement.

hwware avatar Apr 23 '25 16:04 hwware

Please update the json file (version to 9.0) then the maintainers have chance to merge this PR. I think it already passes too long time, Sorry

hwware avatar Apr 30 '25 18:04 hwware

Please update the json file (version to 9.0) then the maintainers have chance to merge this PR

@hwware That's great news. I updated the version number and rebased onto the current unstable.

gmbnomis avatar May 01 '25 00:05 gmbnomis