redpanda r/consensus: do not require learner promotion to leave joint consensus

Cover letter

In the Redpanda Raft implementation, reconfiguration is cancelled by reversing the direction of the configuration change. When a Raft group configuration change is in its first phase, i.e. new nodes have been added as learners to the current configuration, those nodes are simply removed. Cancelling a change after the reconfiguration has entered the joint state requires swapping the old and new configurations in the joint Raft group configuration. Cancellation may then never finish even if only one node is unavailable, because that node may be a voter that was demoted to a learner in the last step before its removal.

To allow the configuration change to finish, we allow Raft to leave joint consensus before all learners are promoted to voters. This change is safe because learners do not affect the safety guarantees, and it lets us reliably cancel partition movement when one of the nodes is down.
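The safety argument can be illustrated with a minimal sketch of a joint-consensus quorum check. This is not Redpanda's implementation: `GroupNodes`, `has_majority` and `joint_quorum` are hypothetical names, and the sketch assumes the usual Raft rule that only voters count towards a majority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroupNodes:
    """Hypothetical model of one half of a Raft group configuration."""
    voters: frozenset
    learners: frozenset = frozenset()


def has_majority(voters, acks):
    # Only voters are counted; learners never contribute to a quorum.
    return len(voters & acks) > len(voters) // 2


def joint_quorum(current: GroupNodes, old: Optional[GroupNodes], acks):
    # In a joint configuration a decision needs a majority of voters
    # in BOTH the current and the old configuration.
    ok = has_majority(current.voters, acks)
    if old is not None:
        ok = ok and has_majority(old.voters, acks)
    return ok


# Joint state with node 3 demoted to a learner in the old configuration.
current = GroupNodes(frozenset({1, 2, 4}))
old = GroupNodes(frozenset({1, 2, 4}), frozenset({3}))

# The result is the same whether or not learner 3 acknowledges.
print(joint_quorum(current, old, {1, 2}))
print(joint_quorum(current, old, {1, 2, 3}))
```

Under these assumptions, whether node 3 is up has no effect on the quorum result, which is why waiting for its promotion is not needed for safety.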

Fixes #ISSUE-NUMBER, Fixes #ISSUE-NUMBER, ...

Backport Required

[ ] not a bug fix
[ ] issue does not exist in previous branches
[ ] papercut/not impactful enough to backport
[ ] v22.2.x
[ ] v22.1.x
[ ] v21.11.x

UX changes

Describe in plain language how this PR affects an end-user. What topic flags, configuration flags, command line flags, deprecation policies etc are added/changed.

Release notes

Oct 17 '22 18:10 mmaslankaprv

/ci-repeat 5 debug skip-units dt-repeat=10 tests/rptest/tests/partition_balancer_test.py tests/rptest/tests/partition_move_interruption_test.py

Oct 18 '22 06:10 mmaslankaprv

/ci-repeat 5 debug skip-units dt-repeat=5 tests/rptest/tests/partition_balancer_test.py tests/rptest/tests/partition_move_interruption_test.py

Oct 18 '22 09:10 mmaslankaprv

/ci-repeat 5 debug skip-units dt-repeat=5 tests/rptest/tests/partition_balancer_test.py tests/rptest/tests/partition_move_interruption_test.py

Oct 18 '22 17:10 mmaslankaprv

/ci-repeat 1

Oct 19 '22 09:10 mmaslankaprv

Cancellation of change when reconfiguration entered a Joint state requires swapping old and new configurations in Joint raft group configuration.

@mmaslankaprv can you add a full list of configuration transitions that led to a bug? I'm looking at configuration_change_strategy_v4::cancel_update_in_joint_state and it appears to leave the joint state, no? (i.e. _cfg._old = nullptr after it is done)

Nov 25 '22 13:11 ztlpn

We established that this is not required. Thank you @ztlpn

Nov 25 '22 16:11 mmaslankaprv

The example that we discussed, for posterity:

(1,2,3)->(1,2,4)

init: C: v: (1,2,3), l: () | o: -

1. C: v: (1,2,3), l: (4) | o: - transitional
2. C: v: (1,2,3,4), l: () | o: - transitional
3. C: v: (1,2,4), l: () | o: v: (1,2,3,4), l: () - joint
4. C: v: (1,2,4), l: () | o: v: (1,2,4), l: (3) - joint

cancel @ 1.

2'. C: v: (1,2,3), l: () | o: -

cancel @ 2.

3'. C: v: (1,2,3), l: () | o: v: (1,2,3,4), l: () - joint
4'. C: v: (1,2,3), l: () | o: v: (1,2,3), l: (4) - joint

cancel @ 3.

4'. C: v: (1,2,3,4), l: () | o: -
5'. C: v: (1,2,3), l: () | o: v: (1,2,3,4), l: () - joint
6'. C: v: (1,2,3), l: () | o: v: (1,2,3), l: (4) - joint
7. C: v: (1,2,3), l: () | o: - simple

cancel @ 4.

4'. C: v: (1,2,4), l: (3) | o: - transitional
5'. C: v: (1,2,3,4), l: () | o: - transitional
6'. C: v: (1,2,3), l: () | o: v: (1,2,3,4), l: () - joint
7'. C: v: (1,2,3), l: () | o: v: (1,2,3), l: (4) - joint
8. C: v: (1,2,3), l: () | o: - simple

The problem is with the case cancel @ 4. If node 3 is unavailable at step 4', we are stuck. But 4' is a transitional configuration, not a joint one, so the change in this PR doesn't really help.
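The cancel @ 4 sequence can be written down as data to check which steps list a given node as a learner, and of what kind those steps are. This is only a sketch of the table above: `Nodes`, `Step`, `nodes`, `CANCEL_AT_4` and `steps_with_learner` are hypothetical names and do not correspond to anything in the Redpanda code base.

```python
from typing import NamedTuple, Optional


class Nodes(NamedTuple):
    voters: frozenset
    learners: frozenset


class Step(NamedTuple):
    label: str
    current: Nodes
    old: Optional[Nodes]  # None when there is no old configuration
    kind: str             # "transitional", "joint" or "simple"


def nodes(voters, learners=()):
    return Nodes(frozenset(voters), frozenset(learners))


# The "cancel @ 4." sequence, transcribed from the example above.
CANCEL_AT_4 = [
    Step("4'", nodes((1, 2, 4), (3,)), None, "transitional"),
    Step("5'", nodes((1, 2, 3, 4)), None, "transitional"),
    Step("6'", nodes((1, 2, 3)), nodes((1, 2, 3, 4)), "joint"),
    Step("7'", nodes((1, 2, 3)), nodes((1, 2, 3), (4,)), "joint"),
    Step("8", nodes((1, 2, 3)), None, "simple"),
]


def steps_with_learner(steps, node):
    """Return (label, kind) for every step where `node` is a learner."""
    hits = []
    for step in steps:
        learners = step.current.learners
        if step.old is not None:
            learners = learners | step.old.learners
        if node in learners:
            hits.append((step.label, step.kind))
    return hits


print(steps_with_learner(CANCEL_AT_4, 3))
print(steps_with_learner(CANCEL_AT_4, 4))
```

In this transcription the only step where node 3 is a learner is 4', which is transitional, while node 4 is a learner only in the joint step 7'.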

Nov 25 '22 18:11 ztlpn
