Configuration change validation has false positives

Open tbg opened this issue 2 years ago • 1 comments

We currently validate configuration changes at proposal time

https://github.com/etcd-io/raft/blob/4abd9e927c6d5db930dfdb80237ac584449aeec7/raft.go#L1224-L1260

This is not very helpful because it has false positives (refusing a config change that is actually allowed) though at least it currently doesn't have false negatives (because the set of false positives is sufficiently large 😄)

It could be more effective to either compare against the actual most recent config change in the (including the unstable) log, or to move the verification to apply time (i.e. erroring out conf changes that are illegal).

See https://github.com/etcd-io/raft/pull/81.

Jul 10 '23 14:07 tbg

in #81 (which adds a DisableConfChangeValidation option), @pavelkalinnikov says:

I think we should fix this lag in raft more fundamentally, in such a way that the DisableConfChangeValidation option is not needed. This mechanism should just not be best-effort, and should not have false positives/negatives. Config changes need some reconsideration.

Relying solely on above-raft config change mechanisms (such as in CRDB) also smells like a footgun. Config changes seem fundamentally belonging to raft layer, at least w.r.t. correctness of quorum rules etc.

Currently we do a collection of tricks which makes correctness reasoning hard:

scan the log to see if we have an unapplied config
guess if we have a pending config change
silently nullify incorrect config changes, or silently accept them (with the DisableConfChangeValidation option on).

Instead, I think we should just precisely track and know the current and prospective config from the explicit state, and explicitly error out on incorrect transitions. I.e., when we commit a pending config change, it becomes part of the state. To do so, we would need to decouple config changes from logs (otherwise, again, we would have to have a "scan the log" step to find the next config, e.g. when entries are appended or the commit index is advanced).

More generally, it would be nice to have separate state machines: raft itself, and the application-level state machine that it supports.

There are more arguments for this decoupling:

Config changes are the only entries that raft needs to see to operate correctly. All other log entries (which often include user data etc) don't need to go through raft package, it's enough to communicate metadata such as term/index of stored entries.
The application-level commands thus don't have to be embedded into and carried with the protobufs format that raft package imposes.
The application level thus doesn't have to demultiplex raft config changes and its own logs.
Performance reason. Say, there is a leader and 2 slow followers who are close to stall. It is nice to be able to relocate the two followers to other nodes without waiting for them to catch up. Currently we have to commit a config change through this slow quorum's log, but with decoupling it should be possible to do it bypassing the log, with a "higher priority".
raft doesn't look at application-level commands (which is most of them), but plays disproportionate role in managing resources and driving the flow of application-level entries: #64. The control should be at the application end. Having app-level commands interleaved with config changes complicates this control though, but decoupling should help.

Jul 12 '23 06:07 tbg