Allow recovering from some assertions
Consider an assertion such as
https://github.com/etcd-io/raft/blob/d9907d6ac6baaebc3c9fd4e67acaa4154d2b3cd3/log.go#L324
of which there are several across the codebase.
Hitting this panic usually means that a follower has not upheld its durability guarantees.
Violating invariants like these is not great, but crashing might not be the user's first choice here. An app might prefer to keep going despite some risk that a write was lost (often, no write will actually have been lost).
The way I would structure this is by introducing event-based logging:
Instead of a line like this
https://github.com/etcd-io/raft/blob/d9907d6ac6baaebc3c9fd4e67acaa4154d2b3cd3/log.go#L324
we'd have something like this (a total strawman, just to get the idea across):
l.logger.Event(&CommitOutOfRangeIndex{Commit: tocommit, LastIndex: l.lastIndex()})
// Code here to actually handle the problem gracefully
...
where the default logger would panic, but users could define a logger that just logs the event and keeps going. We wouldn't have to make every current panic recoverable at first; we could allow this only for certain events, like the one discussed here.
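To make the strawman a bit more concrete, here is a minimal, self-contained sketch of how event-based logging could look. Everything in it is hypothetical and not part of the raft package: the `Event` and `EventLogger` interfaces, the `PanicLogger` and `RecordingLogger` implementations, and the `toyLog` type that stands in for the real log. Clamping the commit index to the last index is only a placeholder for "handle the problem gracefully"; whether that is actually a safe recovery is a separate question.

```go
package main

import "fmt"

// Event is a hypothetical interface for structured, typed log events.
type Event interface {
	String() string
}

// CommitOutOfRangeIndex mirrors the strawman event from this issue.
type CommitOutOfRangeIndex struct {
	Commit    uint64
	LastIndex uint64
}

func (e *CommitOutOfRangeIndex) String() string {
	return fmt.Sprintf("tocommit(%d) is out of range [lastIndex(%d)]", e.Commit, e.LastIndex)
}

// EventLogger is a hypothetical extension point: it decides what happens
// when an invariant violation is reported.
type EventLogger interface {
	Event(ev Event)
}

// PanicLogger would be the default and keeps the current behavior:
// every event is fatal.
type PanicLogger struct{}

func (PanicLogger) Event(ev Event) { panic(ev.String()) }

// RecordingLogger is what a user could plug in instead: it records the
// event and returns, so the caller continues.
type RecordingLogger struct {
	Events []Event
}

func (l *RecordingLogger) Event(ev Event) { l.Events = append(l.Events, ev) }

// toyLog is a simplified stand-in for the real log, just enough to show
// the call site.
type toyLog struct {
	logger    EventLogger
	committed uint64
	last      uint64
}

// commitTo advances the commit index and never decreases it.
func (l *toyLog) commitTo(tocommit uint64) {
	if l.committed >= tocommit {
		return
	}
	if l.last < tocommit {
		l.logger.Event(&CommitOutOfRangeIndex{Commit: tocommit, LastIndex: l.last})
		// Only reached if the logger did not panic.
		// Assumption for illustration: clamp to what we actually have.
		tocommit = l.last
	}
	l.committed = tocommit
}
```

With `PanicLogger`, `commitTo(5)` on a log whose last index is 3 panics, as it does today. With `RecordingLogger`, the same call records one `CommitOutOfRangeIndex` event and leaves `committed` at 3. Using a typed event rather than a format string lets a custom logger switch on the event type and decide per event whether to continue.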
Extracted from https://github.com/etcd-io/raft/issues/25#issuecomment-1449055381
Note that while "help is wanted" here I don't have bandwidth to shepherd a pull request from humble beginnings to the end. Unless another maintainer steps up to "sponsor" this work I'll only be able to accept contributions that are "close enough" to a solution that passes the bar: good design, testing, sensibly documented, backwards compatible. This will be difficult for casual or even first-time contributors.
I am trying to solve the commit index regression problem first. I am now writing an interaction test to reproduce the problem.