flow-go
flow-go copied to clipboard
[EFM | M2] Epoch State Machines should not use `parentState` in their business logic
Problem Definition
We have different state machines for orchestrating Epoch transitions in the package state/protocol/protocol_state/epochs
. The current implementation use the parentState
in their business logic in numerous places.
In my opinion, this is extremely error prone. To the best of my understanding, the current code has at least one security-critical bug related to this here. Specifically, buildNextEpochActiveParticipants
has a check to enforce that already ejected nodes are not re-admitted. For the current logic, two service events eject node X
followed by a Epoch Setup
in the same block could be a blind spot for this check. Because we always compare to the parent state (where ejection has not jet been applied). The current code would accept an Epoch Setup
, where node X
is still listed as a participant and override the prior ejection.
In summary, when reviewing the state machines in the package epochs
there are various places where I could not convince myself of the correctness of the code, because the parent state was just used without clear documentation why this is safe.
Context
I think this design flaw is in part originating from our separation of ProtocolStateEntry
and RichProtocolStateEntry
:
- While the parent state is available as
RichProtocolStateEntry
, the evolved protocol state is only maintained asProtocolStateEntry
.- This is on purpose, because we don't want to deal with the complexity of also maintaining the
CurrentEpochIdentityTable
andNextEpochIdentityTable
throughout the epoch state evolution. As those fields are part of theRichProtocolStateEntry
, we only work with theProtocolStateEntry
when evolving the state. - Unfortunately, this has the undesired side effect that the evolving state has reference to the full
EpochSetup
andEpochCommit
events. Whenever we need those, we need to go back to the parent state. And this is where the confusing mix of using the evolving state as well as the unevolved parents originates in my opinion.
- This is on purpose, because we don't want to deal with the complexity of also maintaining the
- Furthermore, we have various service methods that work on the
RichProtocolStateEntry
, because they want the service events though don't care about the Identity tables. Thereby we have further instances, where we might erroneously use theparentState
instead of the updatedstate
Goal
The goal of this issue is three-fold:
- I would be inclined to remove as many usages of
parentState
as possible. In my opinion, the only place should be the state machine constructor, where we copy the parent. - In all remaining places where we cannot avoid using
parentState
, we must include a detailed comment explaining why this is safe even in the case of prior ejection events sealed in the current block. - include tests for the
HappyPathStateMachine
as well as theFallbackStateMachine
that confirm the correct behaviour in the following case:- we have one block sealing two service events:
eject node X
followed by aEpoch Setup
-
Epoch Setup
lists nodeX
participant, which would illegally override the prior ejection. As the smart contract emitted the ejection event before, it should account for this and not includeX
in the setup. - Encountering such a service event should trigger EFM.
- we have one block sealing two service events:
suggestion
I would suggest to add an interim struct between ProtocolStateEntry
and RichProtocolStateEntry
. In more detail:
- rename
ProtocolStateEntry
toEpochMinStateEntry
- add struct
EpochStateEntry
:type EpochStateEntry struct { *EpochMinStateEntry PreviousEpochSetup *EpochSetup PreviousEpochCommit *EpochCommit CurrentEpochSetup *EpochSetup CurrentEpochCommit *EpochCommit NextEpochSetup *EpochSetup NextEpochCommit *EpochCommit }
- rename
RichProtocolStateEntry
toEpochRichStateEntry
and base it onEpochStateEntry
type EpochRichStateEntry struct { EpochStateEntry CurrentEpochIdentityTable IdentityList NextEpochIdentityTable IdentityList }
- Most of the existing code will only care about
EpochMinStateEntry
andEpochRichStateEntry
(as before) - However, code that needs the
EpochMinStateEntry
plus the actual service events would useEpochStateEntry
. In particular, this would be the state machines. Thereby, we can cleanly avoid any usage of theparentState
in the logic that processes service events.