flow-go
flow-go copied to clipboard
Networking layer's ProtocolStateIDCache accounts for ejection events
Problem Definition
With the Dynamic Protocol state, we now have the ability to eject nodes at any block within the epoch. However, the networking layer is still oblivious about any potential ejections until the new epoch starts.
Proposed Solution
I would suggest adding a notification that is emitted by the protocol state and distributed via the events.Distributor
. The ProtocolStateIDCache
can listen to this event and update its internally-maintained notion of "latest authorized network participants" (for brevity, I will call this the latest identity table
).
Details
while we are at it, I would really like to get the following TODO cleaned up: https://github.com/onflow/flow-go/blob/c54e15a6e3754bd6bc255bf115dcc0bcc6d06857/network/p2p/cache/protocol_state_provider.go#L69-L71 this violation of API conventions (potentially blocking the hotPath of consensus) is inherent to all notifications the ProtocolStateIDCache
ingests. Note that the ProtocolStateIDCache
only needs to be eventually consistent (👉 documentation). This allows a subtle but very compact implementation:
-
introduce auxiliary struct encapsulating the 'latest identity table'
- extract the following fields into this struct: https://github.com/onflow/flow-go/blob/c54e15a6e3754bd6bc255bf115dcc0bcc6d06857/network/p2p/cache/protocol_state_provider.go#L34-L36
- add the view that defined this 'latest identity table' to the struct
-
Per convention, an instance of this 'latest identity table' struct is immutable. The
ProtocolStateIDCache
maintains an atomic pointer to the latest instance of this struct. The getter methods just get the pointer to the 'latest identity table', making reads entirely lock-free. -
When updating the 'latest identity table' (method
update
), we would instantiate a new 'latest identity table' struct and atomically update the pointer. Thereby, also the writes are completely lock free. -
We desire that the update runs in its own thread. We could introduce a queue for the inbound notifications, but that is needlessly complex in my opinion for the following reason:
- We only care about the latest identity table. Hence, a
StrictMonotonousCounter
is sufficient that tracks the latest height that changed the identity table. - All events (
EpochTransition
,EpochSetupPhaseStarted
,EpochCommittedPhaseStarted
, and the new NodeEjected) all provide the block header, whose finalization resulted in the latest identity table changing. Since all of these events are emitted on finalization, we can query the identity table by height, which the block header provides. - Whenever a new event comes in, we update the
StrictMonotonousCounter
in theProtocolStateIDCache
and trigger a notifier. - a worker thread listens to the notifier and updates the latest identity table when it sees a notification
The
TimeoutAggregator
is a great example of this pattern (though it uses a view, while I am suggesting to use the height here). Inbound notifications execute this logic, where only the newest value is retained. The worker thread executed that logic which guarantees that eventually the latest value will be processed. - We only care about the latest identity table. Hence, a
-
repetitions and outdated events are no concern, because the
StrictMonotonousCounter
will only ever let newer updates pass
Current functional overlap with IdentityDeltas
There is overlap between the ProtocolStateIDCache
and the IdentityDeltas
. We should take this opportunity to consolidate them (remove one). Alternatively we would need to update both to support ejections.
Definition of Done
Verifying that an ejected node is no longer exposed through the ProtocolStateIDCache
's methods GetFlowID
, GetPeerID,
ByPeerID,
ByNodeIDor
Identitiesshould be sufficient. I think we have an integration test that verifies that connections from nodes that aren't exposed by the
ProtocolStateIDCache` are dropped/refused.