flow-go

[EFM] Consecutive epoch counters code audit / investigation

Open kc1116 opened this issue 10 months ago • 4 comments

Context

~Some components that process epoch-related events and data may assume that epoch counters are always consecutive. This assumption will break when we introduce epoch extensions, so we need to audit the code to find the places where the broken assumption must be handled.~

Update: We have decided instead to require consecutive epoch counters in the protocol state and allow non-consecutive counters in the smart contract (see conversations below).

kc1116 avatar Apr 18 '24 18:04 kc1116

I would suggest having a detailed conversation on this. At a high level, what we need is that System Smart Contracts and Protocol State are aligned. We can achieve this in different ways:

  1. Giving up on the convention that epoch counters have to be consecutive for the Protocol State.
    • It should be noted that the Protocol State describes the de-facto reality.
  2. Giving up on the convention that epoch counters increase strictly monotonically and consecutively for the System Smart Contracts.
    • The System Smart Contracts do not necessarily describe the reality of the network. They describe how the network should be behaving when following Happy Path Epoch Transitions.

The fundamental design decision we have to make is whether we want to (i) count the Epochs that actually happened in the network or (ii) count the epochs that an idealized system would have made. I think that (i) is much more sensible compared to (ii), for the following reasons:

  • It is likely that we have to manually align the state of the System Smart Contracts with the reality of the Protocol State. For example, I would assume that we don't want to pay rewards to nodes whose epoch didn't happen, while it is important to pay more rewards to nodes that wanted to unstake but had to serve longer due to epoch extensions (the last thing we want is node operators turning off their nodes while the network is in EFM and still needs those nodes to participate).
  • Given that there are going to be discontinuities in the Epoch Smart Contract's state due to manual overrides (recovery transaction), I think that counting Happy Path Epoch Transitions is not the most reliable reference frame. Smart contracts that ingest the Epoch Smart Contract most likely have to understand the possibility for manual adjustments and state discontinuities. I expect that pretending that the Epoch Smart Contract is always the correct source of truth will simply not be viable in many applications.
  • However, note that counting the Epochs as they are defined by the Protocol State yields a useful reference frame:
    • it describes the epoch progression of the real-world system (hence it provides the correct reference frame for nodes' participation rights, computing rewards, ejection, slashing etc).
    • In some way, the epoch state follows a strictly defined algorithmic epoch succession, based on external inputs. Specifically, the Protocol State does not accept arbitrary manual overrides, but only extensions to its current state according to well-specified protocol rules. There are no breaking changes in its state.
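To make the "extensions only, no breaking changes" property concrete, here is a minimal Go sketch. It is not flow-go code: `EpochState`, `Extend`, and `Transition` are hypothetical names, and it assumes that an epoch extension lengthens the current epoch without changing its counter while a regular transition must use exactly the next counter.

```go
package main

import (
	"errors"
	"fmt"
)

// EpochState is a hypothetical, simplified view of the current epoch
// as tracked by the Protocol State.
type EpochState struct {
	Counter   uint64 // epoch counter, consecutive under the rule below
	FinalView uint64 // last view of the epoch; extensions push it out
}

// ErrNonConsecutive signals a counter gap or rewind.
var ErrNonConsecutive = errors.New("next epoch counter must be current counter + 1")

// Extend models an epoch extension (assumption): the epoch lasts
// longer, but its counter is unchanged.
func (s EpochState) Extend(extraViews uint64) EpochState {
	s.FinalView += extraViews
	return s
}

// Transition models entering the next epoch. It rejects any counter
// other than Counter+1 and any final view that does not move forward.
func (s EpochState) Transition(nextCounter, nextFinalView uint64) (EpochState, error) {
	if nextCounter != s.Counter+1 {
		return s, fmt.Errorf("%w: current %d, got %d", ErrNonConsecutive, s.Counter, nextCounter)
	}
	if nextFinalView <= s.FinalView {
		return s, errors.New("next epoch must end after the current epoch")
	}
	return EpochState{Counter: nextCounter, FinalView: nextFinalView}, nil
}

func main() {
	s := EpochState{Counter: 10, FinalView: 1000}
	s = s.Extend(500) // EFM: same counter, later final view

	next, err := s.Transition(11, 2500)
	fmt.Println(next, err)

	_, err = s.Transition(12, 2500) // a gap is rejected
	fmt.Println(err)
}
```

The state only ever moves forward through `Extend` or `Transition`, so a consumer that counts epochs by this counter never sees a gap or an overwritten entry.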

Therefore, I think the best long-term direction is to adjust the Epoch Smart Contracts to have the same detailed understanding of epochs as the Protocol State. After all, the Protocol State implements the full protocol specification, while in their current form the Epoch Smart Contracts only maintain a "coarse approximation" of the full Epoch state, that is hand-waved to be "correct enough" for our current needs. We have very strong indications that the Epoch Smart Contracts' "coarse approximation" of the Epoch state is not sufficient for the mature protocol. For example, I think a node that unstaked in the current epoch cannot be slashed for a violation that it committed right before the epoch transition, because the Epoch smart contract does not know this node anymore (because it is not actively participating in the current epoch). Similarly, for performance-based node rewards, I believe the Epoch Smart Contract needs a more refined representation of Epochs (including maintaining information from at least the past epoch).

AlexHentschel avatar Apr 23 '24 18:04 AlexHentschel

sync May 21, 2024

Jordan, Yurii, Khalil, Alex

Summary

  • Epoch counter in Dynamic Protocol should be consecutive

  • Need to ensure that nodes that want to unstake don't get their stake back until EFM is over + 1 epoch

  • Expose Dynamic Protocol State to Cadence Smart contract environment:

    • Epoch Phase
    • Epoch Counter
    • EFM status?

    Thereby:

    • Enable staking, stake returns / unbonding when we enter staking phase (nodes stopping their participation at end of epoch N-2 will get stake back when we enter the staking phase in Epoch N (staking for Epoch N+1)).
    • IncrementFi can use this information for their liquid staking.
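The stake-return rule from the summary above can be sketched as follows. This is an illustrative Go model, not the Cadence contract or any existing API: `ProtocolSnapshot` and `CanReturnStake` are hypothetical names, and the three fields stand in for the epoch phase, epoch counter, and EFM status that the summary proposes to expose. It assumes the Protocol State counter is consecutive, so an extended epoch keeps its counter and a node that stopped at the end of epoch N-2 becomes eligible in the staking phase of epoch N.

```go
package main

import "fmt"

// EpochPhase enumerates the epoch phases (names are assumptions).
type EpochPhase int

const (
	PhaseStaking EpochPhase = iota
	PhaseSetup
	PhaseCommitted
)

// ProtocolSnapshot is a hypothetical read-only view of the data the
// summary proposes to expose to the smart contract environment.
type ProtocolSnapshot struct {
	Counter uint64
	Phase   EpochPhase
	InEFM   bool
}

// CanReturnStake reports whether a node whose participation ended with
// epoch lastEpoch may receive its stake back under this sketch's rule:
// never while in EFM, and only in a staking phase at least two epochs
// after the node's last epoch.
func CanReturnStake(lastEpoch uint64, s ProtocolSnapshot) bool {
	if s.InEFM {
		return false
	}
	return s.Phase == PhaseStaking && s.Counter >= lastEpoch+2
}

func main() {
	const lastEpoch = 8 // node stopped at the end of epoch N-2 = 8

	fmt.Println(CanReturnStake(lastEpoch, ProtocolSnapshot{Counter: 9, Phase: PhaseStaking}))
	fmt.Println(CanReturnStake(lastEpoch, ProtocolSnapshot{Counter: 10, Phase: PhaseStaking}))
	fmt.Println(CanReturnStake(lastEpoch, ProtocolSnapshot{Counter: 10, Phase: PhaseStaking, InEFM: true}))
}
```

Because the check reads the counter and EFM flag from the snapshot instead of deriving them from views, the contract and the Protocol State cannot disagree about which epoch the network is in.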

AlexHentschel avatar May 21 '24 18:05 AlexHentschel

[Update] talked with Dete; summarizing relevant aspects:

  • Dete is suggesting a subtle but important clarification of what it means for a node to stake for an epoch:
    • The node is promising to participate until the next regular epoch transition occurs (including a grace period thereafter depending on node type). On the happy path we have an estimate of how long they have to keep their node up (1 week), but that is not guaranteed. In all cases, the contract with the node operator is that the node participates until the next happy-path epoch has been reached. If the node operator is not holding up its end of the agreement, they might get diminished rewards and/or their stake may be slashed.
    • We should make it very apparent that this includes epoch extensions and the recovery epoch in case of EFM.
    • Nodes are always paid at the end of the epoch, where the expected payment for an honest node is proportional to the duration of the epoch. (honest here implies also responsive)
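A payment proportional to epoch duration could look like the Go sketch below. The function name `ProratedReward`, the one-week `NominalEpoch` constant, and the integer rounding (truncation) are assumptions made for illustration; the actual reward logic lives in the staking smart contract and may differ.

```go
package main

import (
	"fmt"
	"math"
	"math/big"
	"time"
)

// NominalEpoch is the happy-path epoch length assumed in this sketch.
const NominalEpoch = 7 * 24 * time.Hour

// ProratedReward scales the reward for a nominal one-week epoch by the
// actual epoch duration, truncating toward zero. The multiplication is
// done with big.Int so a long extended epoch cannot overflow uint64
// mid-computation; an oversized result saturates at MaxUint64.
func ProratedReward(nominalReward uint64, actual time.Duration) uint64 {
	if actual <= 0 {
		return 0
	}
	r := new(big.Int).SetUint64(nominalReward)
	r.Mul(r, big.NewInt(int64(actual)))
	r.Quo(r, big.NewInt(int64(NominalEpoch)))
	if !r.IsUint64() {
		return math.MaxUint64
	}
	return r.Uint64()
}

func main() {
	// A regular one-week epoch and an epoch extended by 3.5 days.
	fmt.Println(ProratedReward(700, NominalEpoch))
	fmt.Println(ProratedReward(700, NominalEpoch+NominalEpoch/2))
}
```

Scaling by duration rather than by a count of whole weeks handles extended epochs whose length is not an integral number of weeks.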

AlexHentschel avatar May 22 '24 21:05 AlexHentschel

Hoping to summarize the above comments and create some actions for this issue:

Must-Have for EFM Recovery

  • [ ] Allow non-consecutive epoch counters in the smart contract, including a new epoch counter creating a gap and a new epoch counter overwriting an existing one.
    • Need to flag this as a potentially breaking change to partners consuming this data and run it by Josh
    • General suggestion is to make available a "reward payment counter" that clients can use instead, which the smart contract fully owns and can guarantee to be consecutive
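The split between a non-consecutive epoch counter and a consecutive "reward payment counter" can be sketched as below. This is a Go model for illustration only, not the Cadence contract: `EpochContract`, `StartEpoch`, and `PayRewards` are hypothetical names, and the string label stands in for whatever per-epoch metadata the contract stores.

```go
package main

import "fmt"

// EpochContract is a hypothetical model of the smart contract state.
type EpochContract struct {
	epochs   map[uint64]string // epoch counter -> epoch metadata (placeholder)
	current  uint64            // current epoch counter, may jump or repeat
	payments uint64            // reward payment counter, owned by the contract
}

// NewEpochContract returns an empty contract model.
func NewEpochContract() *EpochContract {
	return &EpochContract{epochs: make(map[uint64]string)}
}

// StartEpoch accepts any counter: it may leave a gap after the previous
// epoch or overwrite an existing entry (e.g. via a recovery transaction).
func (c *EpochContract) StartEpoch(counter uint64, label string) {
	c.epochs[counter] = label
	c.current = counter
}

// PayRewards advances the payment counter by exactly one, independent
// of the epoch counter, and returns the new value.
func (c *EpochContract) PayRewards() uint64 {
	c.payments++
	return c.payments
}

func main() {
	c := NewEpochContract()

	c.StartEpoch(5, "regular epoch")
	fmt.Println(c.current, c.PayRewards())

	c.StartEpoch(8, "epoch after a gap")
	fmt.Println(c.current, c.PayRewards())

	c.StartEpoch(8, "recovery epoch overwriting 8")
	fmt.Println(c.current, c.PayRewards())
}
```

Clients that need a gap-free sequence would key on the payment counter, while the epoch counter stays free to follow whatever the recovery process dictates.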

Desirable but optional for EFM Recovery

  • [ ] Expose Protocol State data to FVM (epoch counter, phase, EFM status)
  • [ ] Modify smart contract to, where possible, read protocol state from FVM API (above) rather than deducing it independently
    • Currently the Smart Contract and Protocol State independently determine some pieces of state, generally by observing the view. As part of the general design decision that the epoch counter, phase, etc. are owned by the Protocol State, the Smart Contract should read this data rather than determining it independently.
  • [ ] Support more flexible reward payouts in smart contract
    • Currently rewards are always paid out assuming a weekly cadence; however, if EFM is entered, an epoch can last longer (and may not be an integral number of weeks)

jordanschalm avatar Jun 17 '24 18:06 jordanschalm

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 18 '24 01:09 github-actions[bot]