consensus-specs icon indicating copy to clipboard operation
consensus-specs copied to clipboard

Allow honest validators to reorg late blocks

Open michaelsproul opened this issue 3 years ago • 1 comments

Specify a function in fork choice get_proposer_head which returns the parent block upon which a proposer may build while possibly reorging the canonical head (if it was broadcast late and lacks sufficient attestation weight).

The mechanism is designed with safeguards built in so that it does not cause liveness failures:

  • Only single-slot reorgs are attempted: block N + 2 may build on block N if block N + 1 arrived late.
  • Participation must be approximately greater than PARTICIPATION_THRESHOLD = 70%.
  • REORG_WEIGHT_THRESHOLD is set to 20% (half of the proposer boost) so that an attacker must control up to PROPOSER_BOOST_SCORE - REORG_WEIGHT_THRESHOLD = 20% of attester weight in order to perform an ex-ante reorg. The attack works by releasing attestations for the block at slot N + 1 around the same time as the honest proposer publishes the attempted reorg block at N + 2. The late attestations tip the weight of the N + 1 block from < REORG_WEIGHT_THRESHOLD to > PROPOSER_BOOST_SCORE, meaning that block N + 1 will be preferred to block N + 2.

For background on the development of this idea, see: https://github.com/sigp/lighthouse/pull/2860.

Pros

  • Completely optional for clients and node operators, can be switched on and off via a flag. Low risk.
  • Can be rolled out almost immediately unlike other late block mitigations which are still in R&D.

Cons

  • Considerable complexity, particularly regarding the overriding of fork choice updates for the canonical head.
  • Obsoleted by (block, slot) fork choice. If a viable implementation of (block, slot) fork choice emerges then reorging late blocks will become the default behaviour.
  • Relies on proposer boost, which may be phased out in favour of other fork choice changes (e.g. view-merge).

Thanks to @potuz @mkalinin @terencechain @fradamt @casparschwa @djrtwo for feedback on this idea.

michaelsproul avatar Oct 13 '22 06:10 michaelsproul

Since this is optional, it might be nice to separate it from other parts of the spec a bit more, similar to optimistic sync. We could contain the complexity a bit, especially if it might be phased out in the future.

realbigsean avatar Oct 18 '22 16:10 realbigsean

What is the circuit-breaker mechanism for when latency is > 4s? Is the intention that checking "single_slot_reorg" and "participation_ok" should act as one, because, when latency is > 4s and blocks do not get attestations in their slot, the parent block would either be too old or not have enough weight?

If that's the case, I think this should be clarified. It might also be nice to have an explicit circuit-breaker that's easier to reason about, and gives clear liveness guarantees (e.g. explicitly turn off the reorg functionality whenever too many proposals are being missed)

fradamt avatar Nov 10 '22 21:11 fradamt

@fradamt Yeah the single slot & participation conditions are meant to be the circuit-breaker for asynchrony (although I think the single slot condition alone is sufficient). I think the outcome is that with message delay $\text{4s} < \Delta < \text{16s}$ then every second block will be re-orged and we'll get 50% of blocks on average. Once the delay exceeds 16s then blocks will be forced to build on a parent from at least 2 slots prior and we'll get <50% of blocks on average, without the proposer re-org rules coming into effect at all. In other words the only change in behaviour is for the delay window $\text{4s} < \Delta < \text{16s}$.

It might also be nice to have an explicit circuit-breaker that's easier to reason about, and gives clear liveness guarantees (e.g. explicitly turn off the reorg functionality whenever too many proposals are being missed)

Are you thinking of something like the builder API circuit-breaker conditions about number of missed blocks in the last N blocks and number of epochs since finalization? We could add number-of-skipped-blocks check (while keeping single_slot_reorg), and replace the current participation check with the the epochs-since-finalization check? The main reason I'd like to keep the single_slot_reorg check is that it simplifies the implementation greatly in terms of calculating and caching things.

michaelsproul avatar Nov 11 '22 00:11 michaelsproul

I've rebased this on dev and addressed some of the outstanding review comments. I think there are still several decisions we need to take before merging:

  • Q1: Is the current location alongside the other fork choice functions the right location for these specs? As they're optional they could alternatively live in ./fork_choice alongside the safe block specs.
  • Q2: How should we address the tension between specifying a precise strategy, and giving implementations the freedom to explore more aggressive strategies? This has implications for how the spec is written, which tests we write, and what inferences we are able to make about re-orgs on live networks.
    • A: max precision: Don't account for any deviations from the spec. Write test cases that enforce when clients should and shouldn't re-org (positive and negative tests, respectively).
    • B: max freedom: Allow clients to deviate from the spec in any way they wish. Write some test cases that describe minimum scenarios in which clients should re-org, but don't specify any negative tests beyond very extreme cases where re-orgs are actually impossible.
    • C: balance: Account for some acceptable deviations from the spec in tests. Write a mix of positive and negative tests while trying to avoid negative tests that would fail on current implementations. Allow implementations to skip some tests if they explore more exotic strategies in future.

I think my current preference for Q2 is option (C). I think the only outstanding difference between Prysm and Lighthouse is our handling of unrealized justification. Lighthouse currently won't re-org a block that updates unrealized justification, while Prysm will, because it assumes it can achieve equivalent justification by packing the block's attestations. ~~I'm not opposed to changing Lighthouse to follow Prysm and specifying this as a case where we should re-org.~~ I think Prysm's strategy is unsafe in general, so would prefer to keep the finalization check in the spec.

michaelsproul avatar Apr 19 '23 06:04 michaelsproul

Given https://ethresear.ch/t/selfish-mining-in-pos/15551/3 I suggest we add a constant REORG_PARENT_WEIGHT_THRESHOLD with suggested value of uint64(160) and an extra condition

    parent_weight = get_weight(store, parent_root)
    parent_reorg_threshold = calculate_committee_fraction(justified_state, REORG_PARENT_WEIGHT_THRESHOLD)
    parent_strong = parent_weight > parent_reorg_threshold

And require parent_strong = true to reorg.

potuz avatar May 17 '23 17:05 potuz

I am starting to get more worried about this change -- the reason is that it implements a version of the block/slot fork choice, but without the mitigations that would have to be in place to avoid reorging under poor networking conditions.

Also, we should probably open a discussion about the 4s attestation timeline at this point. In order for attestations, we need (1) block propagation and (2) block validation all to happen in this time slot, which is starting to become more work (e.g. with 4844) and does not feel like the appropriate slot division anymore since the work in the other three thirds remains the same (and is overall less time critical).

dankrad avatar May 31 '23 10:05 dankrad

without the mitigations that would have to be in place to avoid reorging under poor networking conditions.

There is a constraint on the slot of a parent block as a mitigation to poor network connections, i.e. parent_block.slot + 1 == head_block.slot. If a block was missed in a parent slot then no reorg is enforced.

Also, we should probably open a discussion about the 4s attestation timeline at this point.

I think we should raise it on one of the next calls. It would be nice to see the distribution of attestation and aggregates submitting over time into their respective slot phases.

mkalinin avatar May 31 '23 10:05 mkalinin

There is a constraint on the slot of a parent block as a mitigation to poor network connections, i.e. parent_block.slot + 1 == head_block.slot. If a block was missed in a parent slot then no reorg is enforced.

Yes, but that makes any slot whose parent was not reorged subject to a reorg. Seems pretty harsh!

I think we should raise it on one of the next calls. It would be nice to see the distribution of attestation and aggregates submitting over time into their respective slot phases.

Yeah, it's a good idea, I will check if we can get this data

dankrad avatar May 31 '23 14:05 dankrad

I've pushed the change suggested by @potuz to address the "selfish mining" attack described here.

I would appreciate some help from @hwwhww writing tests. Alternatively we could merge as-is and add tests in a follow-up PR.

michaelsproul avatar Sep 12 '23 06:09 michaelsproul

In addition to my previous reservations on this change, I also have a new one:

A fork choice change like this should not be taken light-heartedly. In fact, we explicitly don't want that people think they can just mess with the fork choice. As an example, if someone built a bribery protocol to reorg blocks intentionally, I would consider this an attack on Ethereum.

Therefore it might be best to actually roll back the change from clients now and re-start the discussion to make sure this is in a public forum and agreed by everyone, at least on ACDC.

dankrad avatar Oct 19 '23 22:10 dankrad

This was widely discussed in ACD at the time and thoroughly agreed upon. The only reason this wasn't merged then is that we couldn't make public some aspects of pulled tips. The implementation complexity of these changes are deeply intricate to roll back lightly

Edit: we do have this behind a flag so we could easily revert that flag though

potuz avatar Oct 19 '23 22:10 potuz

@dankrad I was keen to push ahead with this fix, because as Potuz said it had fairly broad support, but it also mitigated what I saw as a security issue introduced by MEV at the merge. Without late block re-orgs, proposers are incentivised to wait until the last moment to reveal their blocks, e.g. 10s into the slot. I think this damages the network more than intentionally re-orging blocks that are late.

Another thing is that this change does not modify fork choice, as in the part of fork choice that determines what the head should be. It's just an (incentive-compatible) tweak to the rules for choosing which block to build upon. Unlike bribe-based re-orgs, it also reinforces the honest behaviour of other validators: publishing blocks on time, and publishing attestations at or before the attestation deadline.

Like Prysm, Lighthouse has this feature behind a flag and could disable it by default (or permanently) if desired.

michaelsproul avatar Oct 19 '23 23:10 michaelsproul

Hi @michaelsproul

I refactored the specs to re-arrange and reduce the duplicate code in get_proposer_head and should_override_forkchoice_update. It helps me to define the test cases. What do you think about this refactoring?

  • Q1: Is the current location alongside the other fork choice functions the right location for these specs? As they're optional they could alternatively live in ./fork_choice alongside the safe block specs.

It reminds me of the confirmation rule PR, which defines the new feature in a new fork_choice/confirmation-rule.md file. I'm not 100% sure if adding a new doc is necessary, but it would be better to clarify which parts are "optional" and which are "honest validator fork choice rule".

If should_override_forkchoice_update returns True but get_proposer_head later chooses the canonical head rather than its parent, then this is a misprediction that will cause the node to construct a payload with less notice. The result of get_proposer_head MUST be honored in preference to the heuristic method.

I saw that get_proposer_head is optional in phase0 FC, but here, it's not clear to me if should_override_forkchoice_update is MUST to enable. Could you clarify a bit?

  • Q2: How should we address the tension between specifying a precise strategy, and giving implementations the freedom to explore more aggressive strategies? This has implications for how the spec is written, which tests we write, and what inferences we are able to make about re-orgs on live networks.

IMO clients can have the flexibility to implement it into a more efficient algorithm, but in regard to the result of the computed head, it should be identical because of the risk of chain split. I doubt spec tests have covered everything. 😅

hwwhww avatar Oct 20 '23 07:10 hwwhww

Another thing is that this change does not modify fork choice, as in the part of fork choice that determines what the head should be.

It modifies the rules on how to vote for a block. This is what a bribery protocol to e.g. censor certain transactions permanently would do, too.

It's just an (incentive-compatible) tweak to the rules for choosing which block to build upon. Unlike bribe-based re-orgs, it also reinforces the honest behaviour of other validators: publishing blocks on time, and publishing attestations at or before the attestation deadline.

Again, I'm not so much against this change, but now feel that it needs broader agreement and is not something that clients should be able to implement before that is reached. Maybe this is something to at least discuss on next ACD if people generally agree with this change.

dankrad avatar Oct 20 '23 12:10 dankrad

FWIW, we have so far not implemented this in Nimbus (yet), for the simple reason it's not part of the spec.

+1 that having things like this behind flags dubious at best and should be considered a temporary solution to address urgent problems and not a permanent state.

arnetheduck avatar Oct 20 '23 13:10 arnetheduck

It modifies the rules on how to vote for a block. This is what a bribery protocol to e.g. censor certain transactions permanently would do, too.

Edit: ah I see by voting you mean the vote the proposer is casting by asserting his head was the previous one. Will leave this comment in case someone else also took "attesting" for the meaning of "voting"

I don't see how this modifies voting rules:: attesters in the orphaned committee would not have attested for the late block cause they sent attestations at 4 seconds. Attesters in the next committee are following the usual forkchoice and proposer boost mandates them to vote for the reorging block.

potuz avatar Oct 23 '23 09:10 potuz

having things like this behind flags dubious at best and should be considered a temporary solution to address urgent problems and not a permanent state.

Just to clarify this, the only reason Prysm has this behind a flag is to protect from an implementation bug that can cause a liveness issue. We do this with most changes that touch core functions like block production. This has nothing to do with the feature being or not merged in the spec. As far as we were concerned this was thoroughly discussed and agreed in ACD albeit some reservations about disclosing because of certain forkchoice attacks that's were present at the time

potuz avatar Oct 23 '23 09:10 potuz

Hi @hwwhww, sorry for the slow reply! Thanks for cleaning this up and adding tests, I really appreciate that :blush:

I saw that get_proposer_head is optional in phase0 FC, but here, it's not clear to me if should_override_forkchoice_update is MUST to enable. Could you clarify a bit?

I tried to clarify the wording of this section in 3f1bc2051294d7ea632b5010500f98a331119d39. Both functions are optional, and should_override_forkchoice_update should only be enabled when the proposer reorg feature is enabled. This is how Lighthouse and Prysm work currently.

check if we should output get_proposer_head result. It is labeled optional so I hesitated. also, consider moving get_proposer_head to a separate place

I think the tests should include get_proposer_head as well, as it is arguably more important than should_override_forkchoice_update. If clients don't implement it they can simply skip those checks in their fork choice test runner.

I also think we should just leave it in the current location, as it seems like we may want to make this feature mandatory in future (Potuz mentioned that several PBS designs rely on it, but I haven't dug deeper).

confirm if test format is ok and run with a client

I'll try running the test vectors with Lighthouse soon. Maybe once they also include get_proposer_head checks.

michaelsproul avatar Oct 26 '23 06:10 michaelsproul

@michaelsproul thank you!

I updated the test format: https://github.com/ethereum/consensus-specs/blob/d8440f8bb4496d053a36b5fef64420d5149156bb/tests/formats/fork_choice/README.md#checks-step

  1. add get_proposer_head
  2. add mocking validator_is_connected into should_override_forkchoice_update check.

Now running the fork-choice testgen.

hwwhww avatar Oct 26 '23 16:10 hwwhww

@michaelsproul

FC-only test vectors of commit d8440f8

hwwhww avatar Oct 26 '23 17:10 hwwhww

FC-only test vectors of commit https://github.com/ethereum/consensus-specs/commit/d8440f8bb4496d053a36b5fef64420d5149156bb

Tests are passing on Lighthouse (https://github.com/sigp/lighthouse/pull/4887). I expected at least one test to fail because Lighthouse doesn't have the REORG_PARENT_WEIGHT_THRESHOLD check. This implies the test coverage could be expanded with a case where the reorg would go ahead if it weren't for the low weight of the parent (i.e. where head_weight < REORG_WEIGHT_THRESHOLD && parent_weight < REORG_PARENT_WEIGHT_THRESHOLD).

michaelsproul avatar Oct 27 '23 00:10 michaelsproul