
Deal with non-finalization that spans more than one weak subjectivity period

Open • dankrad opened this issue 5 years ago • 5 comments

The weak subjectivity spec currently does not really define this behaviour, so client implementations may end up inconsistent, which is dangerous. For example, my Teku client simply went offline because of an extended period of non-finality: https://github.com/PegaSysEng/teku/issues/3005

As far as I can see, there are two possible behaviours we can specify:

  1. Fully observe WS during non-finality. This would mean that during non-finality, clients should, every epoch, store the epoch current_epoch - compute_weak_subjectivity_period(state) as their WS checkpoint and never revert beyond it, even if the fork choice rule gives a different result. This should probably also be noted in the fork choice rule.

We would then also need to clarify in the WS spec that, more generally, a WS checkpoint is not necessarily (though preferably) a finalized epoch. The spec currently does not say either way, but I think many people assume it is a finalized epoch.

  2. Do not observe WS during non-finalization. Clarify that a WS checkpoint is always a finalized epoch, and that after the WS checkpoint the fork choice should prevail, even if this means a reversion longer than the WS period.
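The difference between the two options can be sketched as a reorg guard. This is a hypothetical, simplified model, not spec code: NodeView, on_new_epoch and may_reorg_to are illustrative names, epochs are plain integers, and ws_period stands in for compute_weak_subjectivity_period(state) as a fixed number.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    # Hypothetical per-node view; not a spec type.
    finalized_epoch: int      # epoch of the latest finalized checkpoint
    ws_checkpoint_epoch: int  # the node never reverts to a fork point before this epoch


def on_new_epoch(view: NodeView, current_epoch: int, ws_period: int, option: int) -> None:
    """Update the node's WS checkpoint at an epoch boundary under option 1 or 2."""
    if option == 1:
        # Option 1: during non-finality, keep a rolling WS checkpoint that trails
        # the current epoch by one WS period (and never moves backwards).
        rolling_epoch = current_epoch - ws_period
        view.ws_checkpoint_epoch = max(view.ws_checkpoint_epoch, view.finalized_epoch, rolling_epoch)
    elif option == 2:
        # Option 2: the WS checkpoint is simply the latest finalized checkpoint.
        view.ws_checkpoint_epoch = view.finalized_epoch
    else:
        raise ValueError("option must be 1 or 2")


def may_reorg_to(view: NodeView, fork_epoch: int) -> bool:
    """True if fork choice may switch to a branch that forks off at fork_epoch."""
    return fork_epoch >= view.ws_checkpoint_epoch
```

For example, with the last finalized epoch at 100, current epoch 400 and a WS period of 256 epochs, option 1 moves the WS checkpoint to epoch 144, so a branch forking at epoch 120 is rejected; option 2 keeps it at epoch 100 and lets fork choice switch to that branch.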

I would prefer option 2, because:

  • Weak subjectivity periods are less meaningful when the chain is not finalizing: in particular, no validators can be added other than those that were already committed to before finality stopped.
  • It allows better automatic resolution of chain splits (e.g. geographical) spanning longer periods.
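The reason no new validators can join is the activation rule in the phase0 spec: a validator is only activated once its activation_eligibility_epoch is covered by the finalized checkpoint. The predicate below is condensed from the spec's is_eligible_for_activation; the dataclasses are minimal stand-ins for the spec's SSZ containers, assumed here only so the sketch runs.

```python
from dataclasses import dataclass

FAR_FUTURE_EPOCH = 2**64 - 1


@dataclass
class Checkpoint:
    epoch: int


@dataclass
class Validator:
    activation_eligibility_epoch: int
    activation_epoch: int = FAR_FUTURE_EPOCH


@dataclass
class BeaconState:
    finalized_checkpoint: Checkpoint


def is_eligible_for_activation(state: BeaconState, validator: Validator) -> bool:
    return (
        # Placement in the activation queue must be finalized
        validator.activation_eligibility_epoch <= state.finalized_checkpoint.epoch
        # Has not yet been activated
        and validator.activation_epoch == FAR_FUTURE_EPOCH
    )
```

If finality stalls at epoch 100, a validator that became eligible at epoch 150 stays in the queue until a checkpoint at or after epoch 150 is finalized.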

From what I can see, there is only one case where this leads to undesirable behaviour: if 51% or more of the validators go offline for a long time, they may later decide that they dislike the chain built in the meantime by the remaining 49% or fewer, on which the offline validators' deposits have been heavily diluted, and attack that chain. I consider this situation much less likely than two chains being built during a geographic split.
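How strong the dilution must be can be estimated with simple arithmetic. Finalization needs at least 2/3 of the active stake, so an online share o can finalize once the offline share 1 - o has been leaked down to o/2, i.e. once the offline validators have lost a fraction 1 - o / (2(1 - o)) of their balance. A sketch, assuming a simplified model that ignores ejections and effective-balance rounding:

```python
def required_loss_fraction(online_share: float) -> float:
    """Fraction of its balance the offline set must lose before the online set
    holds at least 2/3 of the total stake (simplified model, no ejections)."""
    if online_share >= 2 / 3:
        return 0.0  # the online set can already finalize
    offline_share = 1.0 - online_share
    # Solve online / (online + offline * (1 - x)) = 2/3 for x:
    # offline * (1 - x) = online / 2  =>  x = 1 - online / (2 * offline)
    return 1.0 - online_share / (2.0 * offline_share)
```

For the 49%/51% split above this is roughly a 52% loss for the offline majority (required_loss_fraction(0.49) is about 0.52).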

dankrad • Oct 19 '20 16:10

  2. Do not observe WS during non-finalization. Clarify that a WS checkpoint is always a finalized epoch, and after the WS checkpoint, the fork choice should prevail, even if it means a reversion longer than the WS period.

I agree with this option. This is the expected behavior under the current WS spec.

Client teams should note that the current WS spec only concerns itself with WS sync at client startup. Unless the spec explicitly says otherwise, do not implement any additional WS behavior, as this may lead to fork choice deadlocks, client sync failures, or other miscellaneous issues.
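As a rough illustration of a startup-only WS check, here is a sketch loosely modeled on the is_within_weak_subjectivity_period check in the consensus-specs weak subjectivity document, using plain integer epochs instead of the spec's Store and BeaconState arguments. The start_client wrapper is hypothetical; the point is that the check runs once, and afterwards fork choice alone decides the head.

```python
def is_within_ws_period(current_epoch: int, ws_state_epoch: int, ws_period: int) -> bool:
    """True if a WS state at ws_state_epoch is still safe to sync from."""
    return current_epoch <= ws_state_epoch + ws_period


def start_client(current_epoch: int, ws_state_epoch: int, ws_period: int) -> str:
    """Hypothetical startup routine: the WS check runs once, here only."""
    if not is_within_ws_period(current_epoch, ws_state_epoch, ws_period):
        raise RuntimeError("WS checkpoint is outside the WS period; get a newer one")
    # From here on, fork choice alone selects the head; no further WS checks.
    return "syncing"
```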

More advanced WS behavior is still under discussion and will be added to the WS spec once it is settled.

adiasg • Oct 20 '20 02:10

I think this needs some clarification in the spec. I'm happy to open a PR with what I think is needed.

dankrad • Oct 20 '20 08:10

Clarify that a WS checkpoint is always a finalized epoch, and after the WS checkpoint, the fork choice should prevail, even if it means a reversion longer than the WS period.

This is unsafe for the same reason that syncing from outside the WS period is unsafe. If non-finality lasts longer than the WS period, a minority attacker can construct an alternate chain on which they have become the majority and begin to finalize. If you can reorg deeper than the WS period during non-finality, you could reorg onto such a chain, which the attacker constructed for "free".

djrtwo • Oct 26 '20 18:10

This is unsafe due to the same reasons it's unsafe to sync from outside of the WS period.

Well, I'd argue it's not unsafe, because there are no safety guarantees while the chain isn't finalizing.

But if the behaviour you're suggesting is the "safe" one, then to be fully consistent you should favour option 1, which means that even clients that stay online for the whole period would create WS fallback checkpoints beyond which they will not revert?

dankrad • Oct 27 '20 22:10

Clarify that a WS checkpoint is always a finalized epoch

One difficulty with this approach is that it basically invalidates the finalized_checkpoint field in the state object. If I'm looking at a head, I can no longer trust that the state contains canonical information about what the finalized checkpoint is, and I need to fetch it out-of-band. When the user supplied a checkpoint, this is feasible, but if we start checkpointing weak subjectivity automatically, it will create communication difficulties between clients, explorers, etc. The way to move forward with option 1 would be to modify the state transition function to update the finalized checkpoint, rather than relying on an "implementers' recommendation".
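The concern can be made concrete with a small hypothetical model: once a client enforces an automatically advanced WS checkpoint, the checkpoint it actually refuses to revert past can differ from state.finalized_checkpoint, so anyone reading only the state sees the wrong anchor. The names below (effective_anchor, state_is_authoritative, the string root) are illustrative assumptions, not client or spec APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    root: str  # placeholder for a block root


def effective_anchor(state_finalized: Checkpoint, ws_checkpoint: Optional[Checkpoint]) -> Checkpoint:
    """Checkpoint the client actually refuses to revert past (hypothetical)."""
    if ws_checkpoint is None or ws_checkpoint.epoch <= state_finalized.epoch:
        return state_finalized
    return ws_checkpoint


def state_is_authoritative(state_finalized: Checkpoint, ws_checkpoint: Optional[Checkpoint]) -> bool:
    """True if an observer reading only the state sees the client's real anchor."""
    return effective_anchor(state_finalized, ws_checkpoint) == state_finalized
```

Under the suggestion above, the state transition itself would update the finalized checkpoint, so in this model the state would again be authoritative and state_is_authoritative would always return True.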

arnetheduck • Oct 30 '20 11:10

I am closing this issue because it seems stale. Please do not hesitate to reopen it if this is a mistake.

leolara • Jun 04 '25 09:06