
Rework Validator Client fallback mechanism

Open macladson opened this issue 1 year ago • 5 comments

Issue Addressed

#3613

Proposed Changes

  • Reworks large sections of the validator fallback mechanism.
  • Uses health tiers to sort beacon nodes by healthiest first.
  • Considers beacon node sync distance, optimistic status and execution status.

Each connected BN is queried every slot by the VC, and the results are used to sort the BNs into a priority list. When attempting to perform duties, the VC will use the healthiest BN first (within a configurable tolerance).
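
A minimal sketch of the tier-based ordering described above, using illustrative names only (`SyncDistanceTier`, `Health`, `Candidate` and `prioritise` are my own placeholders, not the PR's actual types):

```rust
// Illustrative sketch only: derive an ordering where a lower value means
// "healthier", then sort candidates so the healthiest BN is tried first.

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum SyncDistanceTier {
    Synced, // head slot within the configured tolerance of the wall-clock slot
    Small,
    Medium,
    Large,
}

// Field order matters: the derived Ord compares sync tier first, then
// optimistic status, then execution status.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Health {
    sync_tier: SyncDistanceTier,
    is_optimistic: bool,       // false sorts before true, i.e. healthier
    execution_unhealthy: bool, // likewise
}

struct Candidate {
    endpoint: String,
    health: Health,
}

// Called once per slot, after querying every connected BN.
fn prioritise(candidates: &mut [Candidate]) {
    candidates.sort_by_key(|c| c.health);
}
```

Duties would then be attempted against the sorted list in order, falling through to the next candidate if a request fails.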

Additional Info

  • There is currently no lockout period. This means that if a BN fails a request, it will still be tried in the next epoch if it still appears healthy.
  • Removes the RequireSynced enum. This was scarcely used but may have implications when removed.
  • Removes the OfflineOnFailure enum. This was also scarcely used. In the new system, beacon nodes are never marked as offline so in essence have OfflineOnFailure::No by default.

Remaining Work

  • [x] CLI configurable distance tier modifiers.
  • [x] Actually use the status field on CandidateBeaconNode or remove it entirely.
  • [x] Fix logging since a lot of it was removed and not replaced.
  • [x] Make a /lighthouse/ui endpoint which exposes the fallback health stats for siren.
  • [x] Create working simulator to test fallback mechanism.

macladson avatar Jun 13 '23 05:06 macladson

I've had a read through this and it's looking really neat and tidy! I haven't finished the review yet, I'm bouncing between this review, the upcoming release and the Deneb review. I wanted to share some thoughts whilst they're fresh in my head, though.

I've been thinking about the risk of sorting BNs based on head slot. As we've discussed offline, we risk applying a "highest slot wins"-style fork choice between BNs. I.e., allowing the VC to arbitrarily swap between BNs means that the attached validators are not truly following the fork-choice rule implemented by each individual BN.

My biggest concern is if there is a consensus disagreement between the BNs that a VC connects to. For example, imagine a VC that connects to two BNs; a minority (20% share) EL/CL pair and a majority (80% share) EL/CL pair. The majority pair has a fault that causes all those nodes on the network to follow an invalid chain. Since that majority pair produces far more (invalid) blocks, a VC with a highest-slot-wins rule will tend to vote/build on that invalid chain. This could result in the finalisation of an invalid chain (bad).

I believe the ideal solution to this problem is to detect any consensus faults between the two BNs. If there is a consensus fault, then either (a) always prefer the BN defined first in --beacon-nodes, or (b) halt. Unfortunately, detecting consensus faults between BNs is not trivial. It would likely involve the VC trying to replay blocks from one BN back to the other and trying to detect a consensus-fault-y error response. IMO this is asking too much of a VC and is not desirable.

As a work-around, this PR implements the SyncDistanceTier concept. With this concept, we only sort nodes on head slot into four categories: Synced, Small, Medium and Large. Presently the Synced tier is set to 8 slots, which means that all BNs with a head slot within 8 slots of the wall-clock slot are considered equal (at least for the head-slot metric). I have some thoughts on how we should set the slot value for Synced.
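
For illustration, the tier classification might look something like the sketch below. Only the 8-slot Synced tolerance comes from the comment above; the names, the Small/Medium cut-offs and the cumulative-boundary arithmetic are my own placeholders, not the PR's actual values or code.

```rust
// Hypothetical sketch of mapping a BN's sync distance (wall-clock slot minus
// head slot) onto the four tiers described in this comment.

#[derive(Debug, PartialEq)]
enum SyncDistanceTier { Synced, Small, Medium, Large }

struct SyncTolerances {
    synced: u64, // e.g. 8 slots, per the comment above
    small: u64,  // placeholder, not the PR's actual default
    medium: u64, // placeholder, not the PR's actual default
}

fn classify(distance_in_slots: u64, t: &SyncTolerances) -> SyncDistanceTier {
    if distance_in_slots <= t.synced {
        SyncDistanceTier::Synced
    } else if distance_in_slots <= t.synced + t.small {
        SyncDistanceTier::Small
    } else if distance_in_slots <= t.synced + t.small + t.medium {
        SyncDistanceTier::Medium
    } else {
        SyncDistanceTier::Large
    }
}
```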

I think that we could make the finalisation of invalid blocks less likely by setting the slot value for Synced to a value that is related to the size of the super-majority client. Going back to our example of an 80/20 majority/minority split, we can expect the majority client to produce (on average) four out of five blocks. Therefore, we would want a Synced value that is at least five since it's reasonable to expect to see runs where the majority client produces four consecutive blocks.
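
Back-of-envelope (my own addition, assuming each block proposal is an independent draw with probability equal to the majority client's stake share):

```math
P(\text{4 consecutive majority blocks}) = p^4 = 0.8^4 \approx 0.41
```

so runs of four majority-proposed blocks are common at an 80% share, and a Synced tolerance of at least five keeps both BNs in the same tier through such a run.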

On clientdiversity.org, we see that Prysm is at 45.15% and Geth is at 47.94%. I'm not sure I trust those numbers, but assuming I do, let's just say that we have a 50% super-majority risk. We can therefore expect a majority client to produce roughly 1 in every 2 blocks, so a Synced value of 8 could conceivably protect us from consistently preferring a super-majority client. Rather, we'll be relying on the ordering of the BNs in --beacon-nodes to select our BN, which means that having a single majority node in that list should not defeat the diversity of the rest of the list.
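
By the same back-of-envelope reasoning (again my own, assuming independent proposals at a ~50% share), a healthy minority BN would only drop out of the Synced tier after roughly eight consecutive slots filled by the majority client:

```math
P(\text{8 consecutive majority blocks}) = 0.5^8 \approx 0.004
```

so at current network shares the head-slot comparison should rarely override the --beacon-nodes ordering.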

So, TL;DR I'm happy with the current approach but I thought it would be useful to get my thoughts written down.

paulhauner avatar Aug 22 '23 01:08 paulhauner

Heyo, I'm back for a review and there's a couple of conflicts. Do you mind resolving them please? 🙏

paulhauner avatar Oct 04 '23 00:10 paulhauner

I have done some manual testing and some comments below:

  1. The fallback to the secondary beacon node works well.

    Regarding the flag --beacon-nodes-sync-tolerances: does it also cover the case when the execution engine is offline? The CLI description does not mention this.

    During testing, when the execution engine is offline, after exactly 8 slots the VC will mark the beacon node as not synced, with Synced: 2 becoming Synced: 1 (2 BNs were connected during testing). The question here is: if a validator has an attestation due within these 8 slots, does that mean the VC will fail the attestation? The primary beacon node is still marked as synced, but it is actually no longer able to publish attestations due to the offline execution engine.

    Can I ask why 8 slots was chosen? It must be some kind of trade-off, but what is the trade-off here? Is it possible to make it smaller? Although in any case, if the VC has set --broadcast attestation then this will not matter anymore.

    Also, what is the use case of the Small and Medium sync ranges in the flag?

  2. When the VC is connected to the BNs, it shows:

    INFO Connected to beacon node(s) synced: 2, available: 2, total: 2

    When one of the BNs is turned off, it shows:

    INFO Connected to beacon node(s) synced: 1, available: 1, total: 1

    From my understanding, total = total BNs in the flag --beacon-nodes on the VC, and available = online (but not necessarily synced). So I thought it should display: INFO Connected to beacon node(s) synced: 1, available: 1, total: 2 (which is also the behaviour of v5.2.1), but that does not seem to be the case here.

  3. When the beacon node starts later than the VC (e.g., starting all services at once, with the BN fully starting later as it takes time to initialise), the VC is not able to detect that the BNs are online, even though they are synced. The VC continuously logs:

    WARN Unable to connect to a beacon node available: 0, total: 2, retry in: 2 seconds

    A restart resolves the issue (since the VC will then start later than the BN). It looks like the VC is not able to detect that a BN is online if the VC comes online before the BN.

  4. As discussed, the new endpoint fallback_health does not return anything on query:

    curl -X GET "http://localhost:5062/lighthouse/ui/fallback_health" -H "accept: application/json" -H "Authorization: Bearer api-token-redacted" | jq

Thank you.

chong-he avatar Jul 26 '24 02:07 chong-he