mev-boost 502 During Validator Registration with an Exited Validator

Hi all, I'm the Rocket Pool integration lead in charge of plugging MEV-boost into our Smartnode stack. I have a machine with a few dozen validators on Goerli/Prater, some of which have formally exited the network but still have their keys loaded by the VC.

During the validator registration routine between the MEV-boost client and my BN (Nimbus v22.7.0), I see the following logs:

mev-boost_1      | time="2022-08-12T00:08:00Z" level=info msg="http: GET /eth/v1/builder/status 200" duration=0.172181701 method=GET module=service path=/eth/v1/builder/status status=200
mev-boost_1      | time="2022-08-12T00:08:00Z" level=warning msg="error calling registerValidator on relay" error="HTTP error response: 400 / {\"code\":400,\"message\":\"not a known validator: 0xb440a2621abcda9f2af03b31040d62cd5ea26a9aff16d09003c1cb566fc45da1b67924384375c460a0bf1b3c187ec9b1\"}\n" method=registerValidator module=service numRegistrations=36 ua="nim-presto/0.0.3 (arm64/linux)" url="https://builder-relay-goerli.flashbots.net/eth/v1/builder/validators?id=rocketpool"
mev-boost_1      | time="2022-08-12T00:08:00Z" level=info msg="http: POST /eth/v1/builder/validators 502" duration=0.117795912 method=POST module=service path=/eth/v1/builder/validators status=502

This same set of logs appears during every registration.
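The relay answers 400, yet the beacon node sees a 502, because mev-boost sits between them. Judging only from these logs, mev-boost appears to forward the batch to each configured relay and reply with a 502 when none of them succeeds. The following is a minimal sketch of that aggregation under that assumption. `RelayResult` and `aggregate_registration` are hypothetical names, not mev-boost's actual code, and the error string is taken from the response body that Nimbus logs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RelayResult:
    url: str
    status: int  # HTTP status code the relay returned for registerValidator


def aggregate_registration(results: list[RelayResult]) -> tuple[int, str | None]:
    """Hypothetical: collapse per-relay results into one response for the BN."""
    if any(200 <= r.status < 300 for r in results):
        # At least one relay accepted the batch, so the BN sees success.
        return 200, None
    # Every relay failed (here, the only relay answered 400), so the BN
    # only receives a generic gateway error.
    return 502, "no successful relay response"


# With a single relay configured, one rejection fails the whole call.
print(aggregate_registration([RelayResult("https://builder-relay-goerli.flashbots.net", 400)]))
# → (502, 'no successful relay response')
```

With only one relay configured, as in this setup, the relay rejecting the batch is enough to make the entire registration call fail from the BN's point of view.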

Here's what Nimbus reports (different timestamp, but same error):

eth2_1           | INF 2022-08-12 04:43:12.000+00:00 Slot start                                 topics="beacnde" slot=3647616 epoch=113988 sync=synced peers=20 head=97449245:3647615 finalized=113985:0e76a2b3 delay=614us863ns
eth2_1           | WRN 2022-08-12 04:43:12.385+00:00 registerValidators: Couldn't register validator with MEV builder topics="beacval" registerValidatorResult="(status: 502, contentType: \"application/json\", data: @[123, 34, 99, 111, 100, 101, 34, 58, 53, 48, 50, 44, 34, 109, 101, 115, 115, 97, 103, 101, 34, 58, 34, 110, 111, 32, 115, 117, 99, 99, 101, 115, 115, 102, 117, 108, 32, 114, 101, 108, 97, 121, 32, 114, 101, 115, 112, 111, 110, 115, 101, 34, 125, 10])"
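The `data` field in the Nimbus warning is the raw HTTP response body printed as a byte array. Decoding it shows that the body is mev-boost's 502 response, not the relay's original 400 message. A short snippet to decode it:

```python
import json

# Byte values copied verbatim from the `data` field of the Nimbus warning above.
data = [
    123, 34, 99, 111, 100, 101, 34, 58, 53, 48, 50, 44, 34, 109, 101, 115,
    115, 97, 103, 101, 34, 58, 34, 110, 111, 32, 115, 117, 99, 99, 101, 115,
    115, 102, 117, 108, 32, 114, 101, 108, 97, 121, 32, 114, 101, 115, 112,
    111, 110, 115, 101, 34, 125, 10,
]

body = bytes(data).decode("utf-8")
error = json.loads(body)  # the trailing newline is whitespace, which json accepts
print(error["code"], error["message"])  # → 502 no successful relay response
```

The only detail that reaches the BN is this generic message. The actual cause, the relay's `not a known validator` error, appears only in the mev-boost log.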

Here is the offending validator, which was exited back in December: https://prater.beaconcha.in/validator/0xb440a2621abcda9f2af03b31040d62cd5ea26a9aff16d09003c1cb566fc45da1b67924384375c460a0bf1b3c187ec9b1

I don't know if this is preventing me from registering properly and proposing MEV-boost blocks as I haven't received any yet, but I'll report back when I get a slot.

Aug 12 '22 05:08 jclapis

@Ruteri we could allow exited validators by adding that status to the getAllValidators function in the beacon client on the relay.

https://github.com/flashbots/boost-relay/issues/54

Aug 12 '22 07:08 metachris

We could, but is it really what we should do? Exited validators will never propose blocks, so the best thing would be not to have them registered. The builder spec clearly states active or pending. @jclapis I'd propose chasing Nimbus not to send exited validators as the solution. Temporarily we could allow this in our relay, but temporary fixes have a way of causing further issues and confusion. I don't think we should adjust the spec to allow exited validators. cc @metachris @ralexstokes
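The "active or pending" rule can be expressed with the standard Beacon API validator status strings, roughly as follows. This is a sketch only: mapping the spec wording onto these particular statuses is an assumption, and `is_registrable` is a hypothetical helper that is not part of any client or relay.

```python
# Assumption: "active or pending" maps onto these Beacon API status strings.
REGISTRABLE_STATUSES = frozenset({
    "pending_initialized",
    "pending_queued",
    "active_ongoing",
    "active_exiting",
    "active_slashed",
})


def is_registrable(status: str) -> bool:
    """Hypothetical: True if the status falls under 'pending' or 'active'."""
    return status in REGISTRABLE_STATUSES


print(is_registrable("active_ongoing"))    # → True
print(is_registrable("exited_unslashed"))  # → False
```

Under this check, an exited validator such as the one in the original report (any `exited_*` or `withdrawal_*` status) would never be sent for registration.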

Aug 12 '22 09:08 Ruteri

@jclapis how annoying is this for you? Do you have any workaround you can apply while the clients fix it?

We could temporarily patch it next week, as @Ruteri said, but quick patches are ugly, as @Ruteri said :)

Have you reported it in the clients repositories? If not I will do it. That way they can tell us how long it will take to get a fix.

Aug 12 '22 13:08 come-maiz

I can remove the exited validator keys manually, no issue there.

If the issue is that the spec clearly states you shouldn't do this and they are doing it, then that's a client bug. I wouldn't expect you to put up a temporary patch to resolve it. I'm in touch with the dev teams so I can make sure they are aware of this and will resolve it. I'll reference this issue in it.

Thanks guys!

Aug 12 '22 17:08 jclapis

thank you for reporting the issues <3

Aug 12 '22 17:08 come-maiz

No problem, I'll try it with the other clients I have over the weekend and check if any others report exited validators as well.

Aug 12 '22 17:08 jclapis

Ok, did some tests. This issue is present in Nimbus v22.7.0, Teku v22.8.0, and Lighthouse v2.5.1. Prysm v2.1.4 is not affected by it and seems to behave properly.

Aug 12 '22 18:08 jclapis

I think this is likely something that should be changed on the builder end, unless it's an actual problem to potentially have inactive validators in the list. A number of statuses fall under pending (pending_initialized, pending_queued) and a number fall under active (active_ongoing, active_exiting, active_slashed). The easiest option for all involved is likely for the builder not to throw an error if 1 of N validators is not active, is active_slashed, or has one of a number of other potential issues, and instead to process all of the active ones for its list. There must already be logic for this, since it's likely not doing much for pending_queued validators now, and active_slashed would be another status where there's not a lot to do but which is handled.

From the VC side, it can know about keys before the BN does, so there's a level of tolerance between BN and VC regarding the keys being passed for various things (such as validator registration). The VC doesn't have the full beacon state, so while it can compute a fairly accurate state, it may not know the exact state of a validator key at a given point in time, which makes it messy for it to first query the state of all of the keys it owns.

The BN is in a similar position when calculating duties for validators that the VC considers active: sometimes it will get validators that aren't due to perform duties (unknown, exited, slashed, etc.), but they're handled gracefully and do not cause the request to fail.
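The lenient approach suggested here is to keep the eligible registrations and skip the rest, instead of rejecting the whole batch with a 400. It could look roughly like the following. `process_batch`, `BatchResult`, and the status set are hypothetical illustrations, not code from mev-boost-relay or any beacon client.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical set of statuses treated as eligible ("pending" or "active").
ELIGIBLE = frozenset({
    "pending_initialized", "pending_queued",
    "active_ongoing", "active_exiting", "active_slashed",
})


@dataclass
class BatchResult:
    accepted: list[str] = field(default_factory=list)
    skipped: dict[str, str] = field(default_factory=dict)  # pubkey -> reason


def process_batch(pubkeys: list[str], known_statuses: dict[str, str]) -> BatchResult:
    """Accept every eligible registration; skip the rest instead of failing the batch."""
    result = BatchResult()
    for pk in pubkeys:
        status = known_statuses.get(pk)
        if status is None:
            result.skipped[pk] = "unknown validator"
        elif status not in ELIGIBLE:
            result.skipped[pk] = f"ineligible status: {status}"
        else:
            result.accepted.append(pk)
    return result
```

With this behaviour, the exited validator from the original report would simply be skipped, and the other registrations in the same batch would still go through.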

Aug 15 '22 01:08 rolfyone

Teku has fixed the issue: https://github.com/ConsenSys/teku/pull/6100

Aug 23 '22 08:08 jclapis

Nimbus has fixed the issue: https://github.com/status-im/nimbus-eth2/issues/3961

Aug 23 '22 08:08 jclapis

@metachris Hi, I wonder whether mev-boost could solve this issue instead. Or is there a reason to fix it only in mev-boost-relay? If the operator knows which pubkeys belong to validators that have not exited, I think https://github.com/dsrvlabs/mev-boost/commit/d5881e9df84f425f52550d98d0f78ed72a992d3f can fix this issue.
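The client-side alternative amounts to filtering the batch against an operator-supplied list of pubkeys before forwarding it to any relay. The sketch below is hypothetical. We have not checked what the linked commit actually does, and `filter_registrations` and its allowlist parameter are illustrative names. The sketch only assumes the builder API's `SignedValidatorRegistration` JSON shape, where the key lives at `message.pubkey`.

```python
from __future__ import annotations


def filter_registrations(registrations: list[dict], allowed_pubkeys: set[str]) -> list[dict]:
    """Hypothetical: keep only registrations whose pubkey the operator has listed."""
    # Normalise hex case so "0xAB" and "0xab" compare equal.
    allowed = {pk.lower() for pk in allowed_pubkeys}
    return [r for r in registrations if r["message"]["pubkey"].lower() in allowed]


regs = [
    {"message": {"pubkey": "0xAAA"}, "signature": "0x00"},
    {"message": {"pubkey": "0xbbb"}, "signature": "0x00"},
]
print(len(filter_registrations(regs, {"0xaaa"})))  # → 1
```

The drawback of this design is that the operator has to keep the allowlist in sync with the VC's keys by hand.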

Aug 25 '22 08:08 skonhwang

Heads up, I'm testing Lodestar now and they ran into this as well. I've let them know and posted an issue on their repo to track it.

Nov 23 '22 05:11 jclapis

This is now completed @jclapis and included in the Lodestar v1.3.0 release.

Dec 17 '22 16:12 philknows

Hi, is this solved in the sense that all other validators will be processed fine? Meaning, if someone has 3 validators of which one has exited, but they didn't remove it from the VC, will the others perform the same as if it had been removed?

Apr 21 '23 15:04 diggggst