nimbus-eth2
poor validator performance when bad or viable nodes present
When multiple beacon nodes are configured for the validator client, validator performance drops noticeably if one of the nodes is 'bad' (offline) or 'viable' (syncing)
To Reproduce
Steps to reproduce the behavior:
- Platform details (OS, architecture): x86_64 Ubuntu with Docker
- Branch/commit used: statusim/nimbus-validator-client:multiarch-v25.3.1
- Commands being executed: '...'
- Relevant log lines: '...'
Could you please elaborate more on this issue? Is it possible to get some metrics and/or logs before and after?
I believe the big dip shown below was from 1 of 3 nodes connected to the validator going offline.
I've also seen similar issues when resyncing 1 of 3 connected nodes. In that case, removing the resyncing node and running with the remaining 2 nodes until the resync completed restored performance.
I'm seeing a similar issue with the Nimbus VC (25.5.0) connected to 2 BNs, Nimbus (25.5.0) and the new Teku BN (also 25.5.0 :sweat_smile:)
Teku changed how its BN communicates in the new version, and it killed the performance of my VC because the VC was constantly trying to connect to the Teku BN, even though the Nimbus BN was working fine.
I'm seeing something really similar here, though I'm not entirely confident it's the exact same root cause. My Nimbus setup involves Nimbus running on both the main and fallback nodes. When I reboot the fallback node, my main node misses attestations, even though its beacon is good.
https://github.com/status-im/nimbus-eth2/releases/tag/v25.7.0 restores Teku BN compatibility.
Also, https://github.com/status-im/nimbus-eth2/pull/7276 might improve this in general.
If one of two all-roles BNs has a high-latency connection to the VC, performance is worsened compared to using a single low latency BN.
This makes a "fallback" BN detrimental in some circumstances - such as a user with both a home-based BN and a hosted BN elsewhere on the globe.
This is easy to verify in the VC logs, and it still occurs with 25.7.0.
Steps to reproduce:
- Configure the Nimbus VC with two BNs, one of which has high connection latency (e.g. >250 ms)
- Check the VC logs: both BNs will now be reported as having significant time skew associated with them (probably due to the Nimbus VC waiting for responses from the high-latency BN before proceeding with duties)
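To illustrate why the hypothesized behavior would hurt: if a client waits for responses from *all* configured upstreams before proceeding, its effective latency is that of the slowest one, whereas racing them and taking the first response is bounded by the fastest. This is a self-contained simulation with made-up delays, not Nimbus code:

```python
import asyncio
import time

# Simulated BN query: just sleeps for the given latency.
# The numbers below (10 ms local, 250 ms remote) are hypothetical.
async def query_bn(latency_s: float) -> float:
    await asyncio.sleep(latency_s)
    return latency_s

# Strategy 1: wait for every BN before proceeding (slowest wins).
async def wait_for_all(latencies):
    start = time.monotonic()
    await asyncio.gather(*(query_bn(l) for l in latencies))
    return time.monotonic() - start

# Strategy 2: race the BNs and take the first response (fastest wins).
async def race_first(latencies):
    start = time.monotonic()
    tasks = [asyncio.create_task(query_bn(l)) for l in latencies]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return time.monotonic() - start

latencies = [0.01, 0.25]  # local BN vs. high-latency remote BN
slow = asyncio.run(wait_for_all(latencies))
fast = asyncio.run(race_first(latencies))
print(f"wait-for-all: {slow:.2f}s, first-response: {fast:.2f}s")
```

With a 10 ms and a 250 ms BN, the wait-for-all strategy takes roughly 250 ms per round trip while the racing strategy stays near 10 ms, matching the observation that adding a high-latency fallback makes things worse than a single low-latency BN.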
Suggested change: implement an additional BN role that allows the user to specify "use this BN for publish roles only, unless no other BN is usable".
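For context, my understanding from the Nimbus book is that the VC already supports restricting per-BN roles via the URL anchor of `--beacon-node`; the suggestion above would add a conditional "only if nothing else is usable" behavior on top of that. A hedged sketch of the existing syntax (role names per my reading of the docs; please verify against the current Nimbus book before relying on it):

```shell
# Hypothetical example: keep the local BN for all roles, and restrict
# the remote high-latency BN to publish duties via the URL anchor.
# The anchor/role syntax here is my recollection of the Nimbus VC docs,
# not verified against this exact release.
nimbus_validator_client \
  --beacon-node=http://127.0.0.1:5052 \
  --beacon-node=http://remote.example.org:5052/#publish
```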
I think the issue of bad performance with a 'viable' (syncing) node is fixed in 25.7.0
> I think the issue of bad performance with a 'viable' (syncing) node is fixed in 25.7.0
I take that back