nimbus-eth2 icon indicating copy to clipboard operation
nimbus-eth2 copied to clipboard

keymanager API very slow while Nimbus is syncing

Open yorickdowne opened this issue 2 years ago • 1 comments

Describe the bug

When Nimbus is syncing, the keymanager remote key function - POST to eth/v1/remotekeys - is extremely slow, and interferes with syncing.

The regular keys function was not tested but might behave the same.

To Reproduce

  • Sync Nimbus - for easier reproduction without checkpoint sync, or with checkpoint sync and import while it's not caught up yet
  • Import (multiple) keys, e.g. with ./ethd keys import

Additional context

Sync may show missed heartbeat messages while keymanager API is in use

eth-docker-devel-consensus-1  | INF 2023-08-13 09:22:31.189+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=26s279ms985us157ns hinterval=700ms
eth-docker-devel-consensus-1  | INF 2023-08-13 09:22:31.189+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat="Gossipsub scoring" delay=13s818ms51us270ns hinterval=12s

On the key import side the remote key function takes 10 seconds or more to complete, or may even time out with a 60s timeout set in curl

yorickdowne avatar Aug 13 '23 09:08 yorickdowne

Encountering the same issue - this seems like it interferes with all functions of Nimbus (including preventing effective peering during the sync), and makes Nimbus perform extremely poorly with Rocketpool.

It will eventually manage a sync, but it initially gets itself stuck at just 1 peer for hours with the logs showing a continual loop of:

rocketpool_eth2  | INF 2023-11-09 09:15:53.979+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=13s356ms102us506ns hinterval=700ms
rocketpool_eth2  | INF 2023-11-09 09:15:53.988+00:00 Slot end                                   topics="beacnde" slot=7724776 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 syncCommitteeDuties=none head=28ab91d3:7724672
rocketpool_eth2  | INF 2023-11-09 09:15:53.991+00:00 Slot start                                 topics="beacnde" slot=7724777 epoch=241399 sync="--h--m (0.00%) 0.0000slots/s (wPwwwwwwwD:7724671)" peers=2 head=28ab91d3:7724672 finalized=241394:17613184 delay=6s991ms869us447ns
rocketpool_eth2  | INF 2023-11-09 09:16:17.531+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=22s151ms558us224ns hinterval=700ms
rocketpool_eth2  | INF 2023-11-09 09:16:17.543+00:00 Slot end                                   topics="beacnde" slot=7724777 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 syncCommitteeDuties=none head=28ab91d3:7724672
rocketpool_eth2  | NOT 2023-11-09 09:16:17.560+00:00 Missed expected slot start, catching up    topics="beacnde" delay=18s560ms56us427ns curSlot=7724777 nextSlot=7724777
rocketpool_eth2  | INF 2023-11-09 09:16:17.560+00:00 Slot start                                 topics="beacnde" slot=7724779 epoch=241399 sync="--h--m (0.00%) 0.0000slots/s (wPwwwwwwww:7724671)" peers=1 head=28ab91d3:7724672 finalized=241394:17613184 delay=18s560ms56us427ns
rocketpool_eth2  | INF 2023-11-09 09:16:41.799+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=22s868ms211us290ns hinterval=700ms
rocketpool_eth2  | INF 2023-11-09 09:16:41.811+00:00 Slot end                                   topics="beacnde" slot=7724779 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 syncCommitteeDuties=none head=28ab91d3:7724672
rocketpool_eth2  | NOT 2023-11-09 09:16:41.819+00:00 Missed expected slot start, catching up    topics="beacnde" delay=18s819ms463us597ns curSlot=7724779 nextSlot=7724779

errantelectron avatar Nov 09 '23 10:11 errantelectron