
post-fulu: network thread bottleneck

Open twoeths opened this issue 1 month ago • 3 comments

Describe the bug

the gossip validation job time is computed against the gossipsub seenTimestampSec, and it's always <= 250ms

Image

but the elapsed time until a message is received, which we compute against the slot start, is terrible

Image

it means gossipsub sends DataColumnSidecars very late into the slot; the network thread event loop confirms this too

Image
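To make the difference between the two measurements concrete, here is a minimal sketch. Only `seenTimestampSec` comes from this issue; the helper names, the `SECONDS_PER_SLOT` constant (assumed to be 12 seconds) and the sample values are hypothetical and are not Lodestar's actual code.

```typescript
// Assumption: 12-second slots. All function names below are hypothetical.
const SECONDS_PER_SLOT = 12;

/** Unix time (seconds) at which the given slot starts. */
function slotStartSec(genesisTimeSec: number, slot: number): number {
  return genesisTimeSec + slot * SECONDS_PER_SLOT;
}

/** Validation job time: measured from when gossipsub first saw the message. */
function jobTimeSec(seenTimestampSec: number, doneTimestampSec: number): number {
  return doneTimestampSec - seenTimestampSec;
}

/** Elapsed time till received: measured from the start of the message's slot. */
function elapsedTillReceivedSec(genesisTimeSec: number, slot: number, seenTimestampSec: number): number {
  return seenTimestampSec - slotStartSec(genesisTimeSec, slot);
}

// Made-up example: a sidecar for slot 10 is seen 5.5s into the slot
// and validated 0.25s later.
const genesisTimeSec = 1_000_000;
const seen = slotStartSec(genesisTimeSec, 10) + 5.5;
const done = seen + 0.25;

console.log(jobTimeSec(seen, done)); // 0.25
console.log(elapsedTillReceivedSec(genesisTimeSec, 10, seen)); // 5.5
```

The job time can look healthy while the message still arrives late, because the first metric only starts counting once gossipsub has handed the message over.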

Expected behavior

network thread should receive/send DataColumnSidecars sooner

Steps to reproduce

No response

Additional context

No response

Operating system

Linux

Lodestar version or commit hash

v1.36.0

twoeths, Nov 22 '25 19:11

this is a blocker of #8619

twoeths, Nov 22 '25 19:11

network thread gc time has also increased since fulu

Image

twoeths, Nov 22 '25 19:11

network_thread_hoodi_sas_nov_22.cpuprofile.zip

gc is huge, 13.5%

Image

after a debugging session with @wemeetagain, some ideas came up:

  • use Buffer.allocUnsafe for snappyjs and increase poolSize to a suitable value; 10MB was too big and performed terribly when tested
  • store a pool of Buffers on the gossipsub side and use it for topics like beacon_attestation and data_column_sidecar
  • improve chacha20poly1305 to use different libs based on the size of the message
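
A rough sketch of the first two ideas, to make them easier to discuss. `fitsSharedPool` and `BufferPool` are hypothetical names, not existing Lodestar or gossipsub APIs, and the sizes are placeholders. The only Node.js behavior relied on is that `Buffer.allocUnsafe(size)` is served from the shared internal pool only when `size` is less than `Buffer.poolSize >>> 1`.

```typescript
import {Buffer} from "node:buffer";

/**
 * True when Buffer.allocUnsafe(size) can be carved out of Node's shared pool.
 * Larger requests get their own allocation, which adds gc pressure.
 */
function fitsSharedPool(size: number): boolean {
  return size < (Buffer.poolSize >>> 1);
}

/**
 * Hypothetical fixed-size buffer pool, e.g. one per gossip topic.
 * Buffers are reused instead of being reallocated for every message.
 */
class BufferPool {
  private readonly free: Buffer[] = [];
  private readonly bufferSize: number;
  private readonly maxPooled: number;

  constructor(bufferSize: number, maxPooled: number) {
    this.bufferSize = bufferSize;
    this.maxPooled = maxPooled;
  }

  /** Reuse a released buffer if one is available, otherwise allocate. */
  acquire(): Buffer {
    return this.free.pop() ?? Buffer.allocUnsafe(this.bufferSize);
  }

  /** Return a buffer to the pool; the caller must not use it afterwards. */
  release(buf: Buffer): void {
    if (buf.length === this.bufferSize && this.free.length < this.maxPooled) {
      this.free.push(buf);
    }
  }
}

// Usage: placeholder sizes, not tuned values.
const pool = new BufferPool(4096, 16);
const buf = pool.acquire();
// ... fill buf with message bytes and process it ...
pool.release(buf);
console.log(fitsSharedPool(1024));
```

Buffers from `allocUnsafe` are not zeroed, so a consumer has to track how many bytes it actually wrote and must not read past that length.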

twoeths, Nov 22 '25 22:11