lodestar
Network thread performance issue due to async randomness
Describe the bug
This is a review of the metrics monitored on our test mainnet node running #7761. We will very likely merge that PR anyway, since the issue only happens on a test mainnet node subscribed to all subnets and the PR improved the main thread a lot. I'm opening this issue for later reference.
- in general, that PR improves the main thread a lot, which puts more pressure on the network thread
- over the last 8 days, scavenge GC keeps going up
- the event loop lag keeps increasing
- because of that, request I/O time increased, especially for ping, status and metadata
- the node has so many peers that it has to disconnect a lot of them
- the peer manager heartbeat time also increased
- on the main thread, things improved a lot
The issue does not happen on our other nodes.
Expected behavior
Event loop lag on the network thread is the same as before.
Steps to reproduce
No response
Additional context
No response
Operating system
Linux
Lodestar version or commit hash
mkeil/aggregate-with-randomness-async-again