Mainnet nodes become unresponsive
In rare cases (roughly one node every 1-3 months), Hornet appears to lock up internally.
The effects are:
- no scraping via Prometheus possible
- no API calls possible
- no graceful shutdown possible
- no syncing
- the status line never changes (in some cases we caught it with non-zero but constant values for in/new/out). For instance:
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 4287127/4287128, MPS (in/new/out): 00065/00039/00563, Tips (non-/semi-lazy): 18/0
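Since a frozen status line is the one symptom still visible from the outside, it could be checked automatically. The following is a minimal sketch, not part of Hornet: the names `statusRe`, `fingerprint` and `looksStalled` are hypothetical, and it assumes the status line keeps the format quoted above and that several consecutive samples have already been collected from the node's log.

```go
package main

import (
	"fmt"
	"regexp"
)

// statusRe matches the CMI/LMI and MPS counters of a status line in the
// format quoted in this issue. The format is an assumption taken from that
// single example.
var statusRe = regexp.MustCompile(`CMI/LMI: \d+/\d+, MPS \(in/new/out\): \d+/\d+/\d+`)

// fingerprint returns the counter portion of a status line, or ok=false if
// the line does not match the expected format.
func fingerprint(line string) (fp string, ok bool) {
	fp = statusRe.FindString(line)
	return fp, fp != ""
}

// looksStalled reports whether at least minSamples status lines were given
// and all of them carry identical counters.
func looksStalled(lines []string, minSamples int) bool {
	if minSamples < 2 || len(lines) < minSamples {
		return false
	}
	first, ok := fingerprint(lines[0])
	if !ok {
		return false
	}
	for _, l := range lines[1:] {
		fp, ok := fingerprint(l)
		if !ok || fp != first {
			return false
		}
	}
	return true
}

func main() {
	line := "req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 4287127/4287128, MPS (in/new/out): 00065/00039/00563, Tips (non-/semi-lazy): 18/0"
	// Three identical samples in a row are treated as a stall here; the
	// threshold is arbitrary and would need tuning against real logs.
	fmt.Println(looksStalled([]string{line, line, line}, 3))
}
```

A check like this only flags a suspicious node; it does not explain where the lock is held.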
The servers look normal in monitoring: nothing suspicious about CPU, RAM, or storage.
Here are stack traces of two different nodes: goroutine.debug2.txt goroutine.debug2.txt
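For readers who want to produce a comparable dump from their own Go process, here is a minimal sketch using the standard `runtime/pprof` package. It is an assumption that the attached files use this format (the `debug2` in the file names suggests debug level 2); how the traces above were actually captured is not stated. The helper names `dumpGoroutines` and `countGoroutines` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
)

// dumpGoroutines returns the stack traces of all goroutines in the
// panic-style text format (debug level 2).
func dumpGoroutines() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// countGoroutines counts the "goroutine N [state]:" header lines in a dump.
// It assumes no stack frame line starts with the word "goroutine ".
func countGoroutines(dump string) int {
	n := 0
	for _, line := range strings.Split(dump, "\n") {
		if strings.HasPrefix(line, "goroutine ") {
			n++
		}
	}
	return n
}

func main() {
	dump, err := dumpGoroutines()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Println("goroutines captured:", countGoroutines(dump))
	fmt.Print(dump)
}
```

In a dump like this, goroutines blocked for a long time on a lock or channel show the wait duration in their header line, which is usually the first place to look for a deadlock.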
More information is available.