
Mainnet nodes become unresponsive

Open shufps opened this issue 3 years ago • 0 comments

In rare cases (roughly one node every 1-3 months), Hornet locks up somewhere internally.

Effects are:

  • no scraping via Prometheus possible
  • no API calls possible
  • no graceful shutdown possible
  • no syncing
  • the status line never changes (in some cases we caught it with non-zero but constant values for in/new/out). For instance:
req(qu/pe/proc/lat): 00000/00000/00000/0000ms, reqQMs: 0, processor: 00000, CMI/LMI: 4287127/4287128, MPS (in/new/out): 00065/00039/00563, Tips (non-/semi-lazy): 18/0

Servers look normal in monitoring. Nothing suspicious about CPU, RAM, storage, ...

Here are stack traces from two different nodes: goroutine.debug2.txt goroutine.debug2.txt

More information is available.

shufps, Sep 07 '22 16:09