
Check and Improve Beacon Node Health Status Logic

Open boulder225 opened this issue 10 months ago • 0 comments

🎯 Problem to be solved

The current implementation of the Beacon Node Health Status logic in Charon does not consider the number of connected peers. As a result, a Beacon Node with zero peers is still reported as having a "Health Status" of OK, which may not reflect the operational status of the node.

Resources

This cluster reports a Health Status of OK even though its Beacon Node (BN) has zero peers, a condition that should be reflected in the health status.

Per the docs for the /readyz endpoint (which is used for the Health Status gauge):

"Set to 1 if the node is operational and monitoring api /readyz endpoint is returning 200s. Else /readyz is returning 500s and this metric is either set to 2 if the beacon node is down, or 3 if the beacon node is syncing, or 4 if quorum peers are not connected."

🛠️ Proposed solution

  • [ ] Implement a check for the minimum required number (at least 3) of connected peers
  • [ ] Update the Health Status logic to consider a Beacon Node with zero peers as unhealthy
  • [ ] Determine the appropriate Health Status value or error code for a Beacon Node with zero peers
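
The checklist above could be sketched roughly as follows. This is a minimal illustration, not Charon's actual code: the type and function names, the order in which the checks run, and the status value 5 for "too few beacon node peers" are all assumptions, since the issue leaves that value undecided. Only the values 1 to 4 come from the /readyz docs quoted above, and the threshold of 3 comes from the first checklist item. The peer count itself is assumed to be fetched elsewhere, for example from the beacon node's standard peer count API.

```go
package main

// readyStatus holds the value of the Health Status gauge.
// Values 1 to 4 follow the /readyz docs quoted in this issue.
type readyStatus int

const (
	statusReady                   readyStatus = 1 // operational, /readyz returns 200
	statusBeaconNodeDown          readyStatus = 2
	statusBeaconNodeSyncing       readyStatus = 3
	statusQuorumPeersNotConnected readyStatus = 4
	// statusBeaconNodeLowPeers is hypothetical: the issue has not yet
	// decided which value or error code to use for this case.
	statusBeaconNodeLowPeers readyStatus = 5
)

// minBeaconNodePeers is the proposed minimum number of connected
// beacon node peers ("at least 3" in the checklist).
const minBeaconNodePeers = 3

// beaconNodeState is a hypothetical snapshot of what the health check
// knows about the beacon node at the time it runs.
type beaconNodeState struct {
	Reachable bool // false if the beacon node is down
	Syncing   bool // true while the beacon node is still syncing
	PeerCount int  // number of peers the beacon node is connected to
}

// evaluateReadiness returns the gauge value for the given state.
// The check order (down, syncing, low peers, quorum) is an assumption.
func evaluateReadiness(bn beaconNodeState, quorumPeersConnected bool) readyStatus {
	switch {
	case !bn.Reachable:
		return statusBeaconNodeDown
	case bn.Syncing:
		return statusBeaconNodeSyncing
	case bn.PeerCount < minBeaconNodePeers:
		return statusBeaconNodeLowPeers
	case !quorumPeersConnected:
		return statusQuorumPeersNotConnected
	default:
		return statusReady
	}
}
```

With this sketch, a reachable, synced beacon node with zero peers would no longer map to 1 (OK); it would map to the placeholder value 5, and /readyz would be expected to return a 500 in that case, like the other non-ready states.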

boulder225, Apr 03 '24 10:04