
Implement liveness check

Open daniellehrner opened this issue 3 years ago • 2 comments

Besu offers a liveness RPC endpoint. But the endpoint does not perform any check and just returns that it is UP no matter what (see the source code of the check).

We need to define what a proper liveness check looks like and implement it.

daniellehrner avatar Jun 23 '22 10:06 daniellehrner

That's typically all a liveness endpoint does. Something context-dependent, such as reporting whether the node is in sync, is usually a separate health check endpoint. Both are useful in different situations (i.e. you want a Docker container to consider itself started once the liveness check passes; if startup were gated on the health check instead, it would time out long before the node finishes syncing and the health check returns OK).
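
To make the distinction concrete, here is a minimal Python sketch (not Besu's actual code). The names `NodeState`, `liveness`, `readiness` and the `min_peers` threshold are hypothetical and only illustrate the idea: liveness answers UP whenever the process can respond at all, while the health check depends on node state.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    # Hypothetical fields for illustration, not Besu's internal model.
    in_sync: bool = False
    peer_count: int = 0


def liveness():
    """Liveness: if this code runs at all, the process is up."""
    return 200, "UP"


def readiness(state, min_peers=1):
    """Health/readiness: depends on context such as sync status and peers."""
    ready = state.in_sync and state.peer_count >= min_peers
    return (200, "UP") if ready else (503, "DOWN")


if __name__ == "__main__":
    syncing = NodeState(in_sync=False, peer_count=5)
    print(liveness())          # usable as a container start-up signal
    print(readiness(syncing))  # stays DOWN until the node has synced
```

With this split, a container runtime can treat the node as started as soon as `liveness` answers, while a load balancer waits for `readiness` before routing traffic to it.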

ajsutton avatar Jun 23 '22 10:06 ajsutton

I think there's potential for some ordering issues here: the JsonRpcHttpService starts before the Ethereum main loop starts. So you could be responding with "UP" before Besu is fully up, but we're talking nanoseconds. The other possibility is that something fails after the JsonRpcHttpService starts, meaning that you could erroneously report as being up when you're not.

I think the main scenario that would be helpful is around graceful shutdowns. At the moment, when you stop Besu you may still be serving HTTP requests. Ideally you'd want to indicate that you're shutting down on the LivenessCheck. This would allow k8s/whatever to remove the instance from rotation and direct requests elsewhere; then you could gracefully shut down Besu as normal (let me know if you want me to elaborate more).
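
A rough Python sketch of that shutdown ordering, under stated assumptions: the `LivenessCheck` class here, `begin_shutdown`, and the `drain`/`stop` callbacks are hypothetical names for illustration and do not reflect Besu's implementation. The point is only the sequence: flip the check to DOWN first, let in-flight work drain while the orchestrator stops routing traffic, then stop.

```python
import threading


class LivenessCheck:
    """Reports UP until shutdown has been announced (hypothetical sketch)."""

    def __init__(self):
        self._shutting_down = threading.Event()

    def begin_shutdown(self):
        # Called first thing when a stop is requested.
        self._shutting_down.set()

    def status(self):
        if self._shutting_down.is_set():
            return 503, "DOWN"
        return 200, "UP"


def graceful_shutdown(check, drain, stop):
    """Announce shutdown, drain in-flight requests, then stop the node."""
    check.begin_shutdown()  # orchestrator sees DOWN and removes the instance
    drain()                 # assumed callback: wait for open requests to finish
    stop()                  # assumed callback: the normal shutdown path


if __name__ == "__main__":
    check = LivenessCheck()
    graceful_shutdown(
        check,
        drain=lambda: print("draining, liveness is", check.status()),
        stop=lambda: print("stopped"),
    )
```

In practice the drain step would also need to wait long enough for the orchestrator to poll the endpoint at least once, which depends on the configured probe interval.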

antonydenyer avatar Jul 15 '22 14:07 antonydenyer

We discussed this a while ago among the developers, and the reason why we always return UP is that there is no metric to tell whether a Besu node needs a restart or not. Every issue could potentially just be temporary or caused by an external factor, like bad peers or problems with the CL client. A restart would not fix that.

We have updated the documentation of the endpoint to make this behavior clear: https://besu.hyperledger.org/en/stable/public-networks/how-to/use-besu-api/json-rpc/?h=liveness#liveness

daniellehrner avatar Oct 25 '22 19:10 daniellehrner