Implement `/readyz` and `/livez` endpoints
Issue summary
To make operations easier with Forest, there should be an endpoint that clearly defines whether Forest is healthy, alive and ready to serve RPC requests.
In the Kubernetes world (subjectively, a good source of best practices when it comes to managing services), there is a notion of liveness and readiness probes. There is also the older `/healthz` endpoint, which Kubernetes has deprecated in favor of `/livez` and `/readyz`, but we could still provide it.
The goal is for the orchestrator, be it Kubernetes, Docker Compose or any custom contraption, to know whether the service is up or down (and should be restarted) and whether it is ready to serve requests (for load-balancing purposes).
Task summary
- [ ] Implement `/livez` endpoint - it should return 200 if the node is live. The endpoint should accept a `?verbose` argument, which should list the checks that were made. Sample checks:
  - Prometheus server is up,
  - connected to at least 1 other peer,
  - other tasks in the main loop have started.
- [ ] Implement `/readyz` endpoint - it should return 200 if the node is ready to serve RPC requests (note that the checks may be different for an offline node). The endpoint should accept a `?verbose` argument, which should list the checks that were made. Sample checks:
  - RPC server is up and responding,
  - (online node-only) the node is not too far behind the expected time, e.g., `genesis_time + block_interval * epoch + epsilon >= current_timestamp`, where `epsilon` is some arbitrarily chosen grace period, for example, six block intervals. This may need to be disabled for devnets.
  - (online node-only) the node is in follow-mode.
- [ ] Implement `forest-cli node info node-ready|node-live` subcommands to wrap the above.
- [ ] Add these checks to some of the existing tests.
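
The "not too far behind" readiness check from the list above could look roughly like the sketch below. It reads the condition as "the timestamp of the node's head epoch, plus the grace period, must not be earlier than the current time". The function name, parameter names, and the choice of Unix seconds are assumptions made for illustration, not existing Forest code.

```rust
/// Returns `true` when the head epoch's timestamp is at most `epsilon_secs`
/// behind `current_timestamp`.
///
/// Hypothetical helper: all names are illustrative and all timestamps are
/// assumed to be Unix seconds.
pub fn head_is_recent(
    genesis_timestamp: u64,
    block_interval_secs: u64,
    head_epoch: u64,
    current_timestamp: u64,
    epsilon_secs: u64,
) -> bool {
    // Time at which the node's head epoch was expected to be produced.
    let head_timestamp =
        genesis_timestamp.saturating_add(block_interval_secs.saturating_mul(head_epoch));

    // "Not too far behind" means head_timestamp >= current_timestamp - epsilon.
    // The epsilon is moved to the left-hand side to avoid unsigned underflow.
    head_timestamp.saturating_add(epsilon_secs) >= current_timestamp
}
```

For example, with a block interval of 30 seconds and a grace period of six block intervals (180 seconds), a node whose head is at epoch 100 passes the check until 180 seconds after that epoch's timestamp. On devnets the check would simply be skipped.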
Feel free to come up with more checks. They don't necessarily have to be implemented (at least, not all of them); these may come as follow-up tasks.
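
Since more checks may be added later, one option is to keep each check as a named result and build the response from the whole set. The sketch below shows how `/livez` or `/readyz` might turn such results into a status code and a body, listing individual checks only when `?verbose` is given. The `CheckResult` type, the `render_probe` function, the `[+]`/`[-]` body format, and the use of 503 for failures are all assumptions; the issue itself only specifies 200 on success.

```rust
/// Outcome of a single named health check (hypothetical type).
pub struct CheckResult {
    pub name: &'static str,
    pub ok: bool,
}

/// Builds an HTTP-style `(status code, body)` pair from the check results.
///
/// Returns 200 only if every check passed. 503 is an assumed choice for the
/// failure case. When `verbose` is true, the body lists each check before
/// the one-word summary.
pub fn render_probe(results: &[CheckResult], verbose: bool) -> (u16, String) {
    let all_ok = results.iter().all(|r| r.ok);
    let status = if all_ok { 200 } else { 503 };

    let mut body = String::new();
    if verbose {
        for r in results {
            let mark = if r.ok { "[+]" } else { "[-]" };
            body.push_str(&format!("{} {}\n", mark, r.name));
        }
    }
    body.push_str(if all_ok { "ok\n" } else { "failed\n" });

    (status, body)
}
```

With this shape, a follow-up check only needs to append another `CheckResult`, and the same function can back both endpoints as well as the proposed `forest-cli` subcommands.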
Other information and links
https://kubernetes.io/docs/reference/using-api/health-checks/#api-endpoints-for-health