zksync-era
feat(healthcheck): Various healthcheck improvements
What ❔
- Adds `HealthStatus::ShuttingDown`, which is set immediately after a component receives a termination signal. This makes the `/health` endpoint conform to K8s readiness probe expectations.
- Makes slow / hard time limits for health checks configurable and decreases their values by default.
- Adds metrics for slow, timed-out, and dropped health checks.
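
To illustrate the first two items, here is a minimal standalone sketch, not the code from this PR. Only `HealthStatus::ShuttingDown` is taken from the description; the other variants, `http_status_code`, `HealthCheckLimits`, `CheckOutcome` and `classify` are hypothetical names, and the 200 / 503 mapping is an assumption based on K8s treating any HTTP status from 200 up to (but not including) 400 as probe success.

```rust
use std::time::Duration;

/// Hypothetical mirror of component health states.
/// Only `ShuttingDown` is named in the PR description; the rest is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HealthStatus {
    NotReady,
    Ready,
    ShuttingDown,
}

/// Assumed mapping: a readiness probe succeeds on 2xx/3xx responses,
/// so a component that is shutting down must answer outside that range.
fn http_status_code(status: HealthStatus) -> u16 {
    match status {
        HealthStatus::Ready => 200,
        HealthStatus::NotReady | HealthStatus::ShuttingDown => 503,
    }
}

/// Hypothetical configuration for the slow / hard time limits.
/// Field names are placeholders, not the real config keys.
#[derive(Debug, Clone, Copy)]
struct HealthCheckLimits {
    slow: Duration,
    hard: Duration,
}

/// What a metric label could distinguish (the names are illustrative).
#[derive(Debug, PartialEq, Eq)]
enum CheckOutcome {
    OnTime,
    Slow,
    TimedOut,
}

/// Classifies how long a single check took relative to the limits.
fn classify(elapsed: Duration, limits: HealthCheckLimits) -> CheckOutcome {
    if elapsed >= limits.hard {
        CheckOutcome::TimedOut
    } else if elapsed >= limits.slow {
        CheckOutcome::Slow
    } else {
        CheckOutcome::OnTime
    }
}
```

With this shape, a pod that has received a termination signal stops passing its readiness probe right away, and each check falls into exactly one outcome bucket.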
Why ❔
Improves healthcheck observability.
Checklist
- [x] PR title corresponds to the body of PR (we generate changelog entries from PRs).
- [x] Tests for the changes have been added / updated.
- [x] Documentation comments have been added / updated.
- [x] Code has been formatted via `zk fmt` and `zk lint`.
- [x] Spellcheck has been run via `zk spellcheck`.
- [x] Linkcheck has been run via `zk linkcheck`.
Just in case: I've checked that if `/health` is requested with a small client timeout (e.g., using `curl -m ..`) so that the request doesn't complete in time, axum drops the handling future together with the pending futures it depends on (in particular, `CheckHealth::check_health()` implementations). So the drop guard added in this PR is actually triggered in this case.
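
For readers unfamiliar with the pattern, the following std-only sketch shows why dropping a pending future fires a drop guard. It is an illustration under assumptions, not the guard from this PR: `DropGuard`, `HangingCheck`, `NoopWaker` and `poll_once_then_drop` are hypothetical names, the counter stands in for a real metric, and a hand-written future that never completes stands in for a stuck `check_health()` call.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical guard: counts checks that were dropped before completing.
struct DropGuard {
    dropped: Arc<AtomicUsize>,
    completed: bool,
}

impl DropGuard {
    fn new(dropped: Arc<AtomicUsize>) -> Self {
        DropGuard { dropped, completed: false }
    }

    /// Consumes the guard after a successful check, so `Drop` records nothing.
    fn mark_completed(mut self) {
        self.completed = true;
    }
}

impl Drop for DropGuard {
    fn drop(&mut self) {
        if !self.completed {
            self.dropped.fetch_add(1, Ordering::Relaxed);
        }
    }
}

/// Stand-in for a health check that never finishes; it owns the guard.
struct HangingCheck {
    _guard: DropGuard,
}

impl HangingCheck {
    fn new(dropped: Arc<AtomicUsize>) -> Self {
        HangingCheck { _guard: DropGuard::new(dropped) }
    }
}

impl Future for HangingCheck {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future once and then drops it, the way a server drops a
/// request handler when the client goes away. Returns whether it finished.
fn poll_once_then_drop<F: Future>(fut: F) -> bool {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let finished = fut.as_mut().poll(&mut cx).is_ready();
    drop(fut); // drops everything the future owns, including the guard
    finished
}
```

Because the guard is owned by the future, cancellation needs no explicit hook: dropping the future is enough for the counter to be incremented.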