`/readyz` endpoint returns 200 OK when not all enabled services are running
What would you like Teleport to do?
Introduce a new health-check endpoint (or modify the existing `/readyz` endpoint) that returns 200 OK only if all enabled services in the configuration are up and running without errors.
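A minimal sketch of the requested behavior, in Go since that is what Teleport is written in. Everything here is hypothetical and for illustration only: the `serviceState` type, the `notReady` and `readyStatus` helpers, the `strictReadyz` handler, and the choice of 503 for the not-ready case are assumptions, not Teleport's actual internals or API.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// serviceState is a hypothetical per-service health record.
// Teleport's real internal state tracking may look quite different.
type serviceState struct {
	Enabled bool // service is enabled in the configuration
	Healthy bool // service has started and reports no errors
}

// notReady returns the sorted names of services that are enabled
// in the configuration but not currently healthy.
func notReady(states map[string]serviceState) []string {
	var names []string
	for name, s := range states {
		if s.Enabled && !s.Healthy {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// readyStatus returns 200 only when every enabled service is healthy,
// and 503 otherwise. Disabled services are ignored.
func readyStatus(states map[string]serviceState) int {
	if len(notReady(states)) > 0 {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// strictReadyz is a hypothetical handler for the requested endpoint.
// The states callback stands in for whatever the process uses to
// track per-service health.
func strictReadyz(states func() map[string]serviceState) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		current := states()
		code := readyStatus(current)
		w.WriteHeader(code)
		if code != http.StatusOK {
			fmt.Fprintf(w, "not ready: %s\n", strings.Join(notReady(current), ", "))
			return
		}
		fmt.Fprintln(w, "ok")
	}
}

func main() {
	// The scenario from this issue: app_service is up,
	// ssh_service is enabled but never becomes healthy.
	states := map[string]serviceState{
		"app_service": {Enabled: true, Healthy: true},
		"ssh_service": {Enabled: true, Healthy: false},
		"db_service":  {Enabled: false, Healthy: false},
	}
	fmt.Println(readyStatus(states), notReady(states))
}
```

The point of the sketch is that readiness is computed from the set of enabled services rather than from a single heartbeat, and that the response body names the services that are holding readiness back.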
What problem does this solve?
Currently, the `/readyz` endpoint returns 200 OK as soon as the instance successfully heartbeats with the cluster. This means that if one or more of the configured Teleport services (e.g., `app_service`) is not yet ready, or never starts up properly, `/readyz` still returns 200 OK, as long as the instance was able to complete a heartbeat of any kind.
A repeatable way to force a successful heartbeat while leaving a service broken is to enable both the `ssh_service` and the `app_service`, and then join the cluster with a token that is valid for the `app` role only. The app service starts up and the instance heartbeats, but the `ssh_service` never becomes healthy, all while `/readyz` returns 200 OK.
If a workaround exists, please include it.
I looked over the `/metrics` endpoint, hoping that health/status info for each service might be there, but it wasn't. There doesn't appear to be a good way to determine readiness based on the status of the individual Teleport services.
`/healthz` will always return a 200 if the process is running. If it is determined that the current behavior of `/readyz` should not be altered, an additional endpoint with the desired behavior would be great.