pyctuator
Add optional support for k8s liveness and readiness probes
See https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#production-ready-kubernetes-probes for how it's done in Spring Boot's actuator.
At a high level, pyctuator should support telling k8s when an application/service is ready to serve requests (the readiness probe) and whether the application is alive (the liveness probe). This is different from the health status returned by /pyctuator/health for two reasons:
- It should return fast, so it shouldn't include health checks of external resources.
- It should reflect the k8s lifecycle as described in actuator's documentation (and in the k8s docs, of course).
While actuator allows configuring additional checks for these probes, I suggest that initially we provide a default implementation that users can choose to use.
Since Kubernetes tries to fix a broken liveness state by restarting the application instance, the liveness probe should not depend on health checks of external systems. Otherwise, if an external system fails (e.g. a database, a web API, an external cache) and causes the liveness probe to fail, Kubernetes might restart all application instances and create cascading failures.
Also, since Kubernetes does not route traffic to an instance whose readiness state is unready, checking external systems in the readiness probe must be done carefully by the application developers.
The probes should return 200 if the server is alive/ready and >=400 otherwise (or fail to respond entirely).
For more details on these probes, see https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/.
Therefore I suggest:
- Add two new endpoints under `/pyctuator/health/probes`, one for `readiness` and another for `liveness`.
- By default, all they do is return `200 OK`.
- Allow registering custom "readiness" checks, which are functions that throw an exception to indicate not-ready.
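A minimal sketch of what this could look like. Note that the function names and the check-registration API here are hypothetical, not pyctuator's actual API:

```python
from typing import Callable, List

# A readiness check raises an exception to signal "not ready" (hypothetical API).
ReadinessCheck = Callable[[], None]

_readiness_checks: List[ReadinessCheck] = []

def register_readiness_check(check: ReadinessCheck) -> None:
    # Register a user-supplied readiness check.
    _readiness_checks.append(check)

def liveness_probe() -> int:
    # Liveness must not depend on external systems; by default just return 200.
    return 200

def readiness_probe() -> int:
    # Any exception from a registered check means "not ready" (HTTP 503).
    for check in _readiness_checks:
        try:
            check()
        except Exception:
            return 503
    return 200
```

With no checks registered, both probes return 200, matching the default behavior suggested above; a user could then register, say, a cache-warmup check, and k8s would keep the pod out of the service endpoints until it stops raising.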
WDYT?
There's another problem: when the registration with the admin server fails, pyctuator does not serve actuator endpoints at all. This behavior leads to cascading restarts, since the readiness probe fails with "connection refused".
I see. Can you provide steps to reproduce? Thanks
- Point registration_url to a URL that does not resolve (e.g. a service in a k8s cluster with no running pod)
- Create an instance of Pyctuator (with Flask)
- Flask only becomes ready after a timeout of more than two minutes (Failed registering with boot-admin, [Errno 110] Connection timed out)

Expected behavior: an exception is thrown much earlier.
Thanks, will look into this.
Actually, assuming you run your Flask application in a k8s pod too, I'm wondering whether this might be a behavior of k8s. Can you try to curl the same URL from the pod running the Flask application? I wonder if you'll get an immediate answer.
Curl times out after two minutes...
Yes, I just tested it myself. I'm not sure I can make pyctuator time out earlier in this case.
How about connecting in the background? The service may be healthy without admin registration....
I agree that you don't want to block an application from starting because the monitoring system is down.
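One way to avoid blocking startup would be to retry the registration in a daemon thread. A sketch, assuming a hypothetical `register()` callable that performs one registration attempt:

```python
import threading
import time

def _register_with_retry(register, interval_s: float) -> None:
    # Keep retrying in the background; registration failures must never
    # prevent the application itself from serving traffic.
    while True:
        try:
            register()
            return
        except OSError:
            time.sleep(interval_s)

def start_background_registration(register, interval_s: float = 10.0) -> threading.Thread:
    # Daemon thread: the process can still exit even if registration
    # never succeeds (e.g. the admin server is down for good).
    thread = threading.Thread(
        target=_register_with_retry, args=(register, interval_s), daemon=True
    )
    thread.start()
    return thread
```

With this, the Flask app would become ready immediately, while registration with boot-admin happens (and retries) in the background.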
Pyctuator is using the builtin `http.client.HTTPConnection` so it doesn't force you to use any HTTP library - whatever the solution turns out to be, we need to maintain this.
Looking at `http.client.HTTPConnection`, I see there's an optional `timeout` that we may use, see https://docs.python.org/3/library/http.client.html
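For example, the `timeout` argument caps the connect/read time instead of waiting for the OS-level TCP timeout (~2 minutes, as observed above). The `try_register` helper and the 10-second default here are illustrative, not pyctuator's actual code:

```python
import http.client

def try_register(host: str, port: int, timeout_s: float = 10.0) -> bool:
    # timeout= applies to connecting and to blocking socket reads, so a
    # non-resolving or non-answering admin server fails fast.
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("POST", "/instances")
        return conn.getresponse().status < 400
    except OSError:
        # Covers refused connections, DNS failures, and socket timeouts;
        # registration failure should not stop the application.
        return False
    finally:
        conn.close()
```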
I can make this a config parameter with a default that's much lower, maybe 10s.
That'd be great, thanks.
Oh, just noticed we are discussing this hang within the issue asking for k8s probes. Moving the discussion to #51