autoscaling: Healthchecks and autorestarts for computes


Open · opened by olegbbtr · 3 comments

Branched off from https://github.com/neondatabase/cloud/issues/14114

At the moment, the only signal we have for compute unavailability comes from Kubernetes (k8s), specifically its container process monitoring.

We would like to have an end-to-end healthcheck, which would allow us to detect problems, such as:

We have a healthcheck mechanism that lets us detect compute issues in under 30 seconds and take appropriate action, such as restarting the compute.
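A minimal sketch of the detection-and-restart loop described above, assuming the probe is a callable that reports whether the compute answered in time and that restarting is exposed as a callback. `HealthMonitor`, `http_probe`, and the 5 s interval / 2 s timeout / 3-failure defaults are hypothetical, chosen only so that the worst-case detection time stays under the 30 s target.

```python
import time
import urllib.request
from dataclasses import dataclass, field
from typing import Callable


def http_probe(url: str, timeout_s: float) -> Callable[[], bool]:
    """Hypothetical probe: succeeds only on a timely HTTP 200 from `url`."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.status == 200
        except OSError:  # URLError, refused connections and timeouts are all OSError
            return False
    return probe


@dataclass
class HealthMonitor:
    """Hypothetical end-to-end health monitor for a single compute."""
    probe: Callable[[], bool]          # must enforce probe_timeout_s itself
    on_unhealthy: Callable[[], None]   # assumed hook, e.g. "restart the compute"
    interval_s: float = 5.0
    probe_timeout_s: float = 2.0
    failure_threshold: int = 3
    consecutive_failures: int = field(default=0, init=False)

    def worst_case_detection_s(self) -> float:
        # If the compute breaks right after a successful probe, we need
        # `failure_threshold` failed probes, each at most sleep + timeout apart.
        return self.failure_threshold * (self.interval_s + self.probe_timeout_s)

    def tick(self) -> bool:
        """Run one probe; return True if on_unhealthy was triggered."""
        try:
            ok = self.probe()
        except Exception:
            ok = False  # a crashing probe counts as a failed healthcheck
        if ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.on_unhealthy()
            self.consecutive_failures = 0  # start counting afresh after acting
            return True
        return False

    def run(self, stop: Callable[[], bool] = lambda: False) -> None:
        """Probe forever (or until `stop()` is true), sleeping between probes."""
        while not stop():
            self.tick()
            time.sleep(self.interval_s)
```

Requiring several consecutive failures avoids restarting a compute because of a single dropped request; the trade-off is that detection takes up to `failure_threshold` probe periods, which is why the sketch exposes `worst_case_detection_s()` for checking against the 30 s budget.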

We should have a piece of code inside the VM that responds to healthchecks.
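A minimal sketch of such an in-VM responder, using only the Python standard library. The `/healthz` path, port 8080, and the choice of "Postgres accepts TCP connections on 127.0.0.1:5432" as the health signal are all assumptions for illustration; a real end-to-end check would probably need something stronger, such as running a query.

```python
import socket
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def postgres_accepts_connections(host: str = "127.0.0.1", port: int = 5432,
                                 timeout_s: float = 1.0) -> bool:
    """Hypothetical check: can we open a TCP connection to Postgres inside the VM?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def make_health_server(check: Callable[[], bool], host: str = "0.0.0.0",
                       port: int = 8080) -> ThreadingHTTPServer:
    """Build an HTTP server answering GET /healthz with 200 (healthy) or 503."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/healthz":
                self.send_error(404)
                return
            healthy = check()
            body = b"ok\n" if healthy else b"unhealthy\n"
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep frequent probe traffic out of the logs

    return ThreadingHTTPServer((host, port), HealthHandler)
```

Inside the VM this could be started with `make_health_server(postgres_accepts_connections).serve_forever()`; an external monitor would then treat anything other than a timely 200 as a failed probe.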

Sep 20 '24 14:09 olegbbtr

Not sure whether we need this or whether Kubernetes is good enough. Putting it in the backlog for now.

Sep 23 '24 15:09 stradig

I wonder if we can add some generic health check mechanism as part of neondatabase/cloud#27103? cc @hlinnaka

Apr 17 '25 10:04 sharnoff

This issue was moved to Jira: LKB-2137

Jul 21 '25 12:07 zenithdb-bot-dev[bot]