prometheus_exporter

Add health check endpoint for the collector

Open kubibektas opened this issue 5 years ago • 5 comments

Hi,

We are running the metrics collector as a centralized Kubernetes pod that receives metrics from all the application pods. As the number of metrics grows, the collector pod (metrics server) stops functioning properly and we get ruby_collector_working 0. We noticed the pod was being CPU throttled and increased its resources, but would it be possible to add a health check endpoint so that Kubernetes could detect the failure automatically and restart the pod through a liveness probe?

I saw there was a closed issue for the same feature (https://github.com/discourse/prometheus_exporter/issues/69), but wanted to raise it again as it seems to be useful functionality.

Thank you!

kubibektas avatar Dec 03 '20 14:12 kubibektas

What collectors are you running?

SamSaffron avatar Dec 03 '20 20:12 SamSaffron

Hi Sam, thanks for the response.

We are just running the server as bin/prometheus_exporter and have Sidekiq instrumentation on the client pods. But our main use case is reporting custom metrics related to our application (like the number of orders). Our problem is that we are reporting too many metrics while running the server as a single pod. At some point the server gets throttled due to the high number of metrics. In such cases we just want to restart the server and continue reporting metrics. That cannot be done automatically right now, since there is no endpoint for a Kubernetes liveness probe to use.

kubibektas avatar Dec 07 '20 16:12 kubibektas

We would also like to have this feature 👍

h0jeZvgoxFepBQ2C avatar Jan 24 '21 18:01 h0jeZvgoxFepBQ2C

I am open to a PR that adds a trivial health check at /status, so it can return an OK (HTTP 200) page.

SamSaffron avatar Jan 25 '21 23:01 SamSaffron

I am open to a PR that adds a trivial health check at /status, so it can return an OK (HTTP 200) page.

Fixed in https://github.com/discourse/prometheus_exporter/commit/27a768932b81bb6308be761468dbb16c6c55ab0c PR: https://github.com/discourse/prometheus_exporter/pull/226

n-rodriguez avatar Oct 18 '22 20:10 n-rodriguez
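
For anyone wiring this into a liveness check, here is a minimal sketch of a probe written in Ruby. It assumes the collector listens on port 9394 and that the health route is /status as proposed earlier in this thread; the path is an assumption, so check the linked PR for the route that was actually merged. The method name collector_healthy? is made up for illustration and is not part of the gem.

```ruby
require "net/http"

# Hypothetical helper, not part of prometheus_exporter.
# Returns true when GET http://host:port/path answers with a 2xx status
# within the timeout, and false on any other status or connection error.
# The default path is the one suggested in this thread (an assumption).
def collector_healthy?(host: "localhost", port: 9394, path: "/status", timeout: 2)
  # Passing nil as the proxy address ignores any http_proxy environment variable.
  http = Net::HTTP.new(host, port, nil)
  http.open_timeout = timeout
  http.read_timeout = timeout
  response = http.get(path)
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  # Refused connections, timeouts and malformed replies all count as unhealthy.
  false
end
```

A small wrapper script could call exit(collector_healthy? ? 0 : 1), which matches what a Kubernetes exec probe expects. If the route only needs a plain GET, pointing an httpGet liveness probe at the same path and port is probably simpler than running Ruby inside the probe.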