prometheus_exporter
Add health check endpoint for the collector
Hi,
We are using the metrics collector as a centralized Kubernetes pod that receives metrics from all the application pods. As the number of metrics grows, the collector pod (metrics server) stops functioning properly and we get ruby_collector_working 0. We noticed the pod was getting CPU throttled and increased its resources, but would it be possible to add a health check endpoint so that Kubernetes could detect this automatically and restart the pod through a liveness probe?
I saw there was a closed issue for the same feature (https://github.com/discourse/prometheus_exporter/issues/69), but wanted to raise it again as it seems like useful functionality.
Thank you!
What collectors are you running?
Hi Sam, thanks for the response.
We are just running the server as bin/prometheus_exporter and have Sidekiq instrumentation on the client pods. But our main use case is reporting custom metrics related to our application (like the number of orders, etc.). Our problem is that we report a large number of metrics while running the server as a single pod. At some point the server gets throttled due to the high number of metrics. In such cases we just want to restart the server and continue reporting metrics. That is not possible automatically right now, since there is no liveness probe endpoint for Kubernetes to use.
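For context, this is roughly how our client pods report custom metrics to the centralized collector, following the gem's client API. It is only a minimal sketch: the host name and metric name below are made up for illustration.

```ruby
require "prometheus_exporter/client"

# Point the client at the centralized collector pod/service.
# "collector.internal" is a made-up host; 9394 is the exporter's default port.
client = PrometheusExporter::Client.new(host: "collector.internal", port: 9394)

# Register a custom application metric (hypothetical name) and report to it.
orders = client.register(:counter, "orders_total", "number of orders placed")
orders.observe(1, status: "paid")
```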
We would also like to have this feature 👍
I am open to a PR that adds a trivial health check at /status so it can return an OK 200 status page.
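For illustration, a "trivial health check" could be as simple as the standalone WEBrick sketch below: a route that always answers 200 OK. This is not the gem's actual server code, just the shape of the idea; in a real PR the route would be added to prometheus_exporter's own web server.

```ruby
require "webrick"

# Standalone illustration only: a cheap health endpoint that always returns 200 OK.
server = WEBrick::HTTPServer.new(Port: 9394)

server.mount_proc "/status" do |_req, res|
  res.status = 200
  res.body = "OK"
end

trap("INT") { server.shutdown }
server.start
```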
Fixed in https://github.com/discourse/prometheus_exporter/commit/27a768932b81bb6308be761468dbb16c6c55ab0c PR: https://github.com/discourse/prometheus_exporter/pull/226
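For anyone wiring this up in Kubernetes, a liveness probe against the new endpoint might look roughly like the sketch below. The port is the exporter's default (9394); the path shown matches the /status path discussed above, but verify the actual path and timings against the linked commit/PR before relying on them.

```yaml
# Liveness probe sketch for the centralized collector pod.
livenessProbe:
  httpGet:
    path: /status   # assumption: use whatever path the linked PR actually added
    port: 9394      # prometheus_exporter's default port
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
```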