
AggregatorRegistry Metrics Error - Operation Timed out

Open austonpramodh opened this issue 2 years ago • 5 comments

I am trying to integrate prom-client into a cluster-enabled project. Unfortunately, I don't get any response from the aggregator endpoint. I do get responses from "/metrics" on the worker nodes, but when I hit "/cluster_metrics" I get an error saying "Operation Timed out".

I would like to understand how the aggregator gets the metrics from the workers. This might help me understand whether I am making any mistakes.

Master Server

image

Worker Server - already runs an HTTP server, so I created a handler.

image

Error

Screen Shot 2022-04-21 at 8 13 22 PM

austonpramodh avatar Apr 22 '22 00:04 austonpramodh

I found the solution. Looking at the ClusterMetrics file, it uses an event emitter to collect the metrics from the workers. The AggregatorRegistry class therefore needs to be instantiated on the workers as well, which I wasn't doing; I was only instantiating it in the master process.

image
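For anyone trying to picture why the request times out, here is a minimal sketch of the request/response pattern described above. It is a simplified model, not prom-client's actual code: it uses plain `EventEmitter` objects in place of real cluster IPC, and `FakeWorker`, `installWorkerListener`, `collectClusterMetrics`, and the message type strings are hypothetical names invented for this illustration.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical message types for this sketch.
const GET_METRICS_REQ = 'get_metrics_req';
const GET_METRICS_RES = 'get_metrics_res';

// Stand-in for one worker's IPC channel (assumption: real code would use
// worker.send() / process.on('message') from the cluster module instead).
class FakeWorker {
  constructor(id) {
    this.id = id;
    this.toWorker = new EventEmitter();  // primary -> worker messages
    this.toPrimary = new EventEmitter(); // worker -> primary replies
  }
}

// Worker side: without this listener, the worker never answers a request.
function installWorkerListener(worker, getMetrics) {
  worker.toWorker.on('message', (msg) => {
    if (msg.type !== GET_METRICS_REQ) return;
    worker.toPrimary.emit('message', {
      type: GET_METRICS_RES,
      requestId: msg.requestId,
      metrics: getMetrics(),
    });
  });
}

// Primary side: ask every worker, resolve once all have replied,
// reject if any reply is still missing when the timeout fires.
function collectClusterMetrics(workers, timeoutMs = 100) {
  return new Promise((resolve, reject) => {
    const requestId = Math.random().toString(36).slice(2);
    const responses = [];
    const listeners = [];

    const cleanup = () => {
      clearTimeout(timer);
      for (const [worker, onMessage] of listeners) {
        worker.toPrimary.off('message', onMessage);
      }
    };

    const timer = setTimeout(() => {
      cleanup();
      reject(new Error('Operation timed out.'));
    }, timeoutMs);

    if (workers.length === 0) {
      cleanup();
      resolve(responses);
      return;
    }

    for (const worker of workers) {
      const onMessage = (msg) => {
        if (msg.type !== GET_METRICS_RES || msg.requestId !== requestId) return;
        responses.push({ workerId: worker.id, metrics: msg.metrics });
        if (responses.length === workers.length) {
          cleanup();
          resolve(responses);
        }
      };
      listeners.push([worker, onMessage]);
      worker.toPrimary.on('message', onMessage);
    }

    // Send the request only after every reply listener is registered.
    for (const worker of workers) {
      worker.toWorker.emit('message', { type: GET_METRICS_REQ, requestId });
    }
  });
}

// Demo: worker 1 listens, worker 2 does not, so the request times out.
const listening = new FakeWorker(1);
const silent = new FakeWorker(2);
installWorkerListener(listening, () => 'requests_total 1');

collectClusterMetrics([listening, silent], 50)
  .then((res) => console.log('collected', res.length, 'responses'))
  .catch((err) => console.log('failed:', err.message));
```

In this model, `installWorkerListener` plays the role that instantiating `AggregatorRegistry` in each worker plays in my fix: if any worker skips it, the primary waits for a reply that never arrives and the request ends in a timeout.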

Thanks. Closing this issue.

austonpramodh avatar Apr 25 '22 08:04 austonpramodh

This was actually a mistake in a recent release, see #464. I haven't had any time to work on prom-client to address it, but you found the workaround.

zbjornson avatar Apr 25 '22 22:04 zbjornson

We ran into the same issue as a regression, where cluster metrics stopped working after an upgrade from 13.1.0 -> 14.0.1.

Perhaps this issue should be re-opened, as #464 has not been merged, and there isn't yet any released version that fixes this issue?

SpComb avatar Aug 23 '22 07:08 SpComb

Since so much time has passed and since a few releases have been made since the regression, I'm going to try to find a non-breaking fix that allows either usage pattern.

zbjornson avatar Aug 25 '22 21:08 zbjornson

I hoped that the latest major version (15) would address this. It doesn't. It feels like I'm stuck with 13.1.0 for a while longer. I thought using cluster was kind of the norm. If it is, that means the breaking change introduced in 13.2.0 affects a lot of people and may deserve a bit of attention :shrug:. I believe the fix lives in this PR: https://github.com/siimon/prom-client/pull/464

morgaan avatar Feb 06 '24 15:02 morgaan