opentelemetry-js

Metrics with NodeJS cluster mode

Open tiagonapoli opened this issue 4 years ago • 6 comments

I'm trying to use the Prometheus exporter with cluster mode. However, each worker tries to start its own server to export metrics, which results in an error. I thought of creating the MeterProvider only in the master process, so the exporter would be initialized only once, but I don't think this would work, since the workers would still need to use that meter provider.

prom-client, which the Prometheus exporter is based on, documents how to use it with cluster mode: https://github.com/siimon/prom-client#usage-with-nodejss-cluster-module

Node.js's cluster module spawns multiple processes and hands off socket connections to those workers. Returning metrics from a worker's local registry will only reveal that individual worker's metrics, which is generally undesirable. To solve this, you can aggregate all of the workers' metrics in the master process. See example/cluster.js for an example.

How do I set up @opentelemetry/metrics for use with cluster mode? Should I create a custom exporter in the workers that sends the metrics to the master, which would then export them to Prometheus? I have just started learning OpenTelemetry, so I don't have any ideas :(
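The idea raised in the question (workers send their metrics to the master, and only the master serves them) can be sketched with nothing but Node's built-in `cluster` and `http` modules. This is a minimal illustration, not an OpenTelemetry or prom-client API: the message type, the `{ counterName: value }` snapshot shape, and the helper names `createSnapshotStore`, `startPrimary`, and `startWorker` are all assumptions made up for this example.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');

// Hypothetical message type used between workers and the primary.
const METRICS_MESSAGE = 'metrics-snapshot';

// Keeps the most recent counter snapshot reported by each worker.
function createSnapshotStore() {
  const latest = new Map(); // worker id -> { counterName: value }
  return {
    // Replace the stored snapshot for one worker.
    record(workerId, counters) {
      latest.set(workerId, { ...counters });
    },
    // Sum each counter across all workers that have reported so far.
    totals() {
      const sums = {};
      for (const counters of latest.values()) {
        for (const [name, value] of Object.entries(counters)) {
          sums[name] = (sums[name] ?? 0) + value;
        }
      }
      return sums;
    },
  };
}

// Primary: fork workers, collect their snapshots, serve a single endpoint.
function startPrimary(workerCount, port) {
  const store = createSnapshotStore();
  for (let i = 0; i < workerCount; i++) cluster.fork();
  cluster.on('message', (worker, msg) => {
    if (msg && msg.type === METRICS_MESSAGE) {
      store.record(worker.id, msg.counters);
    }
  });
  return http
    .createServer((req, res) => {
      res.setHeader('Content-Type', 'application/json');
      res.end(JSON.stringify(store.totals()));
    })
    .listen(port);
}

// Worker: periodically push local cumulative counters to the primary over IPC.
// readCounters is a caller-supplied function returning { counterName: value }.
function startWorker(readCounters, intervalMs) {
  return setInterval(() => {
    process.send({ type: METRICS_MESSAGE, counters: readCounters() });
  }, intervalMs);
}
```

An entry point would call `startPrimary(...)` in the primary process (checked with `cluster.isPrimary`, or `cluster.isMaster` on older Node versions) and `startWorker(...)` everywhere else. Because only the primary binds a port, the workers no longer compete for the same server. The sketch serves JSON rather than the Prometheus text format to keep it short.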

  • [X] This only affects the JavaScript OpenTelemetry library
  • [ ] This may affect other libraries, but I would like to get opinions here first

tiagonapoli avatar Jun 27 '20 16:06 tiagonapoli

Hi @tiagonapoli, I'm facing the same issue. Did you find a solution for this?

vin-mad avatar Feb 09 '22 08:02 vin-mad

Has anyone solved this? The OTel metrics lib is hitting stable soon, and it would be great if it could support this, as many production Node.js applications are deployed in cluster mode (e.g. with pm2).

pkarakal avatar Aug 27 '22 23:08 pkarakal

I believe this is common to Node.js use cases and prom-client also provides support for this.

legendecas avatar Sep 15 '22 09:09 legendecas

Unfortunately I didn't find a solution at the time, and it has been a while since I last worked with Node.js.

tiagonapoli avatar Sep 15 '22 13:09 tiagonapoli

Hmm, I'm not an expert in the cluster module, although I have used it in the distant past. I wonder how much work it would be to support this, or if it would just be a matter of properly documenting a setup process.

dyladan avatar Sep 15 '22 17:09 dyladan

I think we can provide a contrib package to support cluster metrics collection, rather than adding it to sdk-metrics. I'm planning to work on a POC to see what we can provide with that package.

legendecas avatar Sep 15 '22 17:09 legendecas

I've looked at the prom-client example for this, and the library exposes logic that can be used for taking the metrics returned from a cluster worker and manipulating them to do the aggregation. The good news/bad news is that a lot of the example relies on functionality from the library.

It's not clear to me that there are similar facilities in opentelemetry-js. This codebase is not exactly built for understanding by trace debugging either.

I think you'd be stuck implementing your own metric parsing and coalescing code, which is why it would be good if someone with inside knowledge tried to do this instead of leaving it as an exercise to the user.
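To give a sense of what such hand-written coalescing code might involve, here is a rough sketch. It assumes a made-up data point shape (`{ name, type, attributes, value }`) and hypothetical helpers `seriesKey` and `coalesce`; none of this is taken from opentelemetry-js or prom-client. Counters are summed across workers, while gauges are kept separate per worker, which is only one of several reasonable policies.

```javascript
// Build a stable key from a metric name and its attributes,
// independent of the order in which the attributes were written.
function seriesKey(name, attributes) {
  const sorted = Object.keys(attributes)
    .sort()
    .map((k) => `${k}=${attributes[k]}`);
  return `${name}{${sorted.join(',')}}`;
}

// Coalesce per-worker data points into a single list.
// snapshots: { [workerId]: Array<{ name, type, attributes, value }> }
// Assumes each worker's snapshot holds at most one point per series.
function coalesce(snapshots) {
  const merged = new Map();
  for (const [workerId, points] of Object.entries(snapshots)) {
    for (const point of points) {
      // Counters are summed across workers. Anything else (e.g. gauges)
      // stays per worker, distinguished by an added "worker" attribute.
      const attributes =
        point.type === 'counter'
          ? { ...point.attributes }
          : { ...point.attributes, worker: workerId };
      const key = seriesKey(point.name, attributes);
      const existing = merged.get(key);
      if (existing) {
        existing.value += point.value;
      } else {
        merged.set(key, {
          name: point.name,
          type: point.type,
          attributes,
          value: point.value,
        });
      }
    }
  }
  return [...merged.values()];
}
```

Histograms, resets after a worker restart, and delta versus cumulative temporality are all left out here, and they are a large part of why doing this properly is more than a weekend exercise.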

jdmarshall avatar Apr 13 '23 20:04 jdmarshall

@tiagonapoli any progress on solving this problem for your own project?

jdmarshall avatar Apr 13 '23 20:04 jdmarshall