Duplicated timeseries in CollectorRegistry with Multiprocess Gunicorn

Open nkov opened this issue 2 years ago • 2 comments

I know this is a subject that comes up somewhat frequently, but for the life of me I can't figure out what I'm doing wrong.

  1. I have a service in Amazon ECS that's running a single task with multiple workers (the problem also occurs in my other service, which has just one worker).

  2. I've created the directory and set the PROMETHEUS_MULTIPROC_DIR in the Dockerfile:

RUN mkdir -p /tmp/prom-metrics
ENV PROMETHEUS_MULTIPROC_DIR /tmp/prom-metrics
  3. I'm using the sample code in the README to create the registry in the /metrics request and return it:

registry = CollectorRegistry()
if getenv('PROMETHEUS_MULTIPROC_DIR'):
    multiprocess.MultiProcessCollector(registry)
data = generate_latest(registry)
status = '200 OK'
response_headers = [
    ('Content-type', CONTENT_TYPE_LATEST),
    ('Content-Length', str(len(data))),
]
return Response(data, status, response_headers)

  4. I've created the gunicorn.conf.py file with the sample from the README and passed it into my gunicorn startup script via -c:

from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
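
For reference, the README's multiprocess notes also ask that the PROMETHEUS_MULTIPROC_DIR directory be wiped between Gunicorn runs, which the steps above do not cover. Whether stale files play any part in this issue is not established in the thread; the following is only a minimal sketch of such a cleanup, assuming it runs once before Gunicorn starts (for example from an entrypoint script). The helper name reset_multiproc_dir is hypothetical and not part of prometheus_client.

```python
import glob
import os


def reset_multiproc_dir(path):
    """Create `path` if needed and delete leftover *.db metric files.

    Hypothetical helper: meant to run once before Gunicorn starts,
    never from inside a running worker.
    """
    os.makedirs(path, exist_ok=True)
    removed = []
    # Multiprocess mode stores per-process samples in *.db files.
    for db_file in glob.glob(os.path.join(path, "*.db")):
        os.remove(db_file)
        removed.append(os.path.basename(db_file))
    # Return the removed file names so the caller can log them.
    return sorted(removed)
```

A call such as reset_multiproc_dir(os.environ["PROMETHEUS_MULTIPROC_DIR"]) would go before the gunicorn command; files with other extensions are left alone.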

Gunicorn starts my two services as follows:

# app 1 with workers
gunicorn -c /app/utils/gunicorn.conf.py -b :5000 -t 3600 --keep-alive 60 --threads 8 --workers 3 app:app

# app 2 without --workers (Gunicorn defaults to a single worker)
gunicorn -c /app/utils/gunicorn.conf.py -b :5000 -t 3600 --keep-alive 60 --threads 8 app:app

The service boots successfully and records some metrics. These are definitely collected in multiprocess mode, since the HELP line simply reads "Multiprocess metric".

This works for a few calls but eventually I get the dreaded Duplicated timeseries in CollectorRegistry error and no additional metrics are populated.

What might I be doing wrong?

nkov • May 23 '22 23:05
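
For anyone landing here with the same error: the client raises "Duplicated timeseries in CollectorRegistry" when a metric name is registered a second time in the same registry. One thing worth ruling out is metric-construction code that runs more than once per process, for example inside a request handler. That is an assumption about the cause, not something confirmed in this thread. A minimal sketch of a guard, where get_or_create and the metric name in the usage comment are hypothetical:

```python
import threading

# Process-local cache of metric objects, keyed by metric name.
_metrics = {}
_lock = threading.Lock()


def get_or_create(name, factory):
    """Return the cached object for `name`, calling `factory` only once.

    Hypothetical helper, not part of prometheus_client. The lock matters
    when Gunicorn runs with --threads, as in the commands above.
    """
    with _lock:
        if name not in _metrics:
            _metrics[name] = factory()
        return _metrics[name]


# Possible usage with prometheus_client (names are illustrative):
# REQUESTS = get_or_create(
#     "app_requests",
#     lambda: Counter("app_requests", "Requests served"),
# )
```

Defining metrics once at module import time achieves the same thing without a helper; the guard only helps when the construction code cannot easily be moved out of a code path that runs repeatedly.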

Interesting. Do you happen to be passing registry=registry when creating the metrics? I haven't seen this myself. If you or anyone else has a small code demo that reproduces the issue, that would be great.

csmarchbanks • May 25 '22 16:05

No. I did try creating the registry in the global namespace and passing it to the metrics via registry=registry, but this resulted in duplicate metrics (one Multiprocess and one regular).

I will attempt to create a minimally reproducible example.

nkov • May 25 '22 16:05