Multiproc doesn't capture all metrics
Hey folks,
I have a similar setup to what is described here. The only difference is that I have nginx in front talking to gunicorn over sockets, but as far as I can tell that doesn't matter.
I wasn't able to use this code:
# Using multiprocess collector for registry
def make_metrics_app():
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return make_asgi_app(registry=registry)

metrics_app = make_metrics_app()
app.mount("/metrics", metrics_app)
Requests to that endpoint get a 307, so I implemented this instead:
@router.get("/metrics", response_class=PlainTextResponse, include_in_schema=False)
async def get_metrics() -> Response:
    """
    Get Prometheus metrics
    """
    # Build a fresh registry per scrape and let the multiprocess collector fill it
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    res = Response(content=data)
    res.headers["Content-Type"] = CONTENT_TYPE_LATEST
    res.headers["Content-Length"] = str(len(data))
    return res
Apart from that, my setup is pretty typical. I have some gauges and a histogram:
AUTOMATIONS_TOTAL: Final = Gauge(
    name=prefix_("automations_total"),
    documentation="Number of automations run",
    labelnames=["platform", "action", "kind", "status"],
    multiprocess_mode="sum",
)
REQUEST_TIME_SECONDS: Final = Histogram(
    name=prefix_("request_time_seconds"),
    documentation="Time spent processing request",
    labelnames=["method", "url_rule", "status_code"],
)
REQUESTS_IN_PROGRESS_TOTAL: Final = Gauge(
    name=prefix_("requests_in_progress_total"),
    documentation="Number of concurrent requests",
    # See Metrics Tuning (Gauge)
    # https://github.com/prometheus/client_python#multiprocess-mode-gunicorn
    multiprocess_mode="sum",
)
My in-progress gauge always remains at zero, but I'm hoping that's because requests finish faster than I can see them. The histogram, however, never shows up: I see no record of it in my PROMETHEUS_MULTIPROC_DIR and it's never rendered in the output. Any ideas for what I could troubleshoot?
Some extra context: almost all of these get called in async middleware. Here's a rather benign one:
async def add_request_id(
    request: Request, call_next: Callable[[Request], Awaitable[Response]]
) -> Response:
    if not is_prometheus_endpoint(request):
        REQUESTS_IN_PROGRESS_TOTAL.inc()
    request.state.request_id = correlation_id.get()
    start_time = time.time()
    response = await call_next(request)
    if not is_prometheus_endpoint(request):
        time_taken = time.time() - start_time
        # we replace UUIDs with <uuid> so that we flatten the curve of cardinality with unique URLs across all urls
        REQUEST_TIME_SECONDS.labels(
            method=request.method,
            url_rule=replace_uuids(request.url.path),
            status_code=response.status_code,
        ).observe(time_taken)
        REQUESTS_IN_PROGRESS_TOTAL.dec()
    return response
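The middleware above calls a replace_uuids helper that isn't shown in the post. For readers following along, here is a minimal sketch of what such a helper might look like. The regex and the `placeholder` parameter are assumptions for illustration, not the original poster's implementation.

```python
import re

# Assumption: UUIDs appear in their canonical 8-4-4-4-12 hexadecimal form.
_UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def replace_uuids(path: str, placeholder: str = "<uuid>") -> str:
    """Replace every UUID in a URL path so the label value stays low-cardinality."""
    return _UUID_RE.sub(placeholder, path)
```

With this sketch, every URL that differs only by its UUID collapses into a single url_rule label value, which keeps the number of histogram time series bounded.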
Hmm, I am not seeing anything obvious. Does REQUESTS_IN_PROGRESS_TOTAL show up in the multiproc dir?
I have similar issues: some histogram metrics specifically don't show up, and only in certain contexts (e.g. threaded Celery workers). I'm working on a reproducible example.
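A rough way to answer that question is to list what the workers have actually written. The sketch below groups the files in the multiproc dir by type and flags the ones mentioning a given metric name. It assumes the files are named like `<type>[_<mode>]_<pid>.db` and that metric names are stored as plain text inside them; both are assumptions about the client's on-disk format, and `summarize_multiproc_dir` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path


def summarize_multiproc_dir(directory, needle=""):
    """Group *.db files by type prefix and note which ones mention `needle`."""
    summary = defaultdict(list)
    for path in sorted(Path(directory).glob("*.db")):
        # Assumed naming scheme: "<type>[_<mode>]_<pid>.db", e.g. "gauge_sum_42.db".
        prefix = path.stem.rsplit("_", 1)[0]
        # Assumption: the metric name appears as raw UTF-8 text in the file.
        mentions = bool(needle) and needle.encode() in path.read_bytes()
        summary[prefix].append((path.name, mentions))
    return dict(summary)
```

Calling it with the directory from PROMETHEUS_MULTIPROC_DIR and a needle such as "request_time_seconds" shows whether any histogram file exists at all and whether any file mentions the metric.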
You are probably not setting PROMETHEUS_MULTIPROC_DIR before importing prometheus_client. Note that even a transitive import of prometheus_client from some downstream dependency can trigger this issue.
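A quick way to check for this import-order problem is sketched below. `check_multiproc_env` is a hypothetical helper, not part of prometheus_client. It is meant to be called as early as possible in the process, for example at the top of the application's entry module. It cannot tell whether the variable was set after the import already happened, so a result of "ok" only means nothing looks wrong at the moment of the call.

```python
import os
import sys


def check_multiproc_env(environ=None, modules=None) -> str:
    """Return a short diagnosis of the PROMETHEUS_MULTIPROC_DIR setup."""
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules

    directory = environ.get("PROMETHEUS_MULTIPROC_DIR")
    if directory is None:
        if "prometheus_client" in modules:
            # The library (or a dependency) was imported before the variable existed.
            return "too late: prometheus_client imported without PROMETHEUS_MULTIPROC_DIR"
        return "unset: set PROMETHEUS_MULTIPROC_DIR before importing prometheus_client"
    if not os.path.isdir(directory):
        return "missing: PROMETHEUS_MULTIPROC_DIR is not an existing directory"
    return "ok"


if __name__ == "__main__":
    print(check_multiproc_env())
```

Logging the result once per worker at startup makes it easier to spot a transitive import that pulled in prometheus_client too early.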
Does adding a labelnames argument (with a dummy label name) to REQUESTS_IN_PROGRESS_TOTAL solve the issue?
I had similar troubles, and this solved it for me (see https://github.com/prometheus/client_python/issues/1123)