Feature Request: metrics or instrumentation

Open sevagh opened this issue 5 years ago • 15 comments

It would be nice to have a /metrics endpoint that exposes Prometheus-style metrics.

Note that I might be biased from working mostly within the Prometheus ecosystem. Perhaps a more general metrics library, with different exposition formats to choose from, could work? I also don't know which Haskell metrics or Prometheus libraries exist or are any good (but Hackage shows there might be some).

I'm not sure what metrics are the best to expose. I bet PostgREST developers know more about essential PostgREST KPIs. But things like:

postgrest_query_execution_time (histogram)
postgrest_http_requests
postgrest_schema_reloads
...
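
As a rough illustration of what such an endpoint might return, here is a minimal sketch that renders a counter and a histogram in the Prometheus text exposition format. The metric names come from the list above; the `Counter`, `Histogram`, and `render_metrics` helpers, the help strings, and the bucket bounds are assumptions made up for this example, not an existing PostgREST or Prometheus client API.

```python
from bisect import bisect_left


class Counter:
    """Monotonically increasing value (hypothetical helper)."""

    def __init__(self, name, help_text):
        self.name, self.help_text, self.value = name, help_text, 0

    def inc(self, amount=1):
        self.value += amount

    def render(self):
        return [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
            f"{self.name} {self.value}",
        ]


class Histogram:
    """Cumulative-bucket histogram (hypothetical helper)."""

    def __init__(self, name, help_text, buckets):
        self.name, self.help_text = name, help_text
        self.buckets = sorted(buckets)
        self.counts = [0] * (len(self.buckets) + 1)  # last slot is the +Inf bucket
        self.total = 0.0

    def observe(self, value):
        # bisect_left returns the first bucket whose upper bound is >= value,
        # which matches the "le" (less than or equal) bucket semantics.
        self.counts[bisect_left(self.buckets, value)] += 1
        self.total += value

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} histogram",
        ]
        cumulative = 0
        for bound, count in zip(self.buckets, self.counts):
            cumulative += count
            lines.append(f'{self.name}_bucket{{le="{bound}"}} {cumulative}')
        cumulative += self.counts[-1]
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {cumulative}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {cumulative}")
        return lines


def render_metrics(metrics):
    """Join all metrics into one response body for a scrape."""
    return "\n".join(line for m in metrics for line in m.render()) + "\n"
```

A scrape handler would call `render_metrics([...])` and serve the result as plain text. Prometheus naming conventions usually add a unit suffix such as `_seconds` and a `_total` suffix on counters, so the final names would likely differ from the ones proposed here.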

sevagh avatar May 16 '20 18:05 sevagh

Needs more discussion, but it could be a good idea!

One thing I'd like to note for now is that we can't use a /metrics endpoint because it would conflict with users' routes (one example of using a metrics table here). So maybe we can come up with a prefix, like /pgrst/metrics or /internal/metrics.

steve-chavez avatar Jul 10 '20 14:07 steve-chavez

That sounds good. Given PostgREST + NGINX is the recommended/common deployment, one could easily expose the internal metrics prefix at their desired location.

sevagh avatar Jul 10 '20 15:07 sevagh

On https://github.com/PostgREST/postgrest/issues/1933, we were thinking of using a special header for this to avoid creating an extra route.

Seems Prometheus doesn't support adding headers for scraping though :disappointed: https://github.com/prometheus/prometheus/issues/1724

> PostgREST + NGINX is the recommended/common deployment

But since Nginx would be present, I guess it's not a problem because it can map the header to a URL. Another option in the future might be https://github.com/PostgREST/postgrest/issues/1909 as well. So a special header would do.

Edit: ref https://github.com/qnikst/prometheus-haskell

steve-chavez avatar Sep 03 '21 18:09 steve-chavez

Hmm at least for liveness/health checks, we'd want to be hitting the postgrest instances directly, rather than going through nginx or a similar rewriting layer.

Even with an nginx setup, I'd imagine we generally have it load balancing between multiple postgrest instances transparently, so even for metrics you'd likely want to be hitting each instance directly, rather than going over nginx.

One alternative would be to use a secondary port to host endpoints for liveness/metrics etc. This would avoid creating a breaking change where we reserve e.g. /internal or a similar base path, and trivially allow users to not expose these endpoints externally.
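
The secondary-port idea can be sketched with two independent HTTP servers in one process. This is only an illustration using Python's standard library, not how PostgREST implements it: the handler classes, the `start_server` helper, the `/metrics` path on the admin side, and the sample metric line are all assumptions made for the example.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def _respond(handler, status, body):
    """Write a small plain-text response."""
    handler.send_response(status)
    handler.send_header("Content-Type", "text/plain")
    handler.send_header("Content-Length", str(len(body)))
    handler.end_headers()
    handler.wfile.write(body)


class ApiHandler(BaseHTTPRequestHandler):
    """Stands in for the public API: every path belongs to the user's schema."""

    def do_GET(self):
        _respond(self, 200, f"user route {self.path}\n".encode())

    def log_message(self, *args):  # keep the example quiet
        pass


class AdminHandler(BaseHTTPRequestHandler):
    """Internal endpoints only, so no path is taken away from the public API."""

    def do_GET(self):
        if self.path == "/metrics":  # hypothetical path
            _respond(self, 200, b"postgrest_schema_reloads 0\n")
        else:
            _respond(self, 404, b"not found\n")

    def log_message(self, *args):
        pass


def start_server(handler, port=0, host="127.0.0.1"):
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server(ApiHandler, 3000)` and `start_server(AdminHandler, 3001)` would leave `/metrics` on port 3000 free for a user table, while a scraper or liveness probe hits each instance's admin port directly without going through a load balancer. The port numbers are placeholders.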

darora avatar Sep 22 '21 01:09 darora

Can Haskell run multiple web servers on different ports? That's how it's typically done in other languages so that you don't have route path conflicts, and you typically don't expose the Prometheus webserver port to the outside world.

rupurt avatar May 05 '22 07:05 rupurt

@rupurt Yeah, we already have that on latest https://postgrest.org/en/latest/configuration.html#admin-server-port

steve-chavez avatar May 05 '22 15:05 steve-chavez

@steve-chavez awesome. Would be great to have prometheus metrics in there :smile:

rupurt avatar May 05 '22 16:05 rupurt

Is it okay to use https://hackage.haskell.org/package/prometheus for this task?

uhbif19 avatar Aug 09 '22 19:08 uhbif19

@uhbif19 Yes, that one should do.

For posterity, https://github.com/PostgREST/postgrest/pull/2129 was closed but the pool metrics discussed there would still be useful.

steve-chavez avatar Aug 10 '22 21:08 steve-chavez

From https://github.com/PostgREST/postgrest/issues/2477

It would be great if we could get metrics such as 1. GC, 2. query response times, 3. requests queued, DB connection pool usage count, etc. on the Admin Server port at maybe /met

For 2, I was actually referring to the round-trip time of a request from PostgREST to the DB. It's helpful in scenarios where we observe high latencies from PostgREST but the DB takes only a few milliseconds. Such issues could be due to high load on PostgREST pods, CPU throttling, connection pool crunch, etc. (in the k8s world).
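
One way to make that gap visible is to time the whole request and the database call separately and report the difference. The sketch below assumes a hypothetical `RequestTimings` helper and the phase names `total` and `db`; nothing here is an existing PostgREST interface.

```python
import time
from contextlib import contextmanager


class RequestTimings:
    """Accumulates wall-clock time per named phase of one request (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timings can be tested deterministically
        self.phases = {}

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.phases[name] = self.phases.get(name, 0.0) + elapsed

    def overhead(self, total="total", inner="db"):
        """Time spent outside the database: pool waits, CPU throttling, serialization."""
        return self.phases.get(total, 0.0) - self.phases.get(inner, 0.0)
```

Wrapping the request in `with timings.phase("total"):` and the query in a nested `with timings.phase("db"):` gives both numbers. A large `overhead()` next to a small `db` value points at the server side (load, throttling, pool contention) rather than at the database.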

steve-chavez avatar Sep 16 '22 18:09 steve-chavez

Has this feature been released in any 10.x version?

bhupixb avatar Mar 27 '23 09:03 bhupixb

Nope, not yet implemented.

(Issue would be closed if so)

steve-chavez avatar Mar 27 '23 10:03 steve-chavez

These days it seems we should be using OpenTelemetry instead of Prometheus. Maybe with:

  • https://github.com/iand675/hs-opentelemetry
  • https://github.com/ethercrow/opentelemetry-haskell

Article about the differences: https://www.timescale.com/blog/prometheus-vs-opentelemetry-metrics-a-complete-guide/

steve-chavez avatar Jun 08 '23 19:06 steve-chavez

A metric for the connection pool's maximum acquisition time would help prevent acquisition timeouts: as it climbs, it approaches the timeout.
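
A minimal sketch of such a metric, assuming a toy pool built on `queue.Queue`: the `InstrumentedPool` class, its attribute names, and the `timeout_headroom` helper are hypothetical and only illustrate the idea of comparing the worst observed wait against the configured timeout.

```python
import queue
import time


class InstrumentedPool:
    """Toy resource pool that records the longest acquisition wait (hypothetical helper)."""

    def __init__(self, resources, acquisition_timeout, clock=time.monotonic):
        self._free = queue.Queue()
        for resource in resources:
            self._free.put(resource)
        self.acquisition_timeout = acquisition_timeout
        self._clock = clock
        self.max_acquisition_seconds = 0.0

    def acquire(self):
        start = self._clock()
        try:
            # Raises queue.Empty if nothing is released before the timeout.
            return self._free.get(timeout=self.acquisition_timeout)
        finally:
            # Record the wait on success and on timeout alike.
            # Note: this max update is not atomic; a real pool would guard it.
            waited = self._clock() - start
            self.max_acquisition_seconds = max(self.max_acquisition_seconds, waited)

    def release(self, resource):
        self._free.put(resource)

    def timeout_headroom(self):
        """Fraction of the timeout not yet consumed by the worst wait (0.0 means timeouts are occurring)."""
        used = self.max_acquisition_seconds / self.acquisition_timeout
        return max(0.0, 1.0 - used)
```

Exporting `max_acquisition_seconds` as a gauge next to the configured timeout would let an alert fire while the headroom is shrinking, before requests actually start failing.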

Also OpenTelemetry Traces seem to correspond to Server Timing?

steve-chavez avatar Dec 13 '23 20:12 steve-chavez

@steve-chavez yep, they seem to be a perfect match.

develop7 avatar Dec 15 '23 15:12 develop7