Add prometheus metrics at /api/v1/metrics

Open Wint3rmute opened this issue 2 years ago • 8 comments

Hello,

Please note that this MR is purely a suggestion for a feature. If you don't think that such a feature is needed, feel free to close it :)


This MR adds a Prometheus-compatible /api/v1/metrics endpoint. It serves the numeric subset of the data exposed by /api/v1/stats, in Prometheus-compatible format. The endpoint only works if statistics_enabled: true is set in the configuration.
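
As a rough illustration of what "the numeric subset, in Prometheus-compatible format" could look like, here is a minimal Python sketch. It is not the PR's Crystal code: the shape of the stats payload and the `invidious` metric prefix are assumptions made up for the example.

```python
from numbers import Number


def flatten_numeric(stats, prefix="invidious"):
    """Yield (metric_name, value) for every numeric leaf of a nested dict.

    Non-numeric leaves (strings, booleans, None, lists) are skipped.
    """
    for key, value in stats.items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            yield from flatten_numeric(value, name)
        elif isinstance(value, Number) and not isinstance(value, bool):
            yield name, value


def render_metrics(stats):
    """Render the numeric leaves as one 'name value' line per metric."""
    return "".join(f"{name} {value}\n" for name, value in flatten_numeric(stats))


# Hypothetical stats payload; the real /api/v1/stats fields may differ.
example_stats = {
    "version": "2.0",
    "openRegistrations": True,
    "usage": {"users": {"total": 10, "activeMonth": 3}},
}
print(render_metrics(example_stats), end="")
```

With this payload only the two user counters survive; the version string and the boolean flag are dropped because they are not numeric.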

Motivation

Having a Prometheus-compatible endpoint could make it easier for instance maintainers to monitor their instance. I can only speak for myself, but I've found Prometheus to be quite useful both for monitoring and for setting alerts when one of my services goes down. There's not much useful information in the metrics added by this MR yet, but I can imagine that more useful metrics could be extracted from Invidious. Unfortunately, I'm not familiar with the project's code at the moment, so I'm trying to take small steps here :)

Possible paths for further development

  1. Extracting more useful metrics. If you know any places in the code that could emit metrics that would be useful for instance maintainers, please point me there :smile:
  2. Separate configuration option for enabling the metrics, instead of relying on statistics_enabled
  3. Serving the metrics on a separate port and address, so they can be isolated to a network accessible only to the instance maintainer.

Wint3rmute avatar Jan 18 '23 00:01 Wint3rmute

Not a bad idea and without any external dependency, so it's great.

unixfox avatar Jan 20 '23 20:01 unixfox

  1. Extracting more useful metrics. If you know any places in the code that could emit metrics that would be useful for instance maintainers, please point me there :smile:

I've had the idea of more metrics in mind for quite a while. The main problem is that almost no place in the code emits them, and the current storage (Postgres DB) is not made for that purpose.

I think that once I'm done with #3628, we'll be able to add various metrics.

A non-exhaustive list of (imo) useful metrics:

  • amount of queries per major endpoint (channel, watch page, videoplayback, images proxy, static files)
  • amount of data (MiB) returned per major endpoint
  • average response latency (per 1/6/12/24h)
  • number of DB queries

  2. Separate configuration option for enabling the metrics, instead of relying on statistics_enabled

I'm not sure about that one. Statistics are metrics.

  3. Serving the metrics on a separate port and address, so it can be isolated to a network only accessible for the instance maintainer.

In my opinion, this should be handled in the reverse proxy config. We need to inform maintainers about it in the documentation, but I don't think we should add all that complexity.

SamantazFox avatar Feb 19 '23 15:02 SamantazFox

Seems like a nice feature, just make sure it doesn't compromise the privacy of users.

alx-alexpark avatar Mar 12 '23 22:03 alx-alexpark

I've learned more about the Kemal framework and rewritten the metrics collection, trying to fulfill the ideas described by @SamantazFox:

  • amount of queries per major endpoint (channel, watch page, videoplayback, images proxy, static files)
  • average response latency (per 1/6/12/24h)

*Response latency per X hours can be calculated by a monitoring system such as Prometheus; imo Invidious should not perform that aggregation itself. The metrics in this MR return the total number of seconds spent handling a particular route, which can be used to calculate the latency over any window. Afaik this is also what Prometheus suggests doing in this scenario.
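
To make the "sum of seconds" idea concrete, here is a small Python sketch of the arithmetic a monitoring system performs: take two scrapes of the cumulative counters and divide the growth in total seconds by the growth in request count. The function name and the sample values are made up for illustration.

```python
def average_latency(earlier, later):
    """Average seconds per request between two scrapes of cumulative counters.

    Each scrape is a (duration_seconds_sum, requests_total) pair. Returns None
    when the request count did not grow (no traffic, or a counter reset).
    """
    sum_delta = later[0] - earlier[0]
    count_delta = later[1] - earlier[1]
    if count_delta <= 0:
        return None
    return sum_delta / count_delta


# Hypothetical values for one route, scraped an hour apart.
scrape_start = (2.0, 3)
scrape_end = (5.0, 9)
print(average_latency(scrape_start, scrape_end))  # prints 0.5
```

In PromQL the same idea is roughly `rate(http_request_duration_seconds_sum[1h]) / rate(http_requests_total[1h])`, assuming both series carry identical labels.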

Here's a sample of the new metrics, running on a local instance of Invidious:

http_requests_total{method="/api/v1/metrics" route="GET" response_code="200"} 66
http_requests_total{method="/" route="GET" response_code="302"} 2
http_requests_total{method="/feed/popular" route="GET" response_code="200"} 2
http_requests_total{method="/search" route="GET" response_code="200"} 2
http_requests_total{method="/vi/:id/:name" route="GET" response_code="200"} 43
http_requests_total{method="/watch" route="GET" response_code="200"} 3
http_requests_total{method="/ggpht/*" route="GET" response_code="200"} 26
http_requests_total{method="/latest_version" route="GET" response_code="302"} 6
http_requests_total{method="/api/v1/storyboards/:id" route="GET" response_code="200"} 3
http_requests_total{method="/api/v1/comments/:id" route="GET" response_code="200"} 3
http_request_duration_seconds_sum{method="/api/v1/metrics" route="GET" response_code="200"} 0.05781634
http_request_duration_seconds_sum{method="/" route="GET" response_code="302"} 0.002360641
http_request_duration_seconds_sum{method="/feed/popular" route="GET" response_code="200"} 0.001779746
http_request_duration_seconds_sum{method="/search" route="GET" response_code="200"} 1.5741987
http_request_duration_seconds_sum{method="/vi/:id/:name" route="GET" response_code="200"} 12.719181
http_request_duration_seconds_sum{method="/watch" route="GET" response_code="200"} 2.0207345
http_request_duration_seconds_sum{method="/ggpht/*" route="GET" response_code="200"} 5.050008
http_request_duration_seconds_sum{method="/latest_version" route="GET" response_code="302"} 0.12236025
http_request_duration_seconds_sum{method="/api/v1/storyboards/:id" route="GET" response_code="200"} 0.02403159
http_request_duration_seconds_sum{method="/api/v1/comments/:id" route="GET" response_code="200"} 1.0594385

@alx-alexpark as you can see, there are no metrics that would compromise users' privacy :)

Implementation notes

  1. The names of the metrics are inspired by the Prometheus FastAPI Instrumentator.
  2. The middleware for Kemal is taken from the Crystal Prometheus exporter project. I wasn't able to reuse more of that project's code, as its approach to metrics is just plain wrong (instead of providing an HTTP endpoint for Prometheus, it pushes the metrics out through a raw TCP socket).
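
For readers unfamiliar with this kind of middleware, the general pattern can be sketched in Python as a WSGI wrapper. This is only an analogy, not the Kemal handler from this PR; the class and attribute names are hypothetical.

```python
import time
from collections import defaultdict


class RequestMetrics:
    """WSGI middleware that counts requests and sums handling time per label set."""

    def __init__(self, app):
        self.app = app
        self.requests_total = defaultdict(int)
        self.duration_seconds_sum = defaultdict(float)

    def __call__(self, environ, start_response):
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            # Remember the numeric status code, e.g. "200" from "200 OK".
            captured["code"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        started = time.perf_counter()
        try:
            return self.app(environ, recording_start_response)
        finally:
            # Only the handler call is timed, not the streaming of the body.
            elapsed = time.perf_counter() - started
            # The raw path is used for brevity; recording the route pattern
            # (e.g. "/vi/:id/:name") instead keeps video IDs out of the labels.
            key = (
                environ["REQUEST_METHOD"],
                environ["PATH_INFO"],
                captured.get("code", "500"),
            )
            self.requests_total[key] += 1
            self.duration_seconds_sum[key] += elapsed


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


metrics = RequestMetrics(demo_app)
environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/watch"}
body = metrics(environ, lambda status, headers, exc_info=None: None)
print(metrics.requests_total[("GET", "/watch", "200")])  # prints 1
```

Labelling by route pattern rather than by concrete URL is what keeps both the number of series and the privacy exposure bounded.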

Problems encountered

For some reason, the metrics collection handler is not called when the /videoplayback/ endpoint is being handled. I suspect that this is due to the before_all configuration. Unfortunately, I'm not very familiar with either Kemal or Invidious, so I'm having trouble debugging this.

Other notes

Imo adding database metrics to this PR would make it too large, so I'd prefer to split it into another PR :)

Wint3rmute avatar Mar 19 '23 16:03 Wint3rmute

Oh, sorry, I didn't notice that you updated your code! That's looking great :D I'll review that as soon as possible!

SamantazFox avatar May 07 '23 13:05 SamantazFox

Sorry, it took a while to review the new changes!

SamantazFox avatar Nov 26 '23 19:11 SamantazFox

I've tried this PR on my personal private instance, running it rebased on the latest master commit (iv-org:master 90e94d4e6cc126a8b7a091d12d7a5556bfe369d5) in a Docker container.

It works fine, thank you for the PR!

Extra info, if needed:

To add it on a Prometheus server:

  - job_name: invidious
    static_configs:
      - targets: ['localhost:3088']
    metrics_path: /api/v1/metrics

Here is a prototype dashboard you may use in Grafana for testing: grafana_invidious_panel.json

Screenshot of the grafana panel

(sorry units are wrong on the HTTP rates, feel free to correct!)

Soblow avatar Aug 08 '24 18:08 Soblow