As a municipality, I want to receive metrics from all containers via OpenTelemetry

Open joeribekker opened this issue 1 year ago • 12 comments

Reference: PF-126

Thema / Theme

Other

Omschrijving / Description

Ref. Utrecht 182

To get good insight into the availability and speed of services, the services must expose metrics through an exporter. Prometheus is an option, but an OTLP exporter would be nicer (OpenTelemetry, which Sentry can also handle).

Could you add that, starting with Open Forms, and then for Nginx, Redis, and RabbitMQ?

Some links for inspiration: https://opentelemetry.io/docs/languages/python/getting-started/ and https://prometheus.io/docs/instrumenting/exporters/

Added value / Toegevoegde waarde

No response

Aanvullende opmerkingen / Additional context

No response

joeribekker avatar Mar 13 '24 08:03 joeribekker

Refinement: this needs to be investigated to make it work for Open Forms, but we can split it into two tasks: 1) observability of the service containers and 2) observability of the application.

For (1), we know Utrecht already uses infra tooling for this, so we should also check whether we need to do this at all.

joeribekker avatar Mar 18 '24 10:03 joeribekker

Refinement with Utrecht: start with the basics: https://wikitech.wikimedia.org/wiki/SLO/template_instructions#Organizational

joeribekker avatar Jun 03 '24 12:06 joeribekker

Pauline would like an endpoint where metrics are published: the number of 500 errors per X amount of time, and latency.
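To make the request concrete, here is a minimal sketch of such an endpoint, using only the Python standard library. It is not how Open Forms implements this: the class name `MetricsMiddleware`, the `/metrics` path, and the metric names are all assumptions chosen for illustration. The middleware counts responses per status class and sums request latency, then renders both in the Prometheus text exposition format. A monitoring system can derive "errors per X amount of time" from how fast the `5xx` counter increases.

```python
import time
from collections import Counter


class MetricsMiddleware:
    """WSGI middleware that counts responses per status class and sums latency.

    Illustrative only: state lives in one process and is not thread-safe.
    """

    def __init__(self, app, metrics_path="/metrics"):
        self.app = app
        self.metrics_path = metrics_path  # hypothetical path, not taken from the issue
        self.responses = Counter()        # e.g. {"2xx": 120, "5xx": 3}
        self.latency_sum = 0.0            # seconds spent until the app returned
        self.latency_count = 0

    def render(self):
        """Render the collected values in the Prometheus text exposition format."""
        lines = ["# TYPE http_responses_total counter"]
        for status_class, count in sorted(self.responses.items()):
            lines.append(
                f'http_responses_total{{status_class="{status_class}"}} {count}'
            )
        lines += [
            "# TYPE http_request_duration_seconds summary",
            f"http_request_duration_seconds_sum {self.latency_sum:.6f}",
            f"http_request_duration_seconds_count {self.latency_count}",
        ]
        return "\n".join(lines) + "\n"

    def __call__(self, environ, start_response):
        # Serve the metrics themselves without counting that request.
        if environ.get("PATH_INFO") == self.metrics_path:
            body = self.render().encode("utf-8")
            start_response(
                "200 OK",
                [
                    ("Content-Type", "text/plain; version=0.0.4; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]

        started = time.perf_counter()

        def recording_start_response(status, headers, exc_info=None):
            # status looks like "500 Internal Server Error"; keep only the class.
            self.responses[status[0] + "xx"] += 1
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            # Measures time until the app returns its iterable, not body streaming.
            self.latency_sum += time.perf_counter() - started
            self.latency_count += 1
```

Wrapping an existing WSGI application would look like `application = MetricsMiddleware(application)`. A real deployment with multiple worker processes would need shared or per-process aggregation, which this sketch leaves out.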

joeribekker avatar Sep 10 '24 14:09 joeribekker

IMO those kinds of things are best solved via the service mesh solution in Kubernetes, for example with Istio: https://istio.io/latest/docs/ops/integrations/prometheus/

sergei-maertens avatar Sep 12 '24 08:09 sergei-maertens

@sergei-maertens you can utilise Istio but your application still at least has to expose the metrics (or push them). The application also decides which metrics are useful.

sdegroot avatar Mar 03 '25 13:03 sdegroot

@sdegroot We're tackling this topic as part of #4966

We do need to get approval from Utrecht to start working on this. Estimate is roughly a few weeks.

joeribekker avatar Mar 03 '25 16:03 joeribekker

@joeribekker OK. I assume that estimate is the effort required to add this to all the ZGW services you've built and not just open-forms?

sdegroot avatar Mar 03 '25 16:03 sdegroot

That would be a wrong assumption - We need to create the proper setup (which we now don't have anywhere) and test out relevant things to report. But, once we do this for Open Forms, I assume the other components can follow more swiftly.

joeribekker avatar Mar 03 '25 16:03 joeribekker

Ah, I see Django does not support OTEL out of the box like Spring Boot does. However, it seems it does have the option for Prometheus (automated metric collection). Something for Pauline to consider.
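The comment does not name a package. Assuming it refers to something like the third-party `django-prometheus` package, the wiring would be roughly the settings fragment below. The surrounding app and middleware entries are placeholders, and nothing here is taken from the Open Forms code base.

```python
# settings.py (fragment). Assumes the third-party django-prometheus package
# is installed; existing entries are elided as comments.
INSTALLED_APPS = [
    # ... existing apps ...
    "django_prometheus",
]

MIDDLEWARE = [
    "django_prometheus.middleware.PrometheusBeforeMiddleware",  # first entry
    # ... existing middleware ...
    "django_prometheus.middleware.PrometheusAfterMiddleware",   # last entry
]

# urls.py would additionally need something like:
#   path("", include("django_prometheus.urls"))
# which exposes the collected metrics for scraping.
```

This covers generic request and response metrics only. Application-specific metrics would still have to be defined by the application itself.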

sdegroot avatar Mar 03 '25 16:03 sdegroot

Since my original comment I've explored options and features, and while the infra can tackle the "boring" metrics such as request/response time and status codes, there is particular interest on our side to push application-specific metrics that go beyond simple HTTP requests.

I shall first and foremost try out the various available libraries in the context of Open Forms. Once I'm happy with the findings, I can document them for the other teams and products maintained by Maykin so that they can also expose Prometheus metrics more easily.

sergei-maertens avatar Mar 04 '25 15:03 sergei-maertens

Hi, can someone add the "waiting for approval" label?

PaulineUtrecht avatar Mar 06 '25 07:03 PaulineUtrecht