open-forms
As a municipality, I want to receive metrics from all containers via OpenTelemetry
Reference: PF-126
Theme
Other
Description
Ref. Utrecht 182
To gain proper insight into the availability and speed of services, these services must offer metrics through an exporter. Prometheus is an option, but an OTLP exporter would be nicer (OpenTelemetry, which Sentry can also work with).
Could you add this, starting with Open Forms, and then for Nginx, Redis and RabbitMQ?
Some links for inspiration: https://opentelemetry.io/docs/languages/python/getting-started/ https://prometheus.io/docs/instrumenting/exporters/
Added value
No response
Additional context
No response
Refinement: This needs investigation to make it work for Open Forms, but we can split it into two tasks: (1) observability of the service containers and (2) observability of the application itself.
For (1), we know Utrecht already uses infrastructure tooling for this, so we should also check whether we need to do it at all.
Refinement with Utrecht, start with the basics: https://wikitech.wikimedia.org/wiki/SLO/template_instructions#Organizational
Pauline would like an endpoint on which metrics are published, such as the number of 500 errors per X amount of time, and latency.
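A minimal sketch of what such an endpoint could publish, using only the Python standard library rather than an official client such as prometheus_client. The `RequestMetrics` class, the metric names `http_responses_total` and `http_request_duration_seconds`, and the bucket boundaries are hypothetical choices for illustration. It renders the Prometheus text exposition format: a counter of responses per status code and a cumulative latency histogram.

```python
import bisect
from collections import Counter

# Hypothetical bucket boundaries (in seconds) for the latency histogram.
LATENCY_BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, 2.5)


class RequestMetrics:
    """In-memory store for responses per status code and request latency (not thread-safe)."""

    def __init__(self, buckets=LATENCY_BUCKETS):
        self.buckets = tuple(sorted(buckets))
        self.status_counts = Counter()
        # One count per finite bucket; the implicit +Inf bucket equals the total count.
        self.bucket_counts = [0] * len(self.buckets)
        self.latency_sum = 0.0
        self.latency_count = 0

    def observe(self, status: int, duration_seconds: float) -> None:
        self.status_counts[status] += 1
        self.latency_sum += duration_seconds
        self.latency_count += 1
        # Buckets are cumulative: an observation counts toward every bucket whose le >= value.
        first = bisect.bisect_left(self.buckets, duration_seconds)
        for i in range(first, len(self.buckets)):
            self.bucket_counts[i] += 1

    def render(self) -> str:
        """Return all metrics in the Prometheus text exposition format."""
        lines = [
            "# HELP http_responses_total Number of HTTP responses by status code.",
            "# TYPE http_responses_total counter",
        ]
        for status in sorted(self.status_counts):
            lines.append(f'http_responses_total{{status="{status}"}} {self.status_counts[status]}')
        lines += [
            "# HELP http_request_duration_seconds Request latency in seconds.",
            "# TYPE http_request_duration_seconds histogram",
        ]
        for le, count in zip(self.buckets, self.bucket_counts):
            lines.append(f'http_request_duration_seconds_bucket{{le="{le}"}} {count}')
        lines.append(f'http_request_duration_seconds_bucket{{le="+Inf"}} {self.latency_count}')
        lines.append(f"http_request_duration_seconds_sum {self.latency_sum}")
        lines.append(f"http_request_duration_seconds_count {self.latency_count}")
        return "\n".join(lines) + "\n"
```

The endpoint only exposes ever-increasing counters; the "per X amount of time" part is left to the query side. For example, `rate(http_responses_total{status="500"}[5m])` gives 500 errors per second over the last five minutes, and `histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))` estimates 95th-percentile latency.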
IMO, those kinds of things are best solved via a service mesh solution in Kubernetes, for example Istio: https://istio.io/latest/docs/ops/integrations/prometheus/
@sergei-maertens you can utilise Istio, but your application still has to at least expose the metrics (or push them). The application also decides which metrics are useful.
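A minimal stdlib-only sketch of the pull model, in which the application itself exposes an application-defined metric on `/metrics` for Prometheus to scrape. The metric `openforms_submissions_completed_total`, the helper functions and the default port 8000 are hypothetical and not existing Open Forms code.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application-specific metric: the application decides what is worth counting.
_lock = threading.Lock()
_submissions_completed = 0


def record_submission_completed() -> None:
    """Call this from application code whenever a submission completes (illustrative hook)."""
    global _submissions_completed
    with _lock:
        _submissions_completed += 1


def render_metrics() -> str:
    with _lock:
        value = _submissions_completed
    return (
        "# HELP openforms_submissions_completed_total Completed form submissions.\n"
        "# TYPE openforms_submissions_completed_total counter\n"
        f"openforms_submissions_completed_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        # Content type of the Prometheus text exposition format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent scrapes out of the logs


def serve_metrics(host="127.0.0.1", port=8000) -> ThreadingHTTPServer:
    """Serve /metrics from a background thread; bind to 0.0.0.0 inside a container."""
    server = ThreadingHTTPServer((host, port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In practice a client library such as prometheus_client (for example its `start_http_server` function) handles registries and serving; the hand-rolled version only shows what a scrape endpoint has to return, independent of whether a service mesh sits in front of it.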
@sdegroot We're tackling this topic as part of #4966
We do need to get approval from Utrecht to start working on this. The estimate is roughly a few weeks.
@joeribekker OK. I assume that estimate is the effort required to add this to all the ZGW services you've built, and not just Open Forms?
That would be a wrong assumption: we need to create the proper setup (which we currently don't have anywhere) and test out which things are relevant to report. But once we've done this for Open Forms, I expect the other components can follow more swiftly.
Ah, I see Django does not support OTel out of the box the way Spring Boot does. However, it does seem to have an option for Prometheus (automated metric collection). Something for Pauline to consider.
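"Automated metric collection" for a Django application roughly amounts to wrapping the request handler so that every response is recorded without changes to individual views. The sketch below is a hypothetical, stdlib-only WSGI middleware, not the API of django-prometheus or any other package; `MetricsMiddleware` and its attributes are illustrative names.

```python
import threading
import time
from collections import Counter


class MetricsMiddleware:
    """Hypothetical WSGI middleware recording the status code and latency of every request."""

    def __init__(self, app):
        self.app = app
        self._lock = threading.Lock()
        self.status_counts = Counter()  # e.g. {200: 10, 500: 2}
        self.durations = []  # seconds; a real exporter would aggregate these into histogram buckets

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            # WSGI passes the status as a string such as "500 Internal Server Error".
            captured["status"] = int(status.split(" ", 1)[0])
            return start_response(status, headers, exc_info)

        try:
            # Assumes the app calls start_response before returning, as Django's WSGIHandler does.
            return self.app(environ, capturing_start_response)
        finally:
            # Timed until the app returns its response iterable; streamed bodies are not included.
            elapsed = time.perf_counter() - start
            # If the app raised before calling start_response, count the request as a 500.
            status = captured.get("status", 500)
            with self._lock:
                self.status_counts[status] += 1
                self.durations.append(elapsed)
```

In a Django project this could wrap the application in `wsgi.py`, e.g. `application = MetricsMiddleware(get_wsgi_application())`. The collected values would still have to be exported, either on a scrape endpoint or pushed to a collector.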
Since my original comment I've explored options and features, and while the infra can tackle the "boring" metrics such as request/response time and status codes, there is particular interest on our side in pushing application-specific metrics that go beyond simple HTTP requests.
I'll first and foremost try out the various available libraries in the context of Open Forms. Once I'm happy with the findings, I can document them for the other teams and products maintained by Maykin so that they can also expose Prometheus metrics more easily.
Hi, can someone add the label "waiting for approval"?