traefik
Prometheus: More detailed metrics
Do you want to request a feature or report a bug?
Feature
What did you expect to see?
I would like to have more detailed metrics with Prometheus. For example:
- Requests per Route
- Requests per Backend
- Average Time per Backend
- Average Time per Route
Hi @danielh1989,
Thanks for your interest in the Træfik project.
For information, the currently available metrics are:
- traefik_config_reloads_total
- traefik_config_reloads_failure_total
- traefik_config_last_reload_success
- traefik_config_last_reload_failure
- traefik_entrypoint_requests_total
- traefik_entrypoint_request_duration_seconds
- traefik_entrypoint_open_connections
- traefik_backend_requests_total
- traefik_backend_request_duration_seconds
- traefik_backend_open_connections
- traefik_backend_retries_total
- traefik_backend_server_up
What exactly do you mean by "per route"? Prometheus does not cope well with unbounded label values, so putting an arbitrary path into a label is not an option unless you want to kill your Prometheus server.
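To illustrate the cardinality concern, here is a minimal Python sketch (not Traefik code; the traffic and paths are made up for the example). It assumes each distinct label value becomes its own time series and compares raw request paths with a bounded route template:

```python
def series_count(label_values):
    """Count distinct label values; each one would become a separate time series."""
    return len(set(label_values))

# Hypothetical traffic: 1000 requests, each with a unique ID in the path.
raw_paths = [f"/users/{i}/profile" for i in range(1000)]

# The same traffic labelled with a bounded route template instead of the raw path.
templated_paths = ["/users/{id}/profile" for _ in raw_paths]

raw_series = series_count(raw_paths)
templated_series = series_count(templated_paths)
print(raw_series, templated_series)
```

With raw paths the series count grows with the number of distinct URLs clients send, whereas the templated label stays at one series.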
As an example, I'd really like to know which API (based on a path) is slower than others when using the traefik_backend_requests_total metric. My services don't have an unbounded number of APIs, so perhaps Traefik could fill the label only for backend requests that are not a 404, to avoid client spam.
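A rough sketch of that idea in Python, purely illustrative: `path_label` and the "other" placeholder are hypothetical names, not anything Traefik provides. Requests answered with a 404 collapse into a single bucket so that scanners and typos cannot create new label values:

```python
def path_label(path, status_code, placeholder="other"):
    """Return the label value for a request; all 404s share one placeholder value."""
    if status_code == 404:
        return placeholder
    return path

# Hypothetical requests: two real APIs and two probes that end in a 404.
requests = [
    ("/api/users", 200),
    ("/wp-login.php", 404),
    ("/.env", 404),
    ("/api/orders", 500),
]

labels = sorted({path_label(path, status) for path, status in requests})
print(labels)
```

This only bounds the label if the backend itself accepts a bounded set of paths; a path like /users/{id} would still need grouping.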
I recently moved from the ingress-nginx controller to Traefik, and only now realised that Traefik unfortunately has no way to collect metrics by URL path. In the ingress-nginx controller, for example, this works by default: https://github.com/kubernetes/ingress-nginx/blob/96b6228a6b65a85e421b8a348a149e99181664d1/deploy/grafana/dashboards/request-handling-performance.json#L314
Traefik has a lot of pros, like SSL certificate management and very comfortable routing, but observability is also one of the most important things in a system, and right now I can't tell which route of my API is slow.
I hope the path label will be added to the metrics in the near future.
I found the metric labels were more usable if I used an intermediate TraefikService in the path to my actual k8s service.
Yes, observability for every URL path (route) is important for me too, including for alerts and notifications about traffic incidents.
Two years have passed, and I tried again to solve this problem. There is some good news.
This is how it's possible to collect path metrics with Traefik (spoiler: only requests_total, not histogram/latency).
- Create a Middleware that exposes the path as a header
middleware.yaml:
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: x-replace-path-metrics-header
spec:
  replacePathRegex:
    regex: ^(.*)
    replacement: $1
It leaves the path unchanged, but as a side effect of applying this middleware, the request carries an X-Replaced-Path header containing our path.
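To make the trick easier to follow, here is a small Python simulation of what the middleware does to a request. It is only a sketch of the behaviour described above, not Traefik's implementation; `apply_replace_path_regex` is a hypothetical helper, and Python writes the Go-style `$1` as `\1`:

```python
import re

REGEX = r"^(.*)"        # same pattern as in middleware.yaml
REPLACEMENT = r"\1"     # Python spelling of the Go-style $1

def apply_replace_path_regex(path):
    """Rewrite the path with the regex and keep the original in a header."""
    new_path = re.sub(REGEX, REPLACEMENT, path)
    headers = {"X-Replaced-Path": path}  # original path, as the middleware records it
    return new_path, headers

new_path, headers = apply_replace_path_regex("/assets/e2301aaa/bundle.js")
print(new_path, headers)
```

Because the whole path is captured and written back, the request reaches the backend unchanged and only the extra header is new.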
- Apply the middleware to your IngressRoute
ingress.yaml:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: app
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`app.co`)
      kind: Rule
      middlewares:
        - name: x-replace-path-metrics-header
      services:
        - kind: Service
          name: app
          port: 80
- Group your paths to avoid cardinality issues (strictly speaking this should come first, because the Traefik Helm chart has to be installed before the manifests are applied, but I keep it as the third step to make the idea clear)
values.yaml (values for the Traefik Helm chart):
# Tell Traefik's Prometheus metrics to use our X-Replaced-Path header as a label
additionalArguments:
  # details here: https://doc.traefik.io/traefik/observability/metrics/prometheus/#headerlabels
  - '--metrics.prometheus.headerlabels.path=X-Replaced-Path'
metrics:
  prometheus:
    # Some basic, non-critical settings for Prometheus
    addEntryPointsLabels: true
    addRoutersLabels: true
    addServicesLabels: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        # The release name of Prometheus can be found with: kubectl -n <prometheus ns> get servicemonitors.monitoring.coreos.com kube-prometheus-stack-prometheus -o yaml | yq .metadata.labels.release
        release: kube-prometheus-stack
      # And finally the grouping: for example, you can squash the URLs /assets/e2301aaa/bundle.js and /assets/e2301ccc/bundle.js into the single /assets/{hash}/bundle.js
      # In my case I don't set the metricRelabelings field in the values file. Instead I pass it during the Helm chart installation and generate the relabeling config in the "gateway" application I am interested in.
      # As a result, my Traefik installation command looks like:
      # helm install traefik traefik/traefik -f values.yaml --set-json "metrics.prometheus.serviceMonitor.metricRelabelings=$(make -C ./../gateway generate-relabelings-config)"
      metricRelabelings:
        - sourceLabels: [ path ]
          regex: ^/assets/[0-9a-z]+/(.*)$
          targetLabel: path
          replacement: /assets/{hash}/$1
          action: replace
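The effect of that relabeling rule can be checked offline with a short Python sketch. It assumes the usual Prometheus behaviour for `action: replace` (the regex must match the whole label value, otherwise the label is left as it is); `relabel_path` is a hypothetical helper for illustration only:

```python
import re

# Same pattern as in metricRelabelings; the Go-style $1 becomes \1 in Python.
RULE_REGEX = re.compile(r"^/assets/[0-9a-z]+/(.*)$")
REPLACEMENT = r"/assets/{hash}/\1"

def relabel_path(path):
    """Return the grouped path label, or the original value if the rule does not match."""
    match = RULE_REGEX.fullmatch(path)
    if match is None:
        return path
    return match.expand(REPLACEMENT)

for sample in ("/assets/e2301aaa/bundle.js", "/assets/e2301ccc/bundle.js", "/api/users"):
    print(sample, "->", relabel_path(sample))
```

Both asset URLs end up with the same label value, so they share one time series, while paths that don't match the rule pass through untouched.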
Almost everything looks good: with this setup I can see per-path metrics for my requests (and per-host metrics too, by adding --metrics.prometheus.headerlabels.host=X-Forwarded-Host to additionalArguments).
There is only one problem: unfortunately, this setup works only for the requests_total metrics and not for the histogram metrics.
As a result, I can see the request rate per path, but I still can't see the latency per path.
Fortunately, @leonlyu1996 has done great work in https://github.com/traefik/traefik/issues/10774 to add a histogram for the headerlabels metrics. I hope the Traefik team will merge @leonlyu1996's solution.
Needing to add a new header to get this to work is a really ugly hack. C'mon, this should be pretty basic functionality available out of the box (and it is available out of the box w/ ingress-nginx, with ASP.NET Core's built-in metrics etc etc). All you really need is something in the docs to say "make damn sure you group your paths and understand their boundedness or you'll make your Prom server sad".
I really do reject this notion that it suddenly becomes a breaking change in the software because a user becomes able to do something dumb with it and cause themselves pain. If you play stupid games, you win stupid prizes. @Place1's suggestion for stripping out 404s is a sensible one; then you just need to have an understanding of the paths your routing framework will accept.
Hello,
Thanks for all the feedback! However, there is still something unclear to me @hick @lol768 @mgerasimchuk @Place1
Do you want the request path or the router's rule path to be added as a label to metrics?
Hi @rtribotte ,
Ideally, we want to have an HTTP Request Path.
But, at least for me, having the Traefik router rule would also be a good alternative.
Hello, I raise my hand for this feature, especially for the information about the request path. I do not fully understand the problem with the route rule or http request path. IMHO the http request path should be used which is the current one passed to the metrics middleware.