traefik-helm-chart
traefik-helm-chart copied to clipboard
improved support for use with prometheus-operator
Welcome!
- [X] Yes, I've searched similar issues on GitHub and didn't find any.
- [X] Yes, I've searched similar issues on the Traefik community forum and didn't find any.
What did you expect to see?
I thought that this chart would make it easy to integrate Traefik and Prometheus. that may be true, but it turns out that it's not necessarily true if the user is using the prometheus-operator.
speaking personally regarding my experience using Traefik, as provisioned by this chart, and Prometheus, as provisioned by the prometheus-operator, I struggled to get Traefik metrics into Prometheus. while this chart seems to provide some nice features for use with Prometheus. however, those features seem to not apply when using the prometheus-operator.
here's the situation:
-
the prometheus-operator doesn't support the old
prometheus.io/scrape
annotation. -
this chart assumes that the
prometheus.io/
annotations are good enough
from there, I found two approaches to getting Traefik metrics into Prometheus, both of which ultimately target Traefik's :9100/metrics
endpoint:
- define a
PodMonitor
- define a
ServiceMonitor
so then, which way would be best? (I think it'd be nice if users didn't have to ask and answer this question and the other questions that follow from it.)
I found some guidelines from prometheus-operator repo committers that ServiceMonitor
s are "default choice unless you have some reason to use PodMonitor
". likewise, their getting-started doc gives ServiceMonitor
much more attention than PodMonitor
.
if we do want to use a ServiceMonitor
, then we find that the k8s Service
resource created by this chart isn't ready-to-use, because it doesn't expose the metrics ports, 9100
. so, either we have to monkey patch that Service
, define another Service
, or go with PodMonitor
. it seems to me that the easiest thing that maintainers of this chart might do would be to expose the metrics port on the traefik Service
, or otherwise define a metrics-specifc Service
to nudge users towards this seemingly preferable ServiceMonitor
solution.
there's probably an argument here that other Treafik docs could be improved upon, too. I foud these traefik.io write-ups, which didn't seem to speed me much along towards finding the solution:
- the Traefik docs - that page was just about all I could find on prometheus. I found that writing terse and not very clear - for example, I'm still not sure what an "entrypoint" is, in this context.
- this Traefik blog post on "Traefik Metrics for Apps on Kubernetes with Prometheus". I found this post more useful.
edit: I thought I was done writing, but then I discovered another gotcha. the ServiceMonitor
needs to include honorLabels: true
, so that the metrics that Prometheus scrapes from Traefik have the values that Traefik expects at the service
label, rather than the exported_service
label. so the serviceMonitor.spec
needs to look something like this:
spec:
jobLabel: traefik
selector:
# omitted ...
endpoints:
- port: metrics
path: /metrics
honorLabels: true
ultimately, I think the point here is that this chart is nice but that it doesn't actually play well out-of-the-box with a prometheus-operator setup; instead, a lot of stuff falls to the user to figure out. explicit acknowledgement of this, whether by adding prometheus-operator support to the chart or by warning prometheus-operators users that they'll have to figure it out on their own, would be nice.
if we do want to use a ServiceMonitor, then we find that the k8s Service resource created by this chart isn't ready-to-use, because it doesn't expose the metrics ports, 9100. so, either we have to monkey patch that Service, define another Service, or go with PodMonitor. it seems to me that the easiest thing that maintainers of this chart might do would be to expose the metrics port on the traefik Service
https://github.com/traefik/traefik-helm-chart/blob/master/traefik/values.yaml#L349-L355
It should be metric specific Service or PodMonitor. Existing Service isn't suitable because in most configurations it shouldn't expose metrics to public.
Worth noting that this could be resolved by users with #643 (more discussion https://github.com/traefik/traefik-helm-chart/issues/595). The chart could always have more native support for this type of pattern though - I would be in favor of that!
thanks @mloiseleur !