approver-policy
approver-policy copied to clipboard
feat: fix app label of metrics svc for ServiceMonitor discovery
The ServiceMonitor targets both the web hook service and the metrics service. Yet, only the metrics service must be scraped (the probe could be scraped too via the blackbox-exporter, but that is a subject for another PR). The webhook service must not be discovered by Prometheus.
Prerequisites
Having a Kubernetes cluster running with the cert-manager/approver-policy chart installed and a prometheus instance ready to scrape metrics.
How to reproduce
- Make your Prometheus instance is scrape your approver-policy metrics. For this purpose, set the following values:
nameOverride: cert-manager-approver-policy
app:
metrics:
port: 9402
service:
enabled: true
servicemonitor:
enabled: true
- Open the Prometheus UI, you should see something similar:
I have 2 replicas, hence the (4/4 up) in the target, but the point is that for a given replica, there is currently 2 services scraped: the webhook Service and the metrics Service. The metrics service should obviously be scraped, but not the webhook service (not even listening on port 9402
but on port 443
) as it is not serving any metrics.
That is happening because the ServiceMonitor here is scraping on the following app label:
spec:
selector:
matchLabels:
app: {{ include "cert-manager-approver-policy.name" . }}
Yet, both the metrics Service and the webhook Service have this label, check here and here.
How to test
By changing the app
label of the metrics Service and setting the ServiceMonitor to select the new same app
label, only the metrics Service gets scraped and you should see the following in your Prometheus UI (2/2 up) for 2 replicas:
Simply apply the following JSON Patches to your chart installed in namespace cert-manager
(I am using Helmfile so this is Helmfile syntax, but any tool applying the following JSON Patches obviously works):
jsonPatches:
# The ServiceMonitor currently targets both the metrics endpoint and the webhook endpoint
- target:
group: ""
version: v1
kind: Service
name: cert-manager-approver-policy-metrics
namespace: cert-manager
patch:
- op: replace
path: "/metadata/labels/app"
value: cert-manager-approver-policy-metrics
- target:
group: monitoring.coreos.com
version: v1
kind: ServiceMonitor
name: cert-manager-approver-policy
namespace: cert-manager
patch:
- op: replace
path: "/spec/selector/matchLabels/app"
value: cert-manager-approver-policy-metrics
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: leotomas837 Once this PR has been reviewed and has the lgtm label, please assign sgtcodfish for approval. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
Hi @leotomas837. Thanks for your PR.
I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@wallrj
Instructions added
@wallrj
Did you get the chance to have a look on this ? I should be straight forward. Let me know if you need anything else from me.
@leotomas837 Sorry for the delay. After a few failed attempts to get the Prometheus operator installed I finally got it working this morning and was able to recreate the problem you described.
As you can probably tell, I don't know much about Prometheus, but it seemed strange to me that the ServiceMonitor should show the duplicate endpoints, despite us specifying targetPort: 9402
. Surely, the prometheus operator should be able to see that only one of the matching services targets that port?
So I tried changing targetPort: 9402
to port: metrics
, and that seems to result in only the metrics target appearing in Prometheus
$ git diff
diff --git a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
index 43f1751..2e72caa 100644
--- a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
+++ b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
@@ -20,7 +20,7 @@ spec:
matchNames:
- {{ .Release.Namespace }}
endpoints:
- - targetPort: {{ .Values.app.metrics.port }}
+ - port: metrics
path: "/metrics"
interval: {{ .Values.app.metrics.service.servicemonitor.interval }}
scrapeTimeout: {{ .Values.app.metrics.service.servicemonitor.scrapeTimeout }}
What do you think? Are there any downsides to that approach?
More discussion here about confusing behaviour of targetPort
:
- https://github.com/prometheus-operator/prometheus-operator/issues/2515
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Supplanted by #471