approver-policy icon indicating copy to clipboard operation
approver-policy copied to clipboard

feat: fix app label of metrics svc for ServiceMonitor discovery

Open leotomas837 opened this issue 1 year ago • 7 comments

The ServiceMonitor targets both the web hook service and the metrics service. Yet, only the metrics service must be scraped (the probe could be scraped too via the blackbox-exporter, but that is a subject for another PR). The webhook service must not be discovered by Prometheus.

Prerequisites

Having a Kubernetes cluster running with the cert-manager/approver-policy chart installed and a prometheus instance ready to scrape metrics.

How to reproduce

  1. Make your Prometheus instance is scrape your approver-policy metrics. For this purpose, set the following values:
nameOverride: cert-manager-approver-policy

app:
  metrics:
    port: 9402
    service:
      enabled: true
      servicemonitor:
        enabled: true
  1. Open the Prometheus UI, you should see something similar:

Screenshot_2023-04-28_at_06_11_51-3-2

I have 2 replicas, hence the (4/4 up) in the target, but the point is that for a given replica, there is currently 2 services scraped: the webhook Service and the metrics Service. The metrics service should obviously be scraped, but not the webhook service (not even listening on port 9402 but on port 443) as it is not serving any metrics.

That is happening because the ServiceMonitor here is scraping on the following app label:

spec:
  selector:
    matchLabels:
      app: {{ include "cert-manager-approver-policy.name" . }}

Yet, both the metrics Service and the webhook Service have this label, check here and here.

How to test

By changing the app label of the metrics Service and setting the ServiceMonitor to select the new same app label, only the metrics Service gets scraped and you should see the following in your Prometheus UI (2/2 up) for 2 replicas:

Screenshot_2023-04-28_at_06_48_13

Simply apply the following JSON Patches to your chart installed in namespace cert-manager (I am using Helmfile so this is Helmfile syntax, but any tool applying the following JSON Patches obviously works):

jsonPatches:
  # The ServiceMonitor currently targets both the metrics endpoint and the webhook endpoint
  - target:
      group: ""
      version: v1
      kind: Service
      name: cert-manager-approver-policy-metrics
      namespace: cert-manager
    patch:
      - op: replace
        path: "/metadata/labels/app"
        value: cert-manager-approver-policy-metrics
  - target:
      group: monitoring.coreos.com
      version: v1
      kind: ServiceMonitor
      name: cert-manager-approver-policy
      namespace: cert-manager
    patch:
      - op: replace
        path: "/spec/selector/matchLabels/app"
        value: cert-manager-approver-policy-metrics

leotomas837 avatar Apr 26 '23 13:04 leotomas837

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: leotomas837 Once this PR has been reviewed and has the lgtm label, please assign sgtcodfish for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

jetstack-bot avatar Apr 26 '23 13:04 jetstack-bot

Hi @leotomas837. Thanks for your PR.

I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jetstack-bot avatar Apr 26 '23 13:04 jetstack-bot

@wallrj

Instructions added

leotomas837 avatar Apr 28 '23 05:04 leotomas837

@wallrj

Did you get the chance to have a look on this ? I should be straight forward. Let me know if you need anything else from me.

leotomas837 avatar May 05 '23 18:05 leotomas837

@leotomas837 Sorry for the delay. After a few failed attempts to get the Prometheus operator installed I finally got it working this morning and was able to recreate the problem you described.

As you can probably tell, I don't know much about Prometheus, but it seemed strange to me that the ServiceMonitor should show the duplicate endpoints, despite us specifying targetPort: 9402. Surely, the prometheus operator should be able to see that only one of the matching services targets that port?

So I tried changing targetPort: 9402 to port: metrics, and that seems to result in only the metrics target appearing in Prometheus

image

$ git diff
diff --git a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
index 43f1751..2e72caa 100644
--- a/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
+++ b/deploy/charts/approver-policy/templates/metrics-servicemonitor.yaml
@@ -20,7 +20,7 @@ spec:
     matchNames:
       - {{ .Release.Namespace }}
   endpoints:
-  - targetPort: {{ .Values.app.metrics.port }}
+  - port: metrics
     path: "/metrics"
     interval: {{ .Values.app.metrics.service.servicemonitor.interval }}
     scrapeTimeout: {{ .Values.app.metrics.service.servicemonitor.scrapeTimeout }}

What do you think? Are there any downsides to that approach?

wallrj avatar May 10 '23 09:05 wallrj

More discussion here about confusing behaviour of targetPort :

  • https://github.com/prometheus-operator/prometheus-operator/issues/2515

wallrj avatar May 10 '23 10:05 wallrj

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jetstack-bot avatar Feb 27 '24 02:02 jetstack-bot

Supplanted by #471

wallrj avatar Jul 25 '24 11:07 wallrj