
Metrics get overridden when using the same Pushgateway for multiple k8s clusters

Open OllowainT opened this issue 2 years ago • 0 comments




Describe the bug With the pretty cool feature of exporting the generated report in Prometheus output format, we can send the metrics to a Prometheus instance via a Pushgateway. The Pushgateway has to be defined as a scrape target in Prometheus. When Popeye runs as a CronJob in a k8s cluster, it pushes the metrics to the Pushgateway, and Prometheus scrapes them from there at a defined interval. It seems that no instance label is set when the metrics are pushed to the Pushgateway. See screenshot 1.

As a result, the metrics are overridden when the job runs in another cluster. See screenshots 2+3.

The problem is that when many Popeye jobs in different k8s clusters send their metrics to the same Pushgateway, some metrics are lost. A workaround would be to install one Pushgateway per cluster, but that is not a good solution because of the much higher resource usage.
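To illustrate why the metrics get lost, here is a toy Python model of how a Pushgateway stores pushed metrics: one set of metrics per grouping key, where a PUT-style push replaces everything under that key. This is only a sketch and not Popeye's or the Pushgateway's actual code. The job name "popeye", the cluster names, the metric name `example_score`, and the values are all assumptions made up for the illustration.

```python
def grouping_key(job, instance=None):
    """Build a hashable grouping key; the instance label is optional."""
    key = {"job": job}
    if instance is not None:
        key["instance"] = instance
    return tuple(sorted(key.items()))


class ToyPushgateway:
    """In-memory stand-in that keeps one metric set per grouping key."""

    def __init__(self):
        self.groups = {}

    def put(self, key, metrics):
        # A PUT-style push replaces every metric stored under the same key.
        self.groups[key] = dict(metrics)


def simulate(use_instance):
    """Push from two hypothetical clusters and return the stored groups."""
    gateway = ToyPushgateway()
    for cluster, score in (("cluster-a", 80), ("cluster-b", 65)):
        instance = cluster if use_instance else None
        gateway.put(grouping_key("popeye", instance), {"example_score": score})
    return gateway.groups
```

With `simulate(False)` both clusters share the single group for job "popeye", so only the values pushed last (cluster-b) remain. With `simulate(True)` each cluster keeps its own group.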

To Reproduce Steps to reproduce the behavior:

Define a CronJob in k8s cluster A and then in k8s cluster B, using the same Pushgateway address:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: popeye-{{ .Values.global.clusterName }}
  namespace: {{ .Values.popeye.namespace }}
  labels:
    app: popeye-{{ .Values.global.clusterName }}
spec:
  schedule: "{{ .Values.popeye.cronjob.shedule }}"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
        spec:
          serviceAccountName: {{ .Values.popeye.accountname }}
          restartPolicy: Never
          containers:
          - name: popeye-metrics
            image: "{{ .Values.popeye.image.name }}:{{ .Values.popeye.image.tag }}"
            ports:
            - containerPort: 9217
              protocol: TCP
            imagePullPolicy: IfNotPresent
            command: ["/bin/popeye"]
            args:
              - --cluster-name
              - {{ .Values.global.clusterName }}
              - -f
              - /etc/config/popeye/spinach.yml
              - --force-exit-zero=true
              - --all-namespaces
              - -o
              - prometheus
              - --pushgateway-address
              - {{ .Values.global.pushgateway }}
            resources:
              limits:
                memory: 96Mi
              requests:
                cpu: 100m
                memory: 32Mi
            volumeMounts:
              - name: spinach
                mountPath: /etc/config/popeye
          volumes:
            - name: spinach
              configMap:
                name: popeye
                items:
                  - key: spinach
                    path: spinach.yml

Expected behavior Metrics are not overridden when the same Pushgateway is used for multiple k8s clusters. Using the "instance" label in the grouping key would solve the issue: the value of Popeye's cluster-name parameter could be sent as "instance" when pushing the metrics. See https://github.com/prometheus/pushgateway, for example:

Push something more complex into the group identified by {job="some_job",instance="some_instance"}:
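In the Pushgateway README, such a group is addressed through the URL path `/metrics/job/some_job/instance/some_instance`. Below is a minimal Python sketch of building a per-cluster push URL along those lines. `push_url` is a hypothetical helper, not part of Popeye or the Pushgateway, the host name is a placeholder, and the job name "popeye" is an assumption. The sketch also assumes that names contain no "/" (the Pushgateway offers a base64 form for such values, which is not handled here).

```python
from urllib.parse import quote


def push_url(base, job, instance):
    """Return a Pushgateway URL whose grouping key is {job, instance}."""
    for name, value in (("job", job), ("instance", instance)):
        if not value or "/" in value:
            # Empty values and "/" need special handling that is out of scope here.
            raise ValueError(f"unsupported {name} value: {value!r}")
    return (
        f"{base.rstrip('/')}/metrics"
        f"/job/{quote(job, safe='')}"
        f"/instance/{quote(instance, safe='')}"
    )


# Each cluster gets its own group when its name is used as the instance.
url_a = push_url("http://pushgateway.example.org:9091", "popeye", "cluster-a")
url_b = push_url("http://pushgateway.example.org:9091", "popeye", "cluster-b")
```

Because the two URLs differ in the instance part, pushes from cluster A and cluster B would land in separate groups instead of replacing each other.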

Screenshots image

image The override can also be seen in Grafana: image

Versions (please complete the following information):

  • OS: -
  • Popeye v0.10.1
  • K8s 1.22

It would be nice if this bug could be fixed.

Thank you very much

OllowainT, Aug 22 '22 11:08