
Using pgo metrics with existing prometheus and grafana

Open alrooney opened this issue 4 years ago • 9 comments

I'm using pgo v4.5.0 and have deployed my pg pods with the --metrics flag. If I already have grafana and prometheus running in my cluster how can I see the metrics coming from pg pods? How can I install the pg dashboards that you have in your grafana instance? Do you have a link to those? Thanks!

alrooney avatar Oct 14 '20 02:10 alrooney

We should probably better document how to connect this to one's existing Prometheus / Grafana. If you are able to get this working and would like to propose a patch (or the workflow that you used), I'd be happy to review it.

jkatz avatar Oct 14 '20 12:10 jkatz

Further to this, I'm trying to create a PodMonitor for an existing Prometheus installation, and it requires a named port on the Collect container, which I cannot find a way to create without editing the running Pod definition (!). We have had it working by doing just that, but it's obviously an anti-pattern. I suspect the non-Ansible kustomize deployer that @jkatz is working on may be the answer to this, but please confirm.
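For context, the by-hand edit amounted to giving the exporter container's metrics port a name so a PodMonitor/ServiceMonitor can reference it. A rough sketch of what that looks like in the container spec (the port name is just an example, and 9187 is the exporter port mentioned further down this thread):

# excerpt from the exporter container spec, not a complete manifest
ports:
- name: exporter        # example name; whatever your monitor's "port" field references
  containerPort: 9187   # postgres exporter metrics port
  protocol: TCP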

philipgeraldtaylor avatar Oct 27 '20 20:10 philipgeraldtaylor

Hi, I've created a working PodMonitor for an existing Prometheus.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: crunchy-postgres-exporter
  labels:
    release: prometheus-operator
  namespace: cattle-monitoring-system
spec:
  namespaceSelector:
    matchNames:
    - cluster-postgres
  selector:
    matchLabels:
      crunchy_postgres_exporter: "true"
  podTargetLabels: 
    - deployment_name
    - role
    - pg_cluster
  podMetricsEndpoints: 
    - relabelings: 
      - sourceLabels: 
        - "__meta_kubernetes_pod_container_port_number"
        action: "drop"
        regex: "5432"
      - sourceLabels: 
        - "__meta_kubernetes_pod_container_port_number"
        action: "drop"
        regex: "8009"
      - sourceLabels: 
        - "__meta_kubernetes_pod_container_port_number"
        action: "drop"
        regex: "2022"
      - sourceLabels: 
        - "__meta_kubernetes_pod_container_port_number"
        action: "drop"
        regex: "10000"
      - sourceLabels: 
        - "__meta_kubernetes_namespace"
        action: "replace"
        targetLabel: "kubernetes_namespace"
      - sourceLabels: 
        - "__meta_kubernetes_pod_name"
        targetLabel: "pod"
      - sourceLabels: 
        - "__meta_kubernetes_pod_ip"
        targetLabel: "ip"
        replacement: "$1"
      - sourceLabels: 
        - "dbname"
        targetLabel: "dbname"
        replacement: "$1"
      - sourceLabels: 
        - "relname"
        targetLabel: "relname"
        replacement: "$1"
      - sourceLabels: 
        - "schemaname"
        targetLabel: "schemaname"
        replacement: "$1"
      - targetLabel: "exp_type"
        replacement: "pg"

zposloncec avatar Nov 04 '20 13:11 zposloncec

@zposloncec when I used your PodMonitor, the cluster label, which is in the form {{ namespace }}:{{ pg_cluster }}, only comes through as {{ pg_cluster }}. Do you have any suggestion on how to keep the original format? Thank you.

captainjapeng avatar Dec 05 '20 07:12 captainjapeng

Hi,

I've created a service monitor from: https://github.com/CrunchyData/pgmonitor/blob/master/prometheus/crunchy-prometheus.yml.containers

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-servicemonitor
  namespace: monitoring
  labels:
    app: prometheus
spec:
  endpoints:
    - port: postgres-exporter
      path: /metrics
      honorLabels: true
      interval: 10s
      relabelings:
      - sourceLabels: [ __meta_kubernetes_pod_label_crunchy_postgres_exporter ]
        action: keep
        regex: "true"
      - sourceLabels: [ __meta_kubernetes_pod_container_port_number ]
        action: drop
        regex: "5432"
      - sourceLabels: [ __meta_kubernetes_pod_container_port_number ]
        action: drop
        regex: "10000"
      - sourceLabels: [ __meta_kubernetes_pod_container_port_number ]
        action: drop
        regex: "8009"
      - sourceLabels: [ __meta_kubernetes_pod_container_port_number ]
        action: drop
        regex: "2022"
      - sourceLabels: [ __meta_kubernetes_namespace ]
        action: replace
        targetLabel: kubernetes_namespace
      - sourceLabels: [ __meta_kubernetes_pod_name ]
        targetLabel: pod
      - sourceLabels: [ __meta_kubernetes_namespace,__meta_kubernetes_pod_label_pg_cluster ]
        targetLabel: pg_cluster
        separator: ':'
        replacement: '$1$2'
      - sourceLabels: [ __meta_kubernetes_pod_ip ]
        targetLabel: ip
        replacement: '$1'
      - sourceLabels: [ __meta_kubernetes_pod_label_deployment_name ]
        targetLabel: deployment
        replacement: '$1'
      - sourceLabels: [ __meta_kubernetes_pod_label_role ]
        targetLabel: role
        replacement: '$1'
      - sourceLabels: [ dbname ]
        targetLabel: dbname
        replacement: '$1'
      - sourceLabels: [ relname ]
        targetLabel: relname
        replacement: '$1'
      - sourceLabels: [ schemaname ]
        targetLabel: schemaname
        replacement: '$1'
      - targetLabel: exp_type
        replacement: 'pg'
  namespaceSelector:
    matchNames:
      - monitoring
  selector:
    matchLabels:
      vendor: crunchydata

mariusstaicu avatar Mar 17 '21 15:03 mariusstaicu

A little input in case it helps someone (for v5.0.1).

First, thanks @mariusstaicu, I reused your ServiceMonitor.

Second, on my side that was not quite enough: I got the entry in service discovery, but nothing showed up on the targets side. To fix it, I needed to add a Service exposing the exporter port, with a selector targeting the exporter:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-08-23T04:25:23Z"
  labels:
    postgres-operator.crunchydata.com/crunchy-postgres-exporter: "true"
  name: pgo-exporter
  namespace: pgo
spec:
  clusterIP: None
  selector:
    postgres-operator.crunchydata.com/crunchy-postgres-exporter: "true"
  ports:
  - name: postgres
    port: 9187
    protocol: TCP
    targetPort: 9187
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Of course, you will also need to change the selector.matchLabels of the ServiceMonitor so it matches this Service.
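For example, a minimal sketch of a ServiceMonitor pointing at the Service above (relabelings omitted here; see the full JSON version below):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-servicemonitor
  namespace: monitoring                 # assumption: wherever your Prometheus operator watches for monitors
spec:
  endpoints:
    - port: postgres                    # matches the port name on the Service above
      path: /metrics
      honorLabels: true
  namespaceSelector:
    matchNames:
      - pgo                             # namespace of the Service above
  selector:
    matchLabels:
      postgres-operator.crunchydata.com/crunchy-postgres-exporter: "true"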

That was the last piece needed to integrate the monitoring into my kube-prometheus stack. Bonus: the JSON ServiceMonitor that you can import in your main.jsonnet of kube-prometheus (based on the reply from @mariusstaicu):

{
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "ServiceMonitor",
    "metadata": {
        "name": "postgres-servicemonitor",
        "namespace": "monitoring",
        "labels": {
            "app": "prometheus"
        }
    },
    "spec": {
        "endpoints": [
            {
                "port": "postgres",
                "path": "/metrics",
                "honorLabels": true,
                "interval": "10s",
                "relabelings": [
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_crunchy_postgres_exporter"
                        ],
                        "action": "keep",
                        "regex": "true"
                    },
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_pod_container_port_number"
                        ],
                        "action": "drop",
                        "regex": "5432"
                    },
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_pod_container_port_number"
                        ],
                        "action": "drop",
                        "regex": "^$"
                    },
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_namespace"
                        ],
                        "action": "replace",
                        "targetLabel": "kubernetes_namespace"
                    },
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_pod_name"
                        ],
                        "targetLabel": "pod"
                    },
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_namespace",
                            "__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_cluster"
                        ],
                        "targetLabel": "pg_cluster",
                        "separator": ":",
                        "replacement": "$1$2"
                    },
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_pod_ip"
                        ],
                        "targetLabel": "ip",
                        "replacement": "$1"
                    },
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_instance"
                        ],
                        "targetLabel": "deployment",
                        "replacement": "$1"
                    },
                    {
                        "sourceLabels": [
                            "__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_role"
                        ],
                        "targetLabel": "role",
                        "replacement": "$1"
                    },
                    {
                        "sourceLabels": [
                            "dbname"
                        ],
                        "targetLabel": "dbname",
                        "replacement": "$1"
                    },
                    {
                        "sourceLabels": [
                            "relname"
                        ],
                        "targetLabel": "relname",
                        "replacement": "$1"
                    },
                    {
                        "sourceLabels": [
                            "schemaname"
                        ],
                        "targetLabel": "schemaname",
                        "replacement": "$1"
                    }
                ]
            }
        ],
        "namespaceSelector": {
            "matchNames": [
                "pgo"
            ]
        },
        "selector": {
          "matchLabels": {
            "postgres-operator.crunchydata.com/crunchy-postgres-exporter": "true"
          }
        }
    }
}

ludzzz avatar Aug 23 '21 08:08 ludzzz

@jkatz

* Prometheus: https://github.com/CrunchyData/pgmonitor/blob/master/prometheus/crunchy-prometheus.yml.containers

* Grafana: https://github.com/CrunchyData/pgmonitor/tree/master/grafana/containers

* Alertmanager: https://github.com/CrunchyData/pgmonitor/blob/master/prometheus/alert-rules.d/crunchy-alert-rules-pg.yml.containers.example

We should probably better document how to connect this to one's existing Prometheus / Grafana. If you are able to get this working and would like to propose a patch (or the workflow that you used), I'd be happy to review it.

Hi there!

I hope it's okay to ask this question here. I have the same "problem" as the person who created the issue in the first place. We have our own monitoring stack (Prometheus, Alertmanager, Grafana) and I would like to use Crunchy PGO together with it. I already checked out the exporter and confirmed that the metrics are available. So, once a cluster is up with the exporter sidecars running, what needs to be done to connect the two, apart from setting up a ServiceMonitor that picks up the exporter? Is this use case supported and just not documented, or will I have to bend things quite a bit in order to get meaningful metrics?

Thanks in advance!

dimitrigraf avatar Sep 29 '21 15:09 dimitrigraf

I don't know if a ServiceMonitor is necessary. This is an updated PodMonitor spec that works, based on @zposloncec's earlier post.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: crunchy-postgres-exporter
  namespace: postgres-operator
spec:
  namespaceSelector:
    matchNames:           
    - postgres-operator
  selector:
    matchLabels:
      postgres-operator.crunchydata.com/crunchy-postgres-exporter: "true"
  podTargetLabels: 
    - deployment
    - role
    - pg_cluster
  podMetricsEndpoints: 
    - port: exporter
      path: /metrics
      honorLabels: true
      interval: 10s
      relabelings: 
      - sourceLabels: 
        - "__meta_kubernetes_pod_container_port_number"
        action: "drop"
        regex: "5432"
      - sourceLabels: 
        - "__meta_kubernetes_pod_container_port_number"
        action: "drop"
        regex: "8009"
      - sourceLabels: 
        - "__meta_kubernetes_pod_container_port_number"
        action: "drop"
        regex: "2022"
      - sourceLabels: 
        - "__meta_kubernetes_pod_container_port_number"
        action: "drop"
        regex: "10000"
      - sourceLabels: 
        - "__meta_kubernetes_namespace"
        action: "replace"
        targetLabel: "kubernetes_namespace"
      - sourceLabels: 
        - "__meta_kubernetes_pod_name"
        targetLabel: "pod"
      - sourceLabels:
        - "__meta_kubernetes_namespace"
        - "__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_cluster"
        targetLabel: "pg_cluster"
        separator: ':'
        replacement: '$1$2'
      - sourceLabels: 
        - "__meta_kubernetes_pod_ip"
        targetLabel: "ip"
        replacement: "$1"
      - sourceLabels:
        - "__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_instance"
        targetLabel: "deployment"
        replacement: '$1'
      - sourceLabels:
        - "__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_role"
        targetLabel: "role"
        replacement: '$1'
      - sourceLabels: 
        - "dbname"
        targetLabel: "dbname"
        replacement: "$1"
      - sourceLabels: 
        - "relname"
        targetLabel: "relname"
        replacement: "$1"
      - sourceLabels: 
        - "schemaname"
        targetLabel: "schemaname"
        replacement: "$1"
      - targetLabel: "exp_type"
        replacement: "pg"

This spec assumes that the postgres objects live in the postgres-operator namespace, and I believe it has been updated to match the labels in the latest version of the operator. I used kustomize/monitoring/prometheus-config.yaml in the https://github.com/CrunchyData/postgres-operator-examples repo and prometheus/containers/crunchy-prometheus.yml.containers in pgmonitor as a basis for the labels. It appears to me that some labels in the prometheus configs are kept as sources for backwards compatibility (e.g. what's deployment_name?). I'm not sure the podTargetLabels here are necessary, as every target label specified in the relabelings appeared in the Grafana metrics browser regardless of whether it was listed as a podTargetLabel.

I imported the dashboards in kustomize/monitoring/dashboards in the examples repo but needed to replace references to the "PROMETHEUS" datasource with "Prometheus" to match the default datasource that kube-prometheus-stack sets up.

You may need to store the PodMonitor in the namespace where your prometheus stack is installed (i.e. monitoring instead of postgres-operator in metadata.namespace) if you don't have target discovery configured to check all namespaces.
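For example, a sketch of just those fields, keeping the rest of the spec above unchanged (the namespace names are assumptions to adjust for your setup):

metadata:
  name: crunchy-postgres-exporter
  namespace: monitoring              # assumption: where your prometheus stack runs
spec:
  namespaceSelector:
    matchNames:
    - postgres-operator              # where the postgres pods actually run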

bdols avatar Jun 17 '22 05:06 bdols

Thanks for the snippet @bdols. It works for me; I had been trying with the ServiceMonitor but kept failing to get the target to show up. Thanks!!

githilman avatar Jun 17 '22 10:06 githilman

Hi @alrooney,

With the release of PGO v5, we are now re-evaluating various GitHub issues, feature requests, etc. from previous PGO v4 releases. More specifically, in order to properly prioritize the features and fixes needed by our end-users for upcoming PGO v5 releases, we are now attempting to identify which PGO v4 issues, use cases, and enhancement requests are still valid (especially in the context of PGO v5).

Therefore, please let us know if you are still experiencing this issue in PGO v5.

If so, you can either respond to this issue directly to ensure it remains open, or you can close this issue and submit a new one for PGO v5 (this would also be a great opportunity to include any updated details, context, or any other information relevant to your issue). Otherwise, we will be closing this issue in the next 30 days.

If you are still running PGO v4, we recommend that you upgrade to PGO v5 as soon as possible to ensure you have the latest PGO features and functionality.

ValClarkson avatar Oct 19 '22 15:10 ValClarkson

Any news here? I have found this blog post: https://www.crunchydata.com/blog/monitoring-postgresql-clusters-in-kubernetes

but I cannot access the Ansible playbooks.

I understand that after having deployed the exporter sidecar containers one has to:

  • create a ServiceMonitor for Prometheus (in the existing Prometheus namespace, I assume)
  • add a dashboard template in Grafana

Any idea how to achieve this with PGO v5 and existing installations of Prometheus + Grafana (in different namespaces)?

dberardo-com avatar Oct 20 '22 17:10 dberardo-com

@dberardo-com That blog post is a little old -- if you're starting fresh now, you probably want to install PGO v5 (5.2.0 is the latest). And for that, we have a tutorial about enabling metrics using our metric stack (I know you want to use your own, that's fine, but let's just start this with documentation):

docs: https://access.crunchydata.com/documentation/postgres-operator/v5/tutorial/monitoring/
monitoring yaml (for reference): https://github.com/CrunchyData/postgres-operator-examples/tree/main/kustomize/monitoring

The usual way this would work is that you would start a pg cluster in a namespace and then start this monitoring stack in the same namespace, which does two things for us:

  • the prometheus is set up to discover the metric endpoint in the sidecar;
  • and all of the dashboards in the dashboards folder get merged into a configmap, which then gets mounted in the grafana pod here, which just takes advantage of how Grafana does things (see here).

OK, so how do we use that info to set this up?

(a) Add the exporter sidecar to the pg cluster, as you noted.
(b) Create those dashboards in Grafana, either by adding them as a ConfigMap that gets mounted into the Grafana pod at the path where Grafana expects dashboards (or wherever you have it set to expect them), or by adding those JSON dashboards by hand. (Note: if you do the latter, I hope your Grafana is backed by something more than an ephemeral filesystem, because if the pod dies, all your hand-loaded dashboards go away and you have to load them again. That's why I like doing it as a ConfigMap. But I don't know your system, so I'm not sure what's easier.) A rough ConfigMap sketch follows at the end of this comment.
(c) Create some service-discovery mechanism whereby the Prometheus in namespace X can get metrics from the pg cluster in namespace Y (assuming you don't have namespaces locked down).

That part (c) is covered above in the discussion of serviceMonitor and podMonitor. (I recommend podMonitor so you don't need to create a Service.) I think the answers above should give a good headstart on that, but the Prometheus docs here are also pretty helpful.
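To illustrate point (b), here is a minimal sketch of the ConfigMap approach, assuming a Grafana whose dashboard sidecar/provisioner picks up labelled ConfigMaps (kube-prometheus-stack's sidecar watches a grafana_dashboard label by default); the name, namespace, label, and filename below are assumptions to adapt to your setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: crunchy-grafana-dashboards    # hypothetical name
  namespace: monitoring               # assumption: the namespace where Grafana runs
  labels:
    grafana_dashboard: "1"            # assumption: the label your Grafana sidecar watches
data:
  # one key per dashboard; paste the JSON from kustomize/monitoring/dashboards here
  pod_details.json: |
    {}

The advantage over hand-loading is that the dashboards survive a Grafana pod restart.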

benjaminjb avatar Oct 21 '22 02:10 benjaminjb

Hi, could you please advise how you fixed the issue with the dashboards not working? I'm able to get metrics using the Helm installation instead of the whole installation: I just enabled metrics and the needed image in values.yaml, and afterwards I created a PodMonitor to scrape the data. The issue I encountered is that I don't have labels like __meta_kubernetes_pod_label_postgres_operator_crunchydata_com_role, so I can't make the dashboards work properly.

Thanks for advice!

KMikkey avatar Jan 12 '24 15:01 KMikkey