
[kube-prometheus-stack] Installing with ArgoCD in a different namespace causes node-exporter to not be present

Open MikeDevresse opened this issue 1 year ago • 9 comments

Describe the bug

When installing kube-prometheus-stack with ArgoCD in a namespace other than ArgoCD's original namespace, node-exporter and kube-state-metrics are not working.

I have an ArgoCD instance in a cluster A that runs in the argocd namespace. In order to separate my applications per environment, I wanted to move all my staging applications to the argocd-staging namespace. Everything works except kube-prometheus-stack. In the stack, everything gets deployed correctly, but somehow Prometheus can't find node-exporter and kube-state-metrics. The other targets are working (in the Prometheus UI) and I even get data, just not for node-exporter and kube-state-metrics. I found out that moving my application back to the argocd namespace makes it work. I have tried in the same cluster (monitoring and argocd-staging namespaces in the same cluster) and in different clusters (argocd in cluster A deploying the stack in cluster B), and both show the same behavior.

I also checked the diff of the secrets and configmaps to see if anything changed, but apart from the version number and some ArgoCD-related annotations, nothing did.

I also found out that in the Service Discovery tab I should have 6 targets for node-exporter and one for kube-state-metrics; when they don't appear, there are 7 additional undefined, non-active targets instead.

The bug also appears when I change the release name of my helm chart, even in the 'argocd' namespace, so I guess it isn't an ArgoCD issue; all other targets work fine.

What's your helm version?

version.BuildInfo{Version:"v3.10.2", GitCommit:"50f003e5ee8704ec937a756c646870227d7c8b58", GitTreeState:"clean", GoVersion:"go1.18.8"}

What's your kubectl version?

Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3+rke2r1", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-13T17:54:54Z", GoVersion:"go1.19.2 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

kube-prometheus-stack

What's the chart version?

45.2.0

What happened?

No response

What you expected to happen?

I expected the node-exporter and kube-state-metrics targets to be correctly defined and to receive their data.

How to reproduce it?

I did not try on a fresh install, but roughly:

  • Install ArgoCD in the argocd namespace (configured to allow deploying Applications from other namespaces; see the sketch after this list)
  • Deploy kube-prometheus-stack in the argocd-staging namespace
  • Check whether the node-exporter targets are working
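
For reference, a minimal sketch of the "apps in any namespace" setting referred to in the first step, assuming it was enabled through ArgoCD's argocd-cmd-params-cm ConfigMap (the namespace value here is an example):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Namespaces, besides the control-plane namespace, allowed to hold Application resources
  application.namespaces: argocd-staging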

Enter the changed values of values.yaml?

    grafana:
      persistence:
        enabled: true
        type: pvc
        accessModes: ["ReadWriteOnce"]
        size: 4Gi
      sidecar:
        dashboards:
          provider:
            allowUiUpdates: true
      ingress:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: ovh-ca-issuer
        hosts:
          - grafana.xxxxxx.xxx
        tls:
          - secretName: grafana-tls
            hosts:
              - grafana.xxxxxx.xxx
      grafana.ini:
        analytics:
          check_for_updates: true
        grafana_net:
          url: https://grafana.net
        log:
          mode: console
          level: debug
        paths:
          data: /var/lib/grafana/
          logs: /var/log/grafana
          plugins: /var/lib/grafana/plugins
          provisioning: /etc/grafana/provisioning
        server:
          domain: grafana.xxxxxx.xxx
          root_url: https://grafana.xxxxxx.xxx/
    prometheus-node-exporter:
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoExecute
          operator: Exists
      prometheus:
        monitor:
          relabelings:
            - sourceLabels: [__meta_kubernetes_pod_node_name]
              separator: ;
              regex: ^(.*)$
              targetLabel: nodename
              replacement: $1
              action: replace
    prometheus:
      ingress:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: ovh-ca-issuer
        hosts:
          - prometheus.xxxxxx.xxx
        tls:
          - secretName: prometheus-tls
            hosts:
              - prometheus.xxxxxx.xxx
      prometheusSpec:
        storageSpec: 
          volumeClaimTemplate:
            spec:
              resources:
                requests:
                  storage: 10Gi
              accessModes: ["ReadWriteOnce"]

Enter the command that you execute and failing/misfunctioning.

none

Anything else we need to know?

No response

MikeDevresse avatar Mar 16 '23 08:03 MikeDevresse

Can confirm, seeing the same. The /metrics endpoints are working as expected, but the node-exporter targets appear as "undefined" in the Service Discovery tab. So the target itself is fine; it's just failing to get picked up by Prometheus.
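
For anyone wanting to reproduce that check, a quick way to hit the endpoint directly, assuming the release name kube-prometheus-stack, the monitoring namespace, and node-exporter's default port 9100:

# Port-forward the node-exporter service and fetch its metrics directly
kubectl -n monitoring port-forward svc/kube-prometheus-stack-prometheus-node-exporter 9100:9100 &
curl -s http://localhost:9100/metrics | head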

Similar setup to yours:

ArgoCD:

{
    "Version": "v2.6.4+7be094f",
    "BuildDate": "2023-03-07T22:48:16Z",
    "GitCommit": "7be094f38d06859b594b98eb75c7c70d39b80b1e",
    "GitTreeState": "clean",
    "GoVersion": "go1.18.10",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v4.5.7 2022-08-02T16:35:54Z",
    "HelmVersion": "v3.10.3+g835b733",
    "KubectlVersion": "v0.24.2",
    "JsonnetVersion": "v0.19.1"
}

Helm: version.BuildInfo{Version:"v3.11.3", GitCommit:"66a969e7cc08af2377d055f4e6283c33ee84be33", GitTreeState:"clean", GoVersion:"go1.20.3"}

kubectl:
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:14:41Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3+k3s1", GitCommit:"01ea3ff27be0b04f945179171cec5a8e11a14f7b", GitTreeState:"clean", BuildDate:"2023-03-27T22:23:17Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}

SlothCroissant avatar Apr 17 '23 03:04 SlothCroissant

Update: I was able to get this working properly by deploying to a namespace called kube-prometheus-stack. My Application YAML is below for reference. There must be something hard-coded to a namespace name somehow? I tried tracing back to what could be causing it but couldn't figure it out with my limited helm templating knowledge.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  name: kube-prometheus-stack
  namespace: argocd
spec:
  source:
    path: argocd/prod-resources/kube-prometheus-stack
    repoURL: ssh://[email protected]/myorg/project.git
    targetRevision: main
  destination:
    namespace: kube-prometheus-stack
    server: 'https://kubernetes.default.svc'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m0s
        factor: 2
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true

SlothCroissant avatar Apr 17 '23 04:04 SlothCroissant

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar May 20 '23 12:05 stale[bot]

Neither solution, installing the helm chart in those namespaces (argocd or kube-prometheus-stack), worked for me. My problem is slightly different though: the node_exporter targets do not appear in Prometheus, and while I can query all the go_* and process_* metrics, all the metrics that start with node_* are absent.

I've found, though, that installing the helm chart directly to the cluster (so not through an ArgoCD Application) works for me. There must be something that ArgoCD does that somehow prevents Prometheus from scraping the node_exporter targets.

nabladev avatar Jun 02 '23 16:06 nabladev

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Aug 10 '23 02:08 stale[bot]

This problem affects not just kube-prometheus-stack but ANY helm chart that creates ServiceMonitor resources.

The reason is that ArgoCD uses the label app.kubernetes.io/instance to track all managed resources and can rewrite it.

In case of:

  • Argo CD configured to manage Applications in namespaces other than the control plane's namespace,
  • ArgoCD configured with application.resourceTrackingMethod: annotation+label,
  • an Application created in another namespace,

then ArgoCD MODIFIES the label app.kubernetes.io/instance on certain resources by adding the namespace as a prefix: <namespace>_<application_name>.

This breaks selectors.

Example:

Deploying the 'kube-prometheus-stack' chart as an Application in the 'monitoring' namespace.

ServiceMonitor for prometheus-node-exporter:

$ kubectl get servicemonitor kube-prometheus-stack-prometheus-node-exporter -n monitoring -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  ...
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: monitoring_kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: prometheus-node-exporter
    app.kubernetes.io/part-of: prometheus-node-exporter
    helm.sh/chart: prometheus-node-exporter-4.24.0
    release: kube-prometheus-stack
  name: kube-prometheus-stack-prometheus-node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: kube-prometheus-stack
      app.kubernetes.io/name: prometheus-node-exporter

Note the selector.matchLabels: it expects to look up a prometheus-node-exporter service with instance kube-prometheus-stack, but it can't find any:

$ kubectl get svc -n monitoring -l 'app.kubernetes.io/instance=kube-prometheus-stack,app.kubernetes.io/name=prometheus-node-exporter'
No resources found in monitoring namespace.

Why? Because the prometheus-node-exporter service label is not what the selector expects:

$ kubectl get svc -n monitoring kube-prometheus-stack-prometheus-node-exporter -o yaml
apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: monitoring_kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: prometheus-node-exporter
    app.kubernetes.io/part-of: prometheus-node-exporter
    helm.sh/chart: prometheus-node-exporter-4.24.0
    release: kube-prometheus-stack
  name: kube-prometheus-stack-prometheus-node-exporter
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/name: prometheus-node-exporter
  ...

The ServiceMonitor selector expects the service instance label to be kube-prometheus-stack, but it is monitoring_kube-prometheus-stack.

The Helm chart renders the correct resource labels; it is ArgoCD that modifies them. Not all resource labels are overwritten: I found ServiceMonitor, Service, and Deployment overwritten, but not ReplicaSet or Pod.

Given the nature of this problem, I expect many other exporters (those that use a similar selector) not to be discovered by Prometheus.

I'm not sure it is a bug; it is rather a misconfiguration.

Possible solutions (a configuration sketch follows the list):

  • change ArgoCD application.resourceTrackingMethod to annotation only
  • keep using annotation+label but set custom application.instanceLabelKey

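For illustration, a minimal sketch of where both settings live, assuming they are applied through the argocd-cm ConfigMap in the control-plane namespace (pick one of the two options):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Option 1: track resources by annotation only, so no labels are rewritten
  application.resourceTrackingMethod: annotation
  # Option 2: keep annotation+label but move the tracking label off app.kubernetes.io/instance
  application.instanceLabelKey: argocd.argoproj.io/instance
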
anatolijd avatar Dec 20 '23 11:12 anatolijd

We fixed the issue on our side by editing argocd's argocd-cm config map and setting application.instanceLabelKey to argocd.argoproj.io/instance
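
One way to apply that edit in place, assuming argocd-cm lives in the argocd namespace:

kubectl -n argocd patch configmap argocd-cm --type merge \
  -p '{"data":{"application.instanceLabelKey":"argocd.argoproj.io/instance"}}'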

MikeDevresse avatar Feb 01 '24 06:02 MikeDevresse

We fixed this issue by removing app.kubernetes.io/instance: kube-prometheus-stack from the selector so that it only contains app.kubernetes.io/name: prometheus-node-exporter (see the sketch at the end of this comment). Here's the reasoning:

  1. Changing ArgoCD's application.resourceTrackingMethod to annotation only would cause all of our apps to go out of sync, which we don't want. The same applies to setting application.instanceLabelKey. With some apps already out of sync, it becomes harder to recognize the situation.
  2. We host the prometheus chart in an internal GitLab repo and don't need to follow the upstream version closely.
  3. Leaving only one selector label works well and only affects this application.

For anyone who may need this.
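
For illustration, a sketch of the resulting ServiceMonitor selector after dropping the instance label, based on the ServiceMonitor shown earlier in this thread (not the chart's stock template):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-prometheus-stack-prometheus-node-exporter
  namespace: monitoring
spec:
  endpoints:
    - port: http-metrics   # keep whatever port name the chart renders here
  selector:
    matchLabels:
      # app.kubernetes.io/instance: kube-prometheus-stack   <- removed, as described above
      app.kubernetes.io/name: prometheus-node-exporter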

chivalryq avatar Feb 06 '24 03:02 chivalryq

@chivalryq Thanks! Removing app.kubernetes.io/instance: kube-prometheus-stack from selector.matchLabels of the ServiceMonitor resources (i.e. prometheus-node-exporter and kube-state-metrics) worked without changing any other configuration. Remember to restart the pods afterwards.

GGSHAH avatar May 16 '24 10:05 GGSHAH