[kube-prometheus-stack] Install with ArgoCD in a different namespace causes node-exporter to not be present
Describe the bug
When installing kube-prometheus-stack with ArgoCD in a namespace other than ArgoCD's original namespace, node-exporter and kube-state-metrics are not working.
I have an ArgoCD instance in a cluster A that runs in the argocd namespace. In order to separate my applications per environment, I wanted to move all my staging applications to the argocd-staging namespace. Everything works except kube-prometheus-stack. Everything in the stack gets deployed correctly, but somehow Prometheus can't find node-exporter and kube-state-metrics. The other targets are working (in the Prometheus UI) and I even get data, but not for node-exporter and kube-state-metrics. I found out that moving my application back to the argocd namespace makes it work. I have tried in the same cluster (monitoring and argocd-staging namespaces in the same cluster) and even across clusters (ArgoCD in cluster A deploying the stack to cluster B), and both show the same behavior.
I also checked the diff in the secrets and configmaps to see if something changed, but only the version number and some ArgoCD-related annotations differ; nothing else changed.
I also found out that in the Service Discovery tab I should have 6 targets for node-exporter and one for kube-state-metrics, and when they don't appear, there are 7 more undefined, non-active targets instead.
It seems the bug also appears when I change the release name of my Helm chart, even in the argocd namespace, so I guess it isn't an ArgoCD issue; also, all other targets work fine.
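A quick way to compare what the ServiceMonitor selects against what the Service actually carries (a sketch, assuming the default release name kube-prometheus-stack and the monitoring namespace):
$ kubectl get servicemonitor kube-prometheus-stack-prometheus-node-exporter \
    -n monitoring -o jsonpath='{.spec.selector.matchLabels}'
$ kubectl get svc kube-prometheus-stack-prometheus-node-exporter -n monitoring --show-labels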
What's your helm version?
version.BuildInfo{Version:"v3.10.2", GitCommit:"50f003e5ee8704ec937a756c646870227d7c8b58", GitTreeState:"clean", GoVersion:"go1.18.8"}
What's your kubectl version?
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3+rke2r1", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-13T17:54:54Z", GoVersion:"go1.19.2 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
Which chart?
kube-prometheus-stack
What's the chart version?
45.2.0
What happened?
No response
What did you expect to happen?
I expected node-exporter and kube-state-metrics targets to be correctly defined, and receive their data.
How to reproduce it?
I did not try on a fresh install, but I would guess:
- Install ArgoCD in the argocd namespace (with the config that allows deploying Applications to different namespaces; see the sketch below)
- Deploy kube-prometheus-stack in the argocd-staging namespace
- Check whether the node-exporter targets are working
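A minimal sketch of the "Applications in other namespaces" config mentioned above (assuming a default install; per the ArgoCD docs the full setup also requires spec.sourceNamespaces on the AppProject):
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  application.namespaces: argocd-staging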
Enter the changed values of values.yaml?
grafana:
  persistence:
    enabled: true
    type: pvc
    accessModes: ["ReadWriteOnce"]
    size: 4Gi
  sidecar:
    dashboards:
      provider:
        allowUiUpdates: true
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: ovh-ca-issuer
    hosts:
      - grafana.xxxxxx.xxx
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.xxxxxx.xxx
  grafana.ini:
    analytics:
      check_for_updates: true
    grafana_net:
      url: https://grafana.net
    log:
      mode: console
      level: debug
    paths:
      data: /var/lib/grafana/
      logs: /var/log/grafana
      plugins: /var/lib/grafana/plugins
      provisioning: /etc/grafana/provisioning
    server:
      domain: grafana.xxxxxx.xxx
      root_url: https://grafana.xxxxxx.xxx/
prometheus-node-exporter:
  tolerations:
    - effect: NoSchedule
      operator: Exists
    - key: CriticalAddonsOnly
      operator: Exists
    - effect: NoExecute
      operator: Exists
  prometheus:
    monitor:
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          separator: ;
          regex: ^(.*)$
          targetLabel: nodename
          replacement: $1
          action: replace
prometheus:
  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: ovh-ca-issuer
    hosts:
      - prometheus.xxxxxx.xxx
    tls:
      - secretName: prometheus-tls
        hosts:
          - prometheus.xxxxxx.xxx
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 10Gi
          accessModes: ["ReadWriteOnce"]
Enter the command that you execute that is failing/misfunctioning.
none
Anything else we need to know?
No response
Can confirm, seeing the same. The /metrics endpoints are working as expected, but the node-exporter targets appear as "undefined" in the Service Discovery tab. So the target itself is working; it's just failing to get picked up by Prometheus.
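For anyone wanting to verify the same, the dropped targets are also visible through the Prometheus HTTP API (a sketch; service name and port assumed from a default kube-prometheus-stack install, jq optional):
$ kubectl -n monitoring port-forward svc/kube-prometheus-stack-prometheus 9090 &
$ curl -s 'http://localhost:9090/api/v1/targets?state=dropped' | jq '.data.droppedTargets | length'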
Similar setup to yours:
ArgoCD:
{
  "Version": "v2.6.4+7be094f",
  "BuildDate": "2023-03-07T22:48:16Z",
  "GitCommit": "7be094f38d06859b594b98eb75c7c70d39b80b1e",
  "GitTreeState": "clean",
  "GoVersion": "go1.18.10",
  "Compiler": "gc",
  "Platform": "linux/amd64",
  "KustomizeVersion": "v4.5.7 2022-08-02T16:35:54Z",
  "HelmVersion": "v3.10.3+g835b733",
  "KubectlVersion": "v0.24.2",
  "JsonnetVersion": "v0.19.1"
}
Helm: version.BuildInfo{Version:"v3.11.3", GitCommit:"66a969e7cc08af2377d055f4e6283c33ee84be33", GitTreeState:"clean", GoVersion:"go1.20.3"}
kubectl: Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:14:41Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3+k3s1", GitCommit:"01ea3ff27be0b04f945179171cec5a8e11a14f7b", GitTreeState:"clean", BuildDate:"2023-03-27T22:23:17Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/amd64"}
Update: I was able to get this working properly by deploying to a namespace called kube-prometheus-stack. My Application YAML is below for reference. There must be something hard-coded to a namespace name, somehow? I haven't the foggiest idea. I tried tracing back to what could be causing it but was unable to figure it out with my (not great with Helm templating) knowledge.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  name: kube-prometheus-stack
  namespace: argocd
spec:
  source:
    path: argocd/prod-resources/kube-prometheus-stack
    repoURL: ssh://[email protected]/myorg/project.git
    targetRevision: main
  destination:
    namespace: kube-prometheus-stack
    server: 'https://kubernetes.default.svc'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m0s
        factor: 2
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
Both solutions, installing the Helm chart in those namespaces (argocd or kube-prometheus-stack), didn't work for me. But for me the problem is slightly different: the node_exporter targets do not appear in Prometheus, but I can query all the go_* and process_* metrics, while all the metrics that start with node_* are absent.
I've found though that installing the Helm chart directly on the cluster (so not through an ArgoCD Application) works for me. There must be something that ArgoCD does that somehow prevents Prometheus from scraping the node_exporter targets.
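For comparison, this is roughly the direct install that worked for me (a sketch; repo alias and namespace assumed):
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace -f values.yaml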
This problem affects not just kube-prometheus-stack but ANY Helm chart that creates ServiceMonitor resources.
The reason is that ArgoCD uses the label app.kubernetes.io/instance to track all resources it manages, and it can rewrite it.
In case of:
- Argo CD configured to manage Applications in namespaces other than the control plane's namespace,
- ArgoCD configured with application.resourceTrackingMethod: annotation+label,
- an Application created in another namespace,
then ArgoCD MODIFIES the label app.kubernetes.io/instance on certain resources by adding the namespace as a prefix: <namespace>_<application_name>.
This breaks selectors.
Example: deploying the kube-prometheus-stack chart as an Application in the monitoring namespace. ServiceMonitor for prometheus-node-exporter:
$ kubectl get servicemonitor kube-prometheus-stack-prometheus-node-exporter -n monitoring -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  ...
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: monitoring_kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: prometheus-node-exporter
    app.kubernetes.io/part-of: prometheus-node-exporter
    helm.sh/chart: prometheus-node-exporter-4.24.0
    release: kube-prometheus-stack
  name: kube-prometheus-stack-prometheus-node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: kube-prometheus-stack
      app.kubernetes.io/name: prometheus-node-exporter
Note the selector.matchLabels: it expects to look up a prometheus-node-exporter Service with an instance label of kube-prometheus-stack, but it can't find any:
$ kubectl get svc -n monitoring -l 'app.kubernetes.io/instance=kube-prometheus-stack,app.kubernetes.io/name=prometheus-node-exporter'
No resources found in monitoring namespace.
Why? Because the prometheus-node-exporter Service's label is not what the selector expects it to be:
$ kubectl get svc -n monitoring kube-prometheus-stack-prometheus-node-exporter -o yaml
apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: monitoring_kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: prometheus-node-exporter
    app.kubernetes.io/part-of: prometheus-node-exporter
    helm.sh/chart: prometheus-node-exporter-4.24.0
    release: kube-prometheus-stack
  name: kube-prometheus-stack-prometheus-node-exporter
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/name: prometheus-node-exporter
  ...
The ServiceMonitor selector expects the Service's instance label to be kube-prometheus-stack, but it is monitoring_kube-prometheus-stack.
The Helm chart rendered correct resource labels, but it is ArgoCD that modified them. Not all resource labels are overwritten: I found ServiceMonitor, Service, and Deployment labels overwritten, but not ReplicaSet or Pod labels.
Given the nature of this problem, I expect many other exporters (those that use a similar selector) not to be discovered by Prometheus.
I'm not sure it is a bug; it is rather a misconfiguration.
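A quick way to see which kinds got their instance label rewritten (a sketch; -L prints the label value as an extra column):
$ kubectl get servicemonitor,svc,deploy,daemonset -n monitoring \
    -L app.kubernetes.io/instance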
Possible solutions:
- change ArgoCD's application.resourceTrackingMethod to annotation only, or
- keep using annotation+label but set a custom application.instanceLabelKey
We fixed the issue on our side by editing ArgoCD's argocd-cm ConfigMap and setting application.instanceLabelKey to argocd.argoproj.io/instance, roughly as sketched below.
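For reference, roughly what that edit looks like in argocd-cm (a sketch; the annotation-only tracking method from the list above is shown as a commented alternative):
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  application.instanceLabelKey: argocd.argoproj.io/instance
  # alternative: application.resourceTrackingMethod: annotation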
We fixed this issue by removing app.kubernetes.io/instance: kube-prometheus-stack, so the selector is only app.kubernetes.io/name: prometheus-node-exporter. Here's the reasoning:
- Changing ArgoCD's application.resourceTrackingMethod to annotation only would cause all of our apps to go out of sync, which we don't want. The same goes for setting application.instanceLabelKey. Given that some of the apps are already out of sync, that would make the situation harder to recognize.
- We keep the prometheus chart in an internal GitLab repo, and we don't need to follow the upstream version closely.
- Leaving only one selector works well and only affects this application.
For anyone who may need this.
@chivalryq Thanks! Removing app.kubernetes.io/instance: kube-prometheus-stack from selector.matchLabels of the ServiceMonitor resources (i.e. prometheus-node-exporter and kube-state-metrics) worked without changing any other configuration. Remember to restart the pods afterwards.
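If you just want to test this fix before changing the chart, a one-off patch does the same (a sketch; resource name assumed, and note ArgoCD will revert it on the next sync):
$ kubectl -n monitoring patch servicemonitor kube-prometheus-stack-prometheus-node-exporter \
    --type=json -p='[{"op":"remove","path":"/spec/selector/matchLabels/app.kubernetes.io~1instance"}]'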