prometheus-kubernetes
kubelet Kubernetes node labels are missing
Hi
I'm encountering the same issue as described here (https://github.com/prometheus/prometheus/issues/3294) when deploying Prometheus.
Earlier (before the Prometheus Operator implementation), metrics like
container_memory_working_set_bytes{id='/'}
carried all the node labels.
But unfortunately, most of the useful labels are now missing.
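For illustration, a series that used to carry node labels might look like this (the label names here are examples from a typical kops cluster, not taken from an actual scrape):

container_memory_working_set_bytes{id="/", instance="ip-10-0-1-23.eu-west-1.compute.internal", kops_k8s_io_instancegroup="nodes", failure_domain_beta_kubernetes_io_zone="eu-west-1a"}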
Hi @paalkr, sorry for the late reply. I did try to fix it but unfortunately wasn't able to. I also tested with targetLabels and it is still not working. I think I'll need more time to understand exactly how this works.
Here is the example I tried:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  labels:
    k8s-app: kubelet
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    path: /metrics/cadvisor
    interval: 30s
    honorLabels: true
  selector:
    matchLabels:
      k8s-app: kubelet
  targetLabels:
  - app
  - prometheus
  - beta.kubernetes.io/instance-type
  - kops.k8s.io/instancegroup
  - failure-domain.beta.kubernetes.io/zone
  - name
  - address
  namespaceSelector:
    matchNames:
    - kube-system
Hi, thanks for looking into this, I appreciate your effort and help.
May this be of any help? https://github.com/coreos/prometheus-operator/blob/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
Hi, I did notice a big commit yesterday. Did you have any success getting the labels applied?
No luck yet
Ah, I see. Thanks for the feedback. Anything I can do to help?
Hi
I did a comparison of the Prometheus config from your old release (the one not based on the operator) and the current release, and I do see that the configs generated by the operator (from the ServiceMonitors) are very different. I would presume this section (only found in the old config) is the one that scrapes the node labels...
- separator: ;
  regex: __meta_kubernetes_node_label_(.+)
  replacement: $1
  action: labelmap
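To spell out what that labelmap rule does: it copies every __meta_kubernetes_node_label_* service-discovery label onto the target, stripping the prefix. For example (the label name is assumed for illustration):

# before relabeling, as seen in service discovery
__meta_kubernetes_node_label_kops_k8s_io_instancegroup="nodes"
# after the labelmap rule, attached to every scraped series
kops_k8s_io_instancegroup="nodes"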
Look at the configuration snippets responsible for scraping the kubelets in the two versions. Old style (no operator):
- job_name: kubernetes-nodes
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: node
    namespaces:
      names: []
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: kubernetes.default.svc:443
    action: replace
  - source_labels: [__meta_kubernetes_node_name]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
    action: replace
New style, generated by the kubelet ServiceMonitor operator object:
- job_name: monitoring/kubelet/0
  honor_labels: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - kube-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: cadvisor
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: cadvisor
    action: replace
Note that none of the generated rules map the __meta_kubernetes_node_* labels. I guess what we need is the metricRelabelings field:
https://github.com/coreos/prometheus-operator/blob/master/example/prometheus-operator-crd/servicemonitor.crd.yaml
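For reference, metricRelabelings on a ServiceMonitor endpoint look something like this (an illustrative rule that drops one cAdvisor metric, not a fix for the missing labels):

endpoints:
- port: http-metrics
  path: /metrics/cadvisor
  metricRelabelings:
  - sourceLabels: [__name__]
    regex: container_tasks_state
    action: drop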
Some more interesting information: https://github.com/coreos/prometheus-operator/issues/1166 https://github.com/coreos/prometheus-operator/issues/1548#issuecomment-402045553
I managed to sort out this issue. There are several things worth mentioning. The Prometheus Operator ServiceMonitor object/kind/CRD does not provide access to the kubernetes_sd_config role=node:
kubernetes_sd_configs:
- api_server: null
  role: node
And therefore none of the __meta_kubernetes_node_* labels are available. This can be worked around using additionalScrapeConfigs in the PrometheusSpec (https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#prometheusspec).
Trying to use targetLabels in the ServiceMonitor will fail, because they translate into __meta_kubernetes_service_label_<labelname>, as the role of targets created by the automated processes in the operator is always service:
kubernetes_sd_configs:
- api_server: null
  role: service
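In other words, a targetLabels entry such as app is compiled by the operator into a relabel rule along these lines (a sketch, reconstructed from the generated config shown above):

- source_labels: [__meta_kubernetes_service_label_app]
  separator: ;
  regex: (.+)
  target_label: app
  replacement: ${1}
  action: replace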
So I ended up removing the kubelet ServiceMonitor (prometheus-k8s-service-monitor-kubelet.yaml) and replacing it with a custom scraping config. To get this to work you have to do several things.
- Alter the RBAC config to allow node access at the cluster scope:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: default
rules:
- apiGroups: [""]
  resources:
  - nodes/metrics
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups: [""]
  resources:
  - nodes/metrics
  - nodes
  - endpoints
  - pods
  - services
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
- Create a YAML file, kubelet.yaml, with your custom scraping configs:
- job_name: 'kubernetes-nodes'
  scheme: https
  metrics_path: /metrics
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    target_label: __address__
    regex: (.+)
    replacement: ${1}:10250
- job_name: 'kubernetes-cadvisor'
  scheme: https
  metrics_path: /metrics/cadvisor
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_address_InternalIP]
    target_label: __address__
    regex: (.+)
    replacement: ${1}:10250
- Create a secret with the content of the custom scraping config YAML:
kubectl create secret generic additional-scrape-configs --from-file=kubelet.yaml=.\kubelet.yaml
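Two things worth checking here: the secret must live in the same namespace as the Prometheus object (monitoring in the example below, so you may need -n monitoring on the command above), and you can inspect what actually landed in it, for example (the jsonpath expression escapes the dot in the file name):

kubectl get secret additional-scrape-configs -n monitoring -o jsonpath="{.data.kubelet\.yaml}" | base64 --decode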
- Modify the Prometheus object to include the custom scraping configs:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  externalUrl: xxx
  replicas: 2
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: prometheus-rulefiles
  serviceAccountName: prometheus-k8s
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: kubelet.yaml
  serviceMonitorSelector:
    matchExpressions:
    - key: k8s-app
      operator: Exists
  version: v2.2.1
All node labels are then propagated to metrics like container_memory_working_set_bytes, machine_memory_bytes, etc.
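As a quick check, a query along these lines should now return per-zone aggregates (assuming the failure-domain zone label exists on your nodes):

sum by (failure_domain_beta_kubernetes_io_zone) (machine_memory_bytes)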
Hi @paalkr, thanks for this fix. I'll try to integrate it in the next update.
@camilb I'm facing the same issue with the node-exporter.
With respect to node-exporter, do we have any workaround for now to get the node name label in the metrics without additionalScrapeConfigs?
Bump, there has to be a better way to do this.
Please help me understand this. I'm really looking to have the Prometheus Operator use name labels on nodes.
I'll take a look on it later today
Thanks a lot. Just let me know when you've checked; I will try it immediately.
Hi guys. I have faced the same issue. Are there any updates regarding the solution?
I have solved it by using group_left in the query (kube_node_labels comes from kube-state-metrics):
sum by (label_node_type, node) (kube_node_status_allocatable_cpu_cores * on(node) group_left(label_node_type) kube_node_labels)
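A caveat for anyone trying this on a newer stack: kube-state-metrics v2 no longer exposes arbitrary node labels on kube_node_labels by default; they have to be allow-listed, for example (the label names here are placeholders):

--metric-labels-allowlist=nodes=[node-type,failure-domain.beta.kubernetes.io/zone]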
Hi guys, this is probably a cleaner way to do it (on the ServiceMonitor object):
prometheus-node-exporter:
  prometheus:
    monitor:
      enabled: true
      relabelings:
      - sourceLabels: [__meta_kubernetes_endpoint_node_name]
        targetLabel: node
These values are for the kube-prometheus-stack Helm chart. Check service discovery in Prometheus for more labels; most of the useful ones are available.
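With that relabeling in place, node-exporter series carry a node label, so a query like this works directly (metric name from node-exporter):

sum by (node) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))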
Hi. Thanks a lot for your reply. Will check it.
Hello, I'm trying to add a custom label to the node.
- job_name: 'kubernetes-nodes'
  scheme: https
  metrics_path: /metrics
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_failure_domain_beta_kubernetes_io_zone]
    target_label: __failure_domain_beta_kubernetes_io_zone__
    regex: (.+)
    replacement: $1
But it is not working. How can I get this label, failure-domain.beta.kubernetes.io/zone, into my metrics?
The label is at the node level.
I'm using the kube-prometheus-stack Helm chart.