click-to-deploy
Prometheus setup drops important metrics like `machine_memory_bytes`
Category:
Kubernetes apps
Type:
- [x] Bug ?
- [ ] Feature Request
- [ ] Process
The Prometheus config appears to be dropping some metrics, including `machine_memory_bytes`,
which is commonly used in Grafana dashboards to calculate cluster memory usage:
sum (container_memory_working_set_bytes{id="/",kubernetes_io_hostname=~"^$Node$"}) / sum (machine_memory_bytes{kubernetes_io_hostname=~"^$Node$"}) * 100
(https://github.com/pivotal-cf/charts-grafana)
I believe the offending entries are the `metric_relabel_configs` rules with `action: drop` in the cadvisor scrape job:
- job_name: cadvisor
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__address__]
    separator: ;
    regex: ([^:]+)(?::\d+)?
    target_label: __address__
    replacement: $1:10255
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __metrics_path__
    replacement: /metrics/cadvisor
    action: replace
  metric_relabel_configs:
  - source_labels: [namespace]
    separator: ;
    regex: ^$
    replacement: $1
    action: drop
  - source_labels: [pod_name]
    separator: ;
    regex: ^$
    replacement: $1
    action: drop
https://github.com/GoogleCloudPlatform/click-to-deploy/blob/b486a7f0959fad504ef984782b30b03c7264a1a6/k8s/prometheus/manifest/prometheus-configmap.yaml#L346
Or is there some other explanation why that metric is missing?
I can confirm, by manually curling the relevant /metrics endpoints, that machine_memory_bytes
is present in the response body.
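
To illustrate the suspected mechanism, here is a minimal Python sketch of how a `drop` rule is evaluated. It is not Prometheus code. It only mirrors the documented relabeling behaviour: the source label values are joined with the separator, a missing label counts as an empty string, and the regex is anchored at both ends. The helper `is_dropped` and both sample label sets are hypothetical and were not copied from a real scrape.

```python
import re


def is_dropped(labels, source_labels, regex, separator=";"):
    """Return True if a `drop` relabel rule would discard this sample."""
    # Join the source label values with the separator; a label that is
    # absent from the sample contributes an empty string.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors relabel regexes at both ends, so use a full match.
    return re.fullmatch(regex, value) is not None


# Hypothetical samples (illustrative label sets only).
machine = {
    "__name__": "machine_memory_bytes",
    "kubernetes_io_hostname": "node-1",
}
container = {
    "__name__": "container_memory_working_set_bytes",
    "namespace": "default",
    "pod_name": "web-0",
}

# The two drop rules from the cadvisor job above.
rules = [(["namespace"], "^$"), (["pod_name"], "^$")]

for sample in (machine, container):
    dropped = any(is_dropped(sample, src, rx) for src, rx in rules)
    print(sample["__name__"], "dropped" if dropped else "kept")
```

Under these assumptions, the sample without `namespace` and `pod_name` labels is reported as dropped, while the pod-level container sample is kept. If that reading is right, any series lacking those labels would be affected. That probably includes the `id="/"` series used in the dashboard query, though I have not verified this.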