Prometheus setup drops important metrics like `machine_memory_bytes`

Open · elnygren opened this issue 4 years ago · 0 comments

Category:

Kubernetes apps

Type:

  • [x] Bug
  • [ ] Feature Request
  • [ ] Process

The Prometheus config appears to drop some metrics, including `machine_memory_bytes`, which Grafana dashboards commonly use to calculate cluster memory usage. For example:

```promql
sum (container_memory_working_set_bytes{id="/",kubernetes_io_hostname=~"^$Node$"}) / sum (machine_memory_bytes{kubernetes_io_hostname=~"^$Node$"}) * 100
```

(Query taken from https://github.com/pivotal-cf/charts-grafana.)
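To see why the panel ends up empty instead of showing a wrong number, here is a simplified Python model of that query. It is a sketch, not PromQL: the `=~"^$Node$"` regex matchers are reduced to exact matches, the helper names (`promql_sum`, `memory_usage_percent`) are hypothetical, and the sample values are made up. It models PromQL's behaviour that `sum` over zero input series returns an empty result, and that dividing by an empty result also yields nothing.

```python
from typing import Optional

# (metric name, labels, value); hypothetical sample data, not real scrape output.
Series = tuple[str, dict[str, str], float]


def promql_sum(series: list[Series], name: str, **matchers: str) -> Optional[float]:
    """Simplified `sum(name{label="value", ...})`.

    Returns None to model PromQL's empty result when no series match.
    """
    values = [
        value
        for metric, labels, value in series
        if metric == name
        and all(labels.get(key, "") == want for key, want in matchers.items())
    ]
    return sum(values) if values else None


def memory_usage_percent(series: list[Series], node: str) -> Optional[float]:
    """Model of the dashboard's memory query for a single node."""
    used = promql_sum(
        series, "container_memory_working_set_bytes", id="/", kubernetes_io_hostname=node
    )
    total = promql_sum(series, "machine_memory_bytes", kubernetes_io_hostname=node)
    if used is None or total is None:
        # One side of the division is empty, so the whole query returns no data.
        return None
    return used / total * 100


scraped = [
    ("container_memory_working_set_bytes", {"id": "/", "kubernetes_io_hostname": "node-1"}, 2.0),
    ("machine_memory_bytes", {"kubernetes_io_hostname": "node-1"}, 8.0),
]
print(memory_usage_percent(scraped, "node-1"))      # 25.0
print(memory_usage_percent(scraped[:1], "node-1"))  # None
```

Dropping the `machine_memory_bytes` series (the second call) is enough to turn the result into "no data".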

I believe the culprits are the `metric_relabel_configs` entries with `action: drop` in the `cadvisor` scrape job:

```yaml
- job_name: cadvisor
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__address__]
    separator: ;
    regex: ([^:]+)(?::\d+)?
    target_label: __address__
    replacement: $1:10255
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __metrics_path__
    replacement: /metrics/cadvisor
    action: replace
  metric_relabel_configs:
  - source_labels: [namespace]
    separator: ;
    regex: ^$
    replacement: $1
    action: drop
  - source_labels: [pod_name]
    separator: ;
    regex: ^$
    replacement: $1
    action: drop
```
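The two `drop` rules match any series whose `namespace` or `pod_name` label is empty, and Prometheus treats a missing label as an empty string. Node-level series such as `machine_memory_bytes` carry neither label, so they would be discarded after every scrape. The Python sketch below mimics that logic under stated assumptions: source label values are joined with `separator`, and the regex is anchored at both ends (modelled with `re.fullmatch`, although Prometheus itself uses RE2). The label sets are made-up examples, not real scrape output.

```python
import re

# The two metric_relabel_configs drop rules from the cadvisor job:
# (source_labels, separator, regex).
DROP_RULES = [
    (["namespace"], ";", r"^$"),
    (["pod_name"], ";", r"^$"),
]


def is_dropped(labels: dict[str, str]) -> bool:
    """Return True if any `action: drop` rule matches this series' labels."""
    for source_labels, separator, regex in DROP_RULES:
        # Join the source label values; a missing label contributes "".
        value = separator.join(labels.get(name, "") for name in source_labels)
        # Prometheus anchors relabel regexes, so require a full match.
        if re.fullmatch(regex, value):
            return True
    return False


samples = {
    "machine_memory_bytes": {"kubernetes_io_hostname": "node-1"},
    'container_memory_working_set_bytes{id="/"}': {"id": "/", "kubernetes_io_hostname": "node-1"},
    "container_memory_working_set_bytes (pod)": {
        "namespace": "default",
        "pod_name": "web-0",
        "kubernetes_io_hostname": "node-1",
    },
}
for series, labels in samples.items():
    print(f"{series}: {'dropped' if is_dropped(labels) else 'kept'}")
```

If this model is right, the root-cgroup series (`id="/"`) used in the dashboard's numerator is dropped too, not only `machine_memory_bytes`; only series that carry both pod-level labels survive.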

https://github.com/GoogleCloudPlatform/click-to-deploy/blob/b486a7f0959fad504ef984782b30b03c7264a1a6/k8s/prometheus/manifest/prometheus-configmap.yaml#L346

Or is there some other explanation for why this metric is missing?

I can confirm that `machine_memory_bytes` is present in the response body when I manually `curl` the relevant `/metrics` endpoints.
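The same check can be scripted. This is a minimal Python sketch, assuming the endpoint returns the plain Prometheus text exposition format: it collects metric names from a response body, skipping `#` comment lines such as `# TYPE`. The address in `fetch_body`'s docstring (`http://<node-ip>:10255/metrics/cadvisor`, which is where the `cadvisor` job above points its targets) is an assumption about your cluster, and the helper names are hypothetical.

```python
import re
import urllib.request


def fetch_body(url: str) -> str:
    """Fetch a metrics endpoint, e.g. http://<node-ip>:10255/metrics/cadvisor (assumed address)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def metric_names(body: str) -> set[str]:
    """Collect metric names from a Prometheus text-format response body."""
    names = set()
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # The metric name ends at the first "{" (labels) or whitespace (value).
        names.add(re.split(r"[{\s]", line, maxsplit=1)[0])
    return names


# Made-up response body for illustration.
sample_body = """\
# TYPE machine_memory_bytes gauge
machine_memory_bytes 1.6e+10
container_memory_working_set_bytes{id="/"} 4.2e+09
"""
print("machine_memory_bytes" in metric_names(sample_body))  # True
```

If the metric appears in the scraped body but not in Prometheus, it is lost after the scrape. That is consistent with `metric_relabel_configs`, which Prometheus applies to scraped samples before ingestion, rather than with the target-level `relabel_configs`.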

elnygren, May 12 '20 11:05