Logs enricher does not show the node utilisation graph when using Prometheus
Describe the bug
Hey, I have been using Robusta for a few days with Prometheus and Alertmanager, but the Prometheus enrichers (memory or CPU) arrive empty when an alert is fired to Teams. Only the pod memory usage graph appears; the node utilisation graph is not shown.
The image below demonstrates it.
To Reproduce
Steps to reproduce the behavior:
1. Install using the official Helm charts
2. Configure Prometheus and Alertmanager
3. Configure the sink to use Teams
4. Configure the global env as below:
globalConfig:
grafana_url: ""
grafana_api_key: ""
grafana_dashboard_uid: ""
alertmanager_url: "http://alertmanager-operated.thanos:9093"
prometheus_url: "http://thanos-query-frontend.thanos:9090"
signing_key: ""
account_id: 695c3053-0e56-xxxxxxxxxxxxxxxxxxxxxx
custom_annotations: []
5. My triggers and actions are:
- triggers:
  - on_pod_oom_killed:
      rate_limit: 3600
  actions:
  - pod_oom_killer_enricher: {}
  - logs_enricher: {}
  - pod_node_graph_enricher:
      resource_type: Memory
      display_limits: true
  - oomkilled_container_graph_enricher:
      resource_type: Memory
      display_limits: true
  stop: true
Expected behavior
The graphs work on both sides, node utilisation and pod utilisation.
Screenshots
See the image above.
Desktop (please complete the following information):
- OS: RedHat 8.5 and Ubuntu 20.04 LTS
- Browser: Chrome
- Version: 119
Additional context
Yes, I also receive an empty node graph.
Same here on Robusta 0.12.0 without UI integration
Same here.
@Bobses @wrbbz do you see any exception in the robusta-runner pod logs ?
Hi all, I believe this is because Robusta uses the recording rule instance:node_memory_utilisation:ratio, which isn't present in your environment.
If that is the case, we should be able to fix this by replacing instance:node_memory_utilisation:ratio with its definition, or possibly just with
container_memory_working_set_bytes{node="${node_name}", container!=""}
To help us get to the bottom of this, can each of you please verify that the metric instance:node_memory_utilisation:ratio is in fact missing from your environment?
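One way to verify (a sketch, not an official troubleshooting step) is to paste the metric name into the Prometheus/Thanos query UI, e.g. at the thanos-query-frontend URL from the globalConfig above, and check whether it returns any series:

```promql
# Should return one series per node when the node-exporter
# recording rules are loaded; an empty result means the
# recording rule is missing from your Prometheus.
count(instance:node_memory_utilisation:ratio)
```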
Yeah, I can confirm that we do not have instance:node_memory_utilisation:ratio, only container_memory_working_set_bytes.
I confirm that we don't have that record.
So, I'll add the following recording rule (note the parentheses: the whole `or` expression is divided by node_memory_MemTotal_bytes, matching the upstream node-exporter rule):
- record: instance:node_memory_utilisation:ratio
  expr: |
    1 - (
      (
        node_memory_MemAvailable_bytes{job="node-exporter"}
        or
        (
          node_memory_Buffers_bytes{job="node-exporter"}
          + node_memory_Cached_bytes{job="node-exporter"}
          + node_memory_MemFree_bytes{job="node-exporter"}
          + node_memory_Slab_bytes{job="node-exporter"}
        )
      )
      /
      node_memory_MemTotal_bytes{job="node-exporter"}
    )
Thank you!
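For anyone running the Prometheus Operator, a minimal sketch of how such a rule could be deployed as a PrometheusRule resource (the metadata name and namespace here are hypothetical; match them to your own setup and rule selectors):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-memory-utilisation   # hypothetical name
  namespace: thanos               # match your Prometheus/Thanos namespace
spec:
  groups:
  - name: node-exporter.rules
    rules:
    - record: instance:node_memory_utilisation:ratio
      expr: |
        1 - (
          (
            node_memory_MemAvailable_bytes{job="node-exporter"}
            or
            (
              node_memory_Buffers_bytes{job="node-exporter"}
              + node_memory_Cached_bytes{job="node-exporter"}
              + node_memory_MemFree_bytes{job="node-exporter"}
              + node_memory_Slab_bytes{job="node-exporter"}
            )
          )
          /
          node_memory_MemTotal_bytes{job="node-exporter"}
        )
```

Whether Prometheus picks this up depends on the ruleSelector configured on your Prometheus/Thanos instance.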
Yep, that will fix the problem. (Please confirm!)
I think we should also change this on our side to query the expression directly instead of relying on that recording rule.
I've created a PR that uses the expression definitions instead of recording rules.
Also, adding the recording rule to the Prometheus instance resolved the "No Data" error.
@wrbbz thank you for the fix. It will be included in the next Robusta release!