Logs enricher, when using Prometheus, does not show the graph for node utilisation

Open • antikilahdjs opened this issue 1 year ago • 10 comments

Describe the bug

Hey, I started using Robusta a few days ago with Prometheus and Alertmanager, but the Prometheus-based enrichers (memory or CPU graphs) are empty when an alert is sent to Teams. Only the memory usage graph appears; the node utilisation graph is not shown.

The screenshot below shows the problem:

[screenshot]

To Reproduce

Steps to reproduce the behavior:

1 - Install using the official Helm charts
2 - Configure Prometheus and Alertmanager
3 - Configure the sink to use Teams
4 - My global config is set as below:

globalConfig:
  grafana_url: ""
  grafana_api_key: ""
  grafana_dashboard_uid: ""
  alertmanager_url: "http://alertmanager-operated.thanos:9093"
  prometheus_url: "http://thanos-query-frontend.thanos:9090"
  signing_key: ""
  account_id: 695c3053-0e56-xxxxxxxxxxxxxxxxxxxxxx
  custom_annotations: []

5 - My triggers and actions are:

- triggers:
  - on_pod_oom_killed:
      rate_limit: 3600
  actions:
  - pod_oom_killer_enricher: {}
  - logs_enricher: {}
  - pod_node_graph_enricher:
      resource_type: Memory
      display_limits: true
  - oomkilled_container_graph_enricher:
      resource_type: Memory
      display_limits: true
  stop: true

Expected behavior

The graphs work for both node utilisation and pod utilisation.

Screenshots: added above.

Desktop (please complete the following information):

  • OS: RedHat 8.5 and Ubuntu 20.04 LTS
  • Browser: Chrome
  • Version: 119


antikilahdjs, Nov 14 '23

Yes, I also receive an empty node graph.

saireddyb, Nov 24 '23

Same here on Robusta 0.12.0 without UI integration

wrbbz, May 16 '24


Same here.

Bobses, May 16 '24

@Bobses @wrbbz do you see any exceptions in the robusta-runner pod logs?

arikalon1, May 16 '24

Hi all, I believe this is because Robusta uses the recording rule instance:node_memory_utilisation:ratio, which isn't present in your environment.

If that is the case, we should be able to fix this by replacing instance:node_memory_utilisation:ratio with its definition, or possibly just with container_memory_working_set_bytes{node="${node_name}", container!=""}

aantn, May 17 '24
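The `${node_name}` placeholder in the fallback query above happens to match the syntax of Python's `string.Template`, so one quick way to try that query by hand is to render it with that class. This is a minimal sketch, not Robusta's code: the helper names are hypothetical, it assumes your cAdvisor series carry a `node` label (as the quoted query does), and the `sum(...)` wrapper is an added assumption that collapses the per-container series into one value per node. Note that this yields bytes in use, not the 0 to 1 ratio that the recording rule produces.

```python
from string import Template

# Fallback query quoted from the comment above; ${node_name} is filled in per node.
CONTAINER_MEMORY_QUERY = Template(
    'container_memory_working_set_bytes{node="${node_name}", container!=""}'
)


def render_container_memory_query(node_name: str) -> str:
    """Fill in the node name; substitute() raises KeyError if it is missing."""
    return CONTAINER_MEMORY_QUERY.substitute(node_name=node_name)


def render_node_memory_bytes_query(node_name: str) -> str:
    """Hypothetical: sum the per-container working set into one series per node."""
    return f"sum({render_container_memory_query(node_name)})"
```

For example, `render_node_memory_bytes_query("worker-1")` returns `sum(container_memory_working_set_bytes{node="worker-1", container!=""})`, which can be pasted into the Prometheus or Thanos query UI to see whether the series exist.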

To help us get to the bottom of this, can each of you please verify that the metric instance:node_memory_utilisation:ratio is in fact missing from your environment?

aantn, May 17 '24
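One way to verify this is to run an instant query for the metric against the Prometheus HTTP API (`/api/v1/query`) and check whether any series come back. Below is a minimal standard-library Python sketch; the helper names are hypothetical. It assumes the endpoint speaks the Prometheus-compatible API, which the Thanos query frontend from the `prometheus_url` in the issue's globalConfig should. An instant query only returns series with a recent sample, so an empty result means "no current data", which is what matters for the graph.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{params}"


def metric_is_present(response_body: str) -> bool:
    """Return True if an /api/v1/query response contains at least one series."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', payload)}")
    return len(payload["data"]["result"]) > 0


def check_metric(prometheus_url: str, metric: str, timeout: float = 10.0) -> bool:
    """Query the endpoint and report whether `metric` currently has data."""
    url = build_query_url(prometheus_url, metric)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return metric_is_present(resp.read().decode("utf-8"))
```

Run from a pod inside the cluster, `check_metric("http://thanos-query-frontend.thanos:9090", "instance:node_memory_utilisation:ratio")` returns `False` when the recording rule is missing.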

Yeah, I can confirm that we do not have instance:node_memory_utilisation:ratio; we only have container_memory_working_set_bytes.

wrbbz, May 17 '24

I confirm that we don't have that record.

So, I'll add the following record:

record: instance:node_memory_utilisation:ratio
expr: 1 - ((node_memory_MemAvailable_bytes{job="node-exporter"} or (node_memory_Buffers_bytes{job="node-exporter"} + node_memory_Cached_bytes{job="node-exporter"} + node_memory_MemFree_bytes{job="node-exporter"} + node_memory_Slab_bytes{job="node-exporter"})) / node_memory_MemTotal_bytes{job="node-exporter"})

Thank you!

Bobses, May 20 '24
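The recording rule computes node memory utilisation as one minus the fraction of memory still available. When node_exporter does not export MemAvailable, it falls back to buffers + cached + free + slab. The plain-Python sketch below mirrors that arithmetic for a single node, using a dict of meminfo-style values. The function name and dict keys are illustrative, and PromQL's `or` actually works per matching series rather than per value. When writing the PromQL version, keep in mind that `/` binds more tightly than `or`. The whole `or` expression therefore has to be parenthesized before it is divided by MemTotal; the Python version makes that grouping explicit.

```python
from typing import Mapping


def node_memory_utilisation_ratio(meminfo: Mapping[str, float]) -> float:
    """Mirror the rule for one node: 1 - available / total.

    Keys correspond to node_exporter's node_memory_<key>_bytes series.
    """
    available = meminfo.get("MemAvailable")
    if available is None:
        # Same fallback as the `or` branch, for nodes that lack MemAvailable.
        available = (
            meminfo["Buffers"] + meminfo["Cached"] + meminfo["MemFree"] + meminfo["Slab"]
        )
    return 1 - available / meminfo["MemTotal"]
```

For example, a node with 16 GB total and 4 GB available gives `node_memory_utilisation_ratio({"MemTotal": 16e9, "MemAvailable": 4e9})`, which is 0.75, i.e. 75% utilised.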

Yep, that will fix the problem. (Please confirm!)

I think we should also change this on our side to query the expression directly instead of relying on that recording rule.

aantn, May 20 '24
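Querying the expression directly means inlining the recording rule's definition wherever the rule name was used. Below is a minimal Python sketch that builds that inline PromQL string. The function name and the optional extra label selector (for example an `instance` matcher for a single node) are hypothetical; they do not reflect how Robusta actually builds its queries. The `job="node-exporter"` selector is copied from the rule posted earlier in this thread.

```python
NODE_EXPORTER_JOB = 'job="node-exporter"'  # selector copied from the rule in this thread


def inline_node_memory_utilisation(extra_selector: str = "") -> str:
    """Return the rule's definition as one PromQL expression (no recording rule needed)."""
    selector = NODE_EXPORTER_JOB + (f", {extra_selector}" if extra_selector else "")

    def series(name: str) -> str:
        # e.g. node_memory_MemTotal_bytes{job="node-exporter", instance="..."}
        return f"node_memory_{name}_bytes{{{selector}}}"

    fallback = " + ".join(series(n) for n in ("Buffers", "Cached", "MemFree", "Slab"))
    # Parenthesize the `or` before dividing: in PromQL `/` binds tighter than `or`.
    return f"1 - (({series('MemAvailable')} or ({fallback})) / {series('MemTotal')})"
```

Calling `inline_node_memory_utilisation('instance="10.0.0.1:9100"')` produces a single expression scoped to that node_exporter target. The rendered expression can be used anywhere the rule name was queried before.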

I've created a PR that uses the metric definitions instead of the recording rules.

Also, adding the recording rule to the Prometheus instance solved the No Data error.

wrbbz, May 21 '24

@wrbbz thank you for the fix. It will be included in the next Robusta release!

aantn, Jun 15 '24