Logs enricher does not show the node utilisation graph when using Prometheus
Describe the bug
Hey, I have been using Robusta for a few days with Prometheus and Alertmanager, but the Prometheus enrichers (memory or CPU) arrive empty when an alert is fired to Teams. Only the pod memory usage graph appears; the node utilisation graph is not shown.
The image below demonstrates it.
To Reproduce
Steps to reproduce the behavior:
1. Install using the official Helm charts
2. Configure Prometheus and Alertmanager
3. Configure the sink to use Teams
4. Configure the global env as below:
globalConfig:
grafana_url: ""
grafana_api_key: ""
grafana_dashboard_uid: ""
alertmanager_url: "http://alertmanager-operated.thanos:9093"
prometheus_url: "http://thanos-query-frontend.thanos:9090"
signing_key: ""
account_id: 695c3053-0e56-xxxxxxxxxxxxxxxxxxxxxx
custom_annotations: []
5. My triggers and actions are:
- triggers:
  - on_pod_oom_killed:
      rate_limit: 3600
  actions:
  - pod_oom_killer_enricher: {}
  - logs_enricher: {}
  - pod_node_graph_enricher:
      resource_type: Memory
      display_limits: true
  - oomkilled_container_graph_enricher:
      resource_type: Memory
      display_limits: true
  stop: true
Expected behavior
The graphs work on both sides, node utilisation and pod utilisation.
Screenshots
See the image above.
Desktop (please complete the following information):
- OS: RedHat 8.5 and Ubuntu 20.04 LTS
- Browser: Chrome
- Version: 119
Additional context
Yes, I also receive an empty node graph.
Same here on Robusta 0.12.0 without UI integration
Same here.
@Bobses @wrbbz do you see any exception in the robusta-runner pod logs ?
Hi all, I believe this is because Robusta uses the recording rule instance:node_memory_utilisation:ratio, which isn't present in your environment.
If that is the case, we should be able to fix this by replacing instance:node_memory_utilisation:ratio with its definition, or possibly just with
container_memory_working_set_bytes{node="${node_name}", container!=""}
To help us get to the bottom of this, can each of you please verify that the metric instance:node_memory_utilisation:ratio is in fact missing from your environment?
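One way to verify (a sketch, not an official troubleshooting step) is to paste the metric name into the Prometheus/Thanos query UI, e.g. at the thanos-query-frontend URL from the globalConfig above, and check whether it returns any series:

```promql
# Should return one series per node when the node-exporter
# recording rules are loaded; an empty result means the
# recording rule is missing from your Prometheus.
count(instance:node_memory_utilisation:ratio)
```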
Yeah, I can confirm that we do not have instance:node_memory_utilisation:ratio, only container_memory_working_set_bytes.
I confirm that we don't have that record.
So, I'll add the following recording rule (note the parentheses: the whole `or` expression is divided by node_memory_MemTotal_bytes, matching the upstream node-exporter rule):
- record: instance:node_memory_utilisation:ratio
  expr: |
    1 - (
      (
        node_memory_MemAvailable_bytes{job="node-exporter"}
        or
        (
          node_memory_Buffers_bytes{job="node-exporter"}
          + node_memory_Cached_bytes{job="node-exporter"}
          + node_memory_MemFree_bytes{job="node-exporter"}
          + node_memory_Slab_bytes{job="node-exporter"}
        )
      )
      /
      node_memory_MemTotal_bytes{job="node-exporter"}
    )
Thank you!
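For anyone running the Prometheus Operator, a minimal sketch of how such a rule could be deployed as a PrometheusRule resource (the metadata name and namespace here are hypothetical; match them to your own setup and rule selectors):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-memory-utilisation   # hypothetical name
  namespace: thanos               # match your Prometheus/Thanos namespace
spec:
  groups:
  - name: node-exporter.rules
    rules:
    - record: instance:node_memory_utilisation:ratio
      expr: |
        1 - (
          (
            node_memory_MemAvailable_bytes{job="node-exporter"}
            or
            (
              node_memory_Buffers_bytes{job="node-exporter"}
              + node_memory_Cached_bytes{job="node-exporter"}
              + node_memory_MemFree_bytes{job="node-exporter"}
              + node_memory_Slab_bytes{job="node-exporter"}
            )
          )
          /
          node_memory_MemTotal_bytes{job="node-exporter"}
        )
```

Whether Prometheus picks this up depends on the ruleSelector configured on your Prometheus/Thanos instance.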
Yep, that will fix the problem. (Please confirm!)
I think we should also change this on our side to query the expression directly instead of relying on that recording rule.
I've created a PR that uses the expression definitions instead of recording rules.
Also, adding the recording rule to the Prometheus instance resolved the "No Data" error.
@wrbbz thank you for the fix. It will be included in the next Robusta release!