Add the ability to get the pod list on k8s events related to nodes

Open devopsmash opened this issue 1 year ago • 5 comments

It would be great if you could add the ability to get the pod list on k8s events related to nodes.

We tried adding the node_running_pods_enricher action to a playbook triggered by on_kubernetes_warning_event:

  - triggers:
      - on_kubernetes_warning_event: {}
    actions:
      - event_report: {}
      - event_resource_events: {}
      - node_running_pods_enricher: {}

But we are getting the following error from the runner:

2023-01-08 09:10:57.603 ERROR    Action node_running_pods_enricher cannot be triggered by <class 'robusta.integrations.kubernetes.autogenerated.events.EventChangeEvent'>
2023-01-08 09:10:57.603 ERROR    unknown error reloading playbooks. will try again when they next change
Traceback (most recent call last):
  File "/app/src/robusta/runner/config_loader.py", line 241, in __reload_playbook_packages
    (sinks_registry, playbooks_registry) = self.__prepare_runtime_config(
  File "/app/src/robusta/runner/config_loader.py", line 308, in __prepare_runtime_config
    playbooks_registry = PlaybooksRegistryImpl(
  File "/app/src/robusta/model/config.py", line 151, in __init__
    raise Exception(msg)
Exception: Action node_running_pods_enricher cannot be triggered by <class 'robusta.integrations.kubernetes.autogenerated.events.EventChangeEvent'>

Any chance to implement that kind of feature?

devopsmash avatar Jan 08 '23 09:01 devopsmash

Thanks for reporting it @dsaydon90

Just to clarify, you'd like to get the list of pods running on the node, whenever there's a warning event on some node? Or is there another scenario for which you'd like to get the pods list?

arikalon1 avatar Jan 08 '23 09:01 arikalon1

Hi @arikalon1 , thank you for your response.

Exactly: whenever there's a warning event on a node, we'd like to get the pods running on that node. A good example is an event like: InvalidDiskCapacity Warning for Node None/ip-10-104-70-47.eu-west-1.compute.internal

In this case, it would be nice to have a list of the pods running on the node at that time, so I can investigate each pod further.

devopsmash avatar Jan 08 '23 10:01 devopsmash

Thanks for the explanation @dsaydon90

Did you consider doing it using a Prometheus alert on the node? (For example, creating an alert for an unhealthy node, and adding the list of pods and the node events to it.) I'm asking because warning events are often temporary. When a node starts or scales down, there are often warning events.
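
A minimal sketch of what such a playbook might look like, reusing the playbook layout from the snippet earlier in this thread. The alert name `KubeNodeNotReady` is only an assumption for illustration (it has to match an alert that is actually defined in your Prometheus rules), and this sketch attaches just the pod list:

```yaml
  # Hypothetical example: the alert name below is an assumption, not from this thread.
  - triggers:
      - on_prometheus_alert:
          alert_name: KubeNodeNotReady
    actions:
      # Attach the list of pods running on the node the alert refers to.
      - node_running_pods_enricher: {}
```

Because the trigger here is an alert that carries a node, rather than a generic Kubernetes event, the enricher should have a node to work with.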

arikalon1 avatar Jan 08 '23 11:01 arikalon1

Hi @arikalon1, in our case we are using New Relic, which is why we are trying to avoid installing Prometheus.

devopsmash avatar Jan 08 '23 21:01 devopsmash

Hey @dsaydon90, we'd like to learn more about the issue before we start development. Can we meet for a short call? You can message me (Tomer) in the Robusta Slack Community.

Does that work for you?

Sheeproid avatar Jan 12 '23 15:01 Sheeproid