Accurately report reason for pod termination

Open malikal-hh opened this issue 2 years ago • 5 comments

Is your feature request related to a problem? Yes. The current code reports all pod terminations as OOMKilled, even when the termination is caused by a rolling deployment.

Describe the solution you'd like I would like the code to be updated so that it accurately reports the reason for pod termination instead of reporting every termination as OOMKilled.
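To make the request concrete, here is a minimal sketch of reading the termination reason that Kubernetes itself records, rather than assuming OOMKilled. It is not Robusta's implementation. It works on a pod as a plain dict (for example the output of `kubectl get pod -o json`) and relies on the standard `status.containerStatuses[].state.terminated` and `lastState.terminated` fields; the function names and the sample pod are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def termination_reasons(pod: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Return (container name, reason, exit code) for every terminated container."""
    results = []
    for cs in pod.get("status", {}).get("containerStatuses") or []:
        state = cs.get("state") or {}
        last_state = cs.get("lastState") or {}
        # Prefer the current state; fall back to the previous run of the container.
        terminated = state.get("terminated") or last_state.get("terminated")
        if terminated:
            results.append(
                (
                    cs["name"],
                    terminated.get("reason", "Unknown"),
                    terminated.get("exitCode", -1),
                )
            )
    return results


def was_oom_killed(pod: Dict[str, Any]) -> bool:
    """True only if Kubernetes recorded OOMKilled for at least one container."""
    return any(reason == "OOMKilled" for _, reason, _ in termination_reasons(pod))


# Illustrative data only (not taken from this issue): a container stopped
# with SIGTERM, as can happen when a pod is replaced during a rollout.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "state": {"terminated": {"reason": "Error", "exitCode": 143}},
            }
        ]
    }
}

print(termination_reasons(sample_pod))
print(was_oom_killed(sample_pod))
```

With a check like this, an OOMKilled report would only be raised when the recorded reason actually says so.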

Describe alternatives you've considered I have not considered any alternative solutions at this time.

Additional context The pod didn't get OOMKilled. [image]

malikal-hh avatar Sep 06 '23 08:09 malikal-hh

Hi @malikal-hh, thank you for reporting the issue. Our team is looking into it. Please feel free to join Robusta Community on Slack to discuss your queries.

pavangudiwada avatar Sep 07 '23 16:09 pavangudiwada

same here. is there a fix for that?

qxmips avatar Apr 24 '24 20:04 qxmips

hi @qxmips

Do you have the same issue? What container is the notification for? What are its memory request and limits? Can you also share a memory graph of it around the time of the notification?
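If it helps when gathering these details, the memory request and limit of each container can be pulled from the pod spec. This is only a sketch that assumes the pod is available as a plain dict (for example from `kubectl get pod -o json`); `memory_settings` and the sample values are hypothetical, while `spec.containers[].resources.requests/limits` are the standard Kubernetes fields.

```python
from typing import Any, Dict, Optional


def memory_settings(pod: Dict[str, Any]) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each container name to its memory request and limit (None if unset)."""
    settings = {}
    for container in pod.get("spec", {}).get("containers") or []:
        resources = container.get("resources") or {}
        settings[container["name"]] = {
            "request": (resources.get("requests") or {}).get("memory"),
            "limit": (resources.get("limits") or {}).get("memory"),
        }
    return settings


# Illustrative values only; they are not taken from this issue.
sample_pod = {
    "spec": {
        "containers": [
            {
                "name": "app",
                "resources": {
                    "requests": {"memory": "256Mi"},
                    "limits": {"memory": "512Mi"},
                },
            },
            {"name": "sidecar"},
        ]
    }
}

print(memory_settings(sample_pod))
```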

arikalon1 avatar Apr 24 '24 22:04 arikalon1

[image] [image] Robusta runner logs:

2024-04-24 22:30:43.924 | 2024-04-24 19:30:43.918 ERROR    failed to get pod logs loki-write-0 observability loki
2024-04-24 22:30:43.924 | Traceback (most recent call last):
2024-04-24 22:30:43.924 | File "/app/src/robusta/integrations/kubernetes/api_client_utils.py", line 145, in get_pod_logs
2024-04-24 22:30:43.924 | resp = core_v1.read_namespaced_pod_log(
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
2024-04-24 22:30:43.924 | return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 23866, in read_namespaced_pod_log_with_http_info
2024-04-24 22:30:43.924 | return self.api_client.call_api(
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
2024-04-24 22:30:43.924 | return self.__call_api(resource_path, method,
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
2024-04-24 22:30:43.924 | response_data = self.request(
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 373, in request
2024-04-24 22:30:43.924 | return self.rest_client.GET(url,
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 241, in GET
2024-04-24 22:30:43.924 | return self.request("GET", url,
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 235, in request
2024-04-24 22:30:43.924 | raise ApiException(http_resp=r)
2024-04-24 22:30:43.924 | kubernetes.client.exceptions.ApiException: (400)
2024-04-24 22:30:43.924 | Reason: Bad Request
2024-04-24 22:30:43.924 | HTTP response headers: HTTPHeaderDict({'Audit-Id': '19de0f28-3167-44ab-b41e-c3adfb350982', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Wed, 24 Apr 2024 19:30:43 GMT', 'Content-Length': '196'})
2024-04-24 22:30:43.924 | HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\"loki\\" in pod \\"loki-write-0\\" is waiting to start: ContainerCreating","reason":"BadRequest","code":400}\n'
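The log above shows the log fetch failing with a 400 because the `loki` container was still in `ContainerCreating`, so there were no logs to read yet. Below is a minimal sketch of how such a response could be told apart from a real failure. It is an assumption about one possible handling, not Robusta's code: `waiting_reason` is a hypothetical helper that takes the HTTP status and response body (in the Kubernetes Python client these are available as `ApiException.status` and `ApiException.body`), and the sample body is copied from the log above.

```python
import json
from typing import Optional

# Marker text as it appears in the API server message in the log above.
WAITING_MARKER = "is waiting to start: "


def waiting_reason(status_code: int, body: bytes) -> Optional[str]:
    """Return the waiting reason (e.g. 'ContainerCreating') if the response
    says the container has not started yet, otherwise None."""
    if status_code != 400:
        return None
    try:
        message = json.loads(body).get("message", "")
    except (ValueError, AttributeError):
        # Body was not valid JSON, or not a JSON object.
        return None
    if not isinstance(message, str) or WAITING_MARKER not in message:
        return None
    return message.split(WAITING_MARKER, 1)[1].strip()


# Response body copied from the runner log in this thread.
BODY = (
    b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    b'"message":"container \\"loki\\" in pod \\"loki-write-0\\" is waiting to '
    b'start: ContainerCreating","reason":"BadRequest","code":400}\n'
)

print(waiting_reason(400, BODY))
```

A caller could treat a non-`None` result as "no logs available yet" instead of logging a full traceback.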

qxmips avatar Apr 25 '24 19:04 qxmips

thanks @qxmips

Does this happen over and over again (the pod crashes and you get an OOMKilled notification)? Do you know how to reproduce it easily?

arikalon1 avatar Apr 26 '24 07:04 arikalon1