Accurately report reason for pod termination
Is your feature request related to a problem? Yes, the current code reports all pod terminations as OOMKilled, even when the pod was terminated by a rolling deployment.
Describe the solution you'd like I would like the code to be updated so that it reports the actual reason for each pod termination instead of reporting every termination as OOMKilled.
Describe alternatives you've considered I have not considered any alternative solutions at this time.
Additional context
The pod didn't get OOMKilled
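To illustrate the request, here is a minimal sketch of how the termination reason could be read per container. It assumes the pod is available as a dict in the JSON shape returned by `kubectl get pod -o json`. The helper names `termination_reasons` and `was_oom_killed` are hypothetical and this is not Robusta's actual code.

```python
from typing import Dict, Optional


def termination_reasons(pod: dict) -> Dict[str, Optional[str]]:
    """Map each container name to the reason of its most recent termination.

    Containers that have never terminated map to None.
    """
    reasons: Dict[str, Optional[str]] = {}
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for cs in statuses:
        # Prefer the current state; fall back to the previous run (lastState).
        terminated = (cs.get("state") or {}).get("terminated") or (
            cs.get("lastState") or {}
        ).get("terminated")
        reasons[cs["name"]] = terminated.get("reason") if terminated else None
    return reasons


def was_oom_killed(pod: dict) -> bool:
    """Return True only if some container actually reports reason OOMKilled."""
    return "OOMKilled" in termination_reasons(pod).values()
```

With a check like this, a notification would be labelled OOMKilled only when the container status says so, and any other reason would be reported as-is.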
Hi @malikal-hh, thank you for reporting the issue. Our team is looking into it. Please feel free to join Robusta Community on Slack to discuss your queries.
Same here. Is there a fix for that?
hi @qxmips
Do you have the same issue? Which container is the notification for? What are its memory request and limits? Can you also share a memory graph for it from around the time of the notification?
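In case it helps with gathering this, a small sketch that lists the memory request and limit of each container. It assumes the pod is available as a dict in the shape of `kubectl get pod -o json` output, and the helper name `memory_settings` is made up for illustration.

```python
from typing import Dict, Optional


def memory_settings(pod: dict) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each container name to its memory request and limit (None if unset)."""
    settings: Dict[str, Dict[str, Optional[str]]] = {}
    for container in (pod.get("spec") or {}).get("containers") or []:
        resources = container.get("resources") or {}
        settings[container["name"]] = {
            "request": (resources.get("requests") or {}).get("memory"),
            "limit": (resources.get("limits") or {}).get("memory"),
        }
    return settings
```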
robusta runner logs:
2024-04-24 22:30:43.924 | HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\"loki\\" in pod \\"loki-write-0\\" is waiting to start: ContainerCreating","reason":"BadRequest","code":400}\n'
2024-04-24 22:30:43.924 | HTTP response headers: HTTPHeaderDict({'Audit-Id': '19de0f28-3167-44ab-b41e-c3adfb350982', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Wed, 24 Apr 2024 19:30:43 GMT', 'Content-Length': '196'})
2024-04-24 22:30:43.924 | Reason: Bad Request
2024-04-24 22:30:43.924 | kubernetes.client.exceptions.ApiException: (400)
2024-04-24 22:30:43.924 | raise ApiException(http_resp=r)
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 235, in request
2024-04-24 22:30:43.924 | return self.request("GET", url,
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 241, in GET
2024-04-24 22:30:43.924 | return self.rest_client.GET(url,
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 373, in request
2024-04-24 22:30:43.924 | response_data = self.request(
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
2024-04-24 22:30:43.924 | return self.__call_api(resource_path, method,
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
2024-04-24 22:30:43.924 | return self.api_client.call_api(
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 23866, in read_namespaced_pod_log_with_http_info
2024-04-24 22:30:43.924 | return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs) # noqa: E501
2024-04-24 22:30:43.924 | File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 23747, in read_namespaced_pod_log
2024-04-24 22:30:43.924 | resp = core_v1.read_namespaced_pod_log(
2024-04-24 22:30:43.924 | File "/app/src/robusta/integrations/kubernetes/api_client_utils.py", line 145, in get_pod_logs
2024-04-24 22:30:43.924 | Traceback (most recent call last):
2024-04-24 22:30:43.924 | 2024-04-24 19:30:43.918 ERROR failed to get pod logs loki-write-0 observability loki
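The 400 response in this log says the `loki` container was still waiting to start (`ContainerCreating`) when its logs were requested. Below is a rough sketch of how such a response could be told apart from other failures, using only the status code and response body shown in the log. The helper name `is_container_waiting_error` is hypothetical. It is not part of Robusta or the Kubernetes client, and matching on the message text is an assumption based on this one log sample.

```python
import json


def is_container_waiting_error(status_code: int, body: bytes) -> bool:
    """Return True if a log request failed because the container has not started."""
    if status_code != 400:
        return False
    try:
        status = json.loads(body)
    except (ValueError, TypeError):
        # The body was not valid JSON, so nothing can be concluded from it.
        return False
    message = status.get("message", "") if isinstance(status, dict) else ""
    return "is waiting to start" in message


# Response body copied from the runner log in this thread.
sample_body = b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"container \\"loki\\" in pod \\"loki-write-0\\" is waiting to start: ContainerCreating","reason":"BadRequest","code":400}\n'
print(is_container_waiting_error(400, sample_body))
```

A caller could treat this case as "no logs available yet" rather than as an error worth a full traceback.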
thanks @qxmips
Does this happen repeatedly (the pod crashes and you get an OOMKilled notification)? Do you know how to reproduce it easily?