
Surface any recent OOMKilled pods

Open: adamancini opened this issue 2 years ago, 3 comments

Describe the rationale for the suggested feature.

From talking a little with @chris-sanders, I think the intent is to be able to surface any recent OOMKilled events in a pod's history. For instance, restic pods can die during a backup because they run out of memory, which we've seen before.

adamancini, Aug 16 '23 21:08

Maybe we can use the events analyzer suggested by Diamon in #911 to surface any OOMKilled events to the support bundle user.

adamancini, Aug 16 '23 21:08
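To illustrate the events-analyzer idea, here is a minimal Go sketch that filters a list of Kubernetes events by reason. It assumes the events were collected as JSON shaped like `kubectl get events -o json` and models only a subset of the core/v1 Event fields. `filterEventsByReason` is a hypothetical helper, not the analyzer from #911, and which reason strings actually indicate an OOM kill in a given cluster is an assumption left to the caller.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ObjectRef mirrors a subset of an Event's involvedObject field.
type ObjectRef struct {
	Kind      string `json:"kind"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// Event mirrors a subset of the Kubernetes core/v1 Event fields.
type Event struct {
	Reason         string    `json:"reason"`
	Message        string    `json:"message"`
	Type           string    `json:"type"`
	InvolvedObject ObjectRef `json:"involvedObject"`
}

// EventList mirrors the top-level shape of `kubectl get events -o json`.
type EventList struct {
	Items []Event `json:"items"`
}

// filterEventsByReason (hypothetical helper) returns the events whose Reason
// is one of the given reasons. The caller decides which reasons mean "OOM".
func filterEventsByReason(raw []byte, reasons ...string) ([]Event, error) {
	var list EventList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	want := make(map[string]bool, len(reasons))
	for _, r := range reasons {
		want[r] = true
	}
	var matched []Event
	for _, ev := range list.Items {
		if want[ev.Reason] {
			matched = append(matched, ev)
		}
	}
	return matched, nil
}

func main() {
	// Hypothetical sample data, not taken from a real cluster.
	raw := []byte(`{"items": [
		{"reason": "OOMKilled", "type": "Warning", "message": "hypothetical message",
		 "involvedObject": {"kind": "Pod", "namespace": "default", "name": "example-pod"}},
		{"reason": "Pulled", "type": "Normal", "message": "image pulled",
		 "involvedObject": {"kind": "Pod", "namespace": "default", "name": "example-pod"}}
	]}`)
	events, err := filterEventsByReason(raw, "OOMKilled")
	if err != nil {
		panic(err)
	}
	for _, ev := range events {
		fmt.Printf("%s %s/%s: %s (%s)\n", ev.Type, ev.InvolvedObject.Namespace,
			ev.InvolvedObject.Name, ev.Reason, ev.Message)
	}
}
```

Passing the reasons in, rather than hard-coding one string, keeps the sketch neutral about which events a cluster emits; a real analyzer would also need to decide what counts as "recent".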

I saw some OOMKilled restarts while working on https://github.com/replicatedhq/kots/pull/4031, which is related to a version bump in troubleshoot. The kotsadm pod's history showed a restart, and those event analyzers would have saved some time:

  kotsadm:
    Container ID:   containerd://41bbe4aa9c2b520813a426a2179828d35bffc20f98a565bc57e691668b173570
    Image:          ttl.sh/automated-6110233908/kotsadm:24h
    Image ID:       ttl.sh/automated-6110233908/kotsadm@sha256:ee9a7d1405496c5253056ee06a028ebd200e6928058d24ee2404bdcb840b3b3c
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 08 Sep 2023 00:57:11 +0530
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 08 Sep 2023 00:53:31 +0530
      Finished:     Fri, 08 Sep 2023 00:57:10 +0530
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:      100m
      memory:   100Mi

This wasn't apparent up front.

arcolife, Sep 08 '23 15:09
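The describe output above shows the pattern an analyzer would need to catch: the container is currently Running, but its Last State is Terminated with reason OOMKilled and exit code 137 (128 + 9, i.e. SIGKILL). Below is a minimal Go sketch of that check. It assumes the pods are available as JSON shaped like `kubectl get pods -o json`, and the struct types mirror only a subset of the Kubernetes pod status fields. `findRecentOOMKills`, `OOMFinding` and `signalFromExitCode` are hypothetical helpers, not part of troubleshoot's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Terminated mirrors a subset of a container's "terminated" state.
type Terminated struct {
	Reason     string `json:"reason"`
	ExitCode   int    `json:"exitCode"`
	FinishedAt string `json:"finishedAt"` // RFC 3339 timestamp, kept as a string here
}

// ContainerState mirrors the state object; only "terminated" is modeled.
type ContainerState struct {
	Terminated *Terminated `json:"terminated"`
}

// ContainerStatus mirrors one entry of status.containerStatuses.
type ContainerStatus struct {
	Name         string         `json:"name"`
	RestartCount int            `json:"restartCount"`
	LastState    ContainerState `json:"lastState"`
}

// Pod and PodList mirror the subset of `kubectl get pods -o json` used here.
type Pod struct {
	Metadata struct {
		Namespace string `json:"namespace"`
		Name      string `json:"name"`
	} `json:"metadata"`
	Status struct {
		ContainerStatuses []ContainerStatus `json:"containerStatuses"`
	} `json:"status"`
}

type PodList struct {
	Items []Pod `json:"items"`
}

// OOMFinding is a hypothetical report type for this sketch.
type OOMFinding struct {
	Namespace, Pod, Container string
	ExitCode, RestartCount    int
	FinishedAt                string
}

// findRecentOOMKills returns every container whose previous run ended with reason OOMKilled.
func findRecentOOMKills(raw []byte) ([]OOMFinding, error) {
	var list PodList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	var findings []OOMFinding
	for _, p := range list.Items {
		for _, cs := range p.Status.ContainerStatuses {
			t := cs.LastState.Terminated
			if t == nil || t.Reason != "OOMKilled" {
				continue // no previous termination, or it was not an OOM kill
			}
			findings = append(findings, OOMFinding{
				Namespace: p.Metadata.Namespace, Pod: p.Metadata.Name, Container: cs.Name,
				ExitCode: t.ExitCode, RestartCount: cs.RestartCount, FinishedAt: t.FinishedAt,
			})
		}
	}
	return findings, nil
}

// signalFromExitCode applies the convention exit code = 128 + signal number,
// so 137 decodes to 9 (SIGKILL).
func signalFromExitCode(code int) (int, bool) {
	if code > 128 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	// Hypothetical input shaped after the kotsadm output in this thread;
	// the pod name and namespace are made up.
	raw := []byte(`{"items": [{"metadata": {"namespace": "default", "name": "kotsadm-example"},
		"status": {"containerStatuses": [{"name": "kotsadm", "restartCount": 1,
		"lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137,
		"finishedAt": "2023-09-07T19:27:10Z"}}}]}}]}`)
	findings, err := findRecentOOMKills(raw)
	if err != nil {
		panic(err)
	}
	for _, f := range findings {
		sig, _ := signalFromExitCode(f.ExitCode)
		fmt.Printf("%s/%s container %s was OOMKilled (exit %d, signal %d, restarts %d, finished %s)\n",
			f.Namespace, f.Pod, f.Container, f.ExitCode, sig, f.RestartCount, f.FinishedAt)
	}
}
```

One caveat with this approach: pod status keeps only the most recent previous termination, so an OOM kill followed by a later restart for a different reason would no longer be visible in `lastState`. That is one argument for pairing a status check like this with an events-based analyzer.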

https://app.shortcut.com/replicated/story/109475/add-oomkilled-to-analyzers-in-default-spec

xavpaice, Jul 29 '24 03:07