troubleshoot
Surface any recent OOMKilled pods
Describe the rationale for the suggested feature.
From talking a bit with @chris-sanders, I think the intent is to surface any recent OOMKilled events in a pod's history. For instance, the restic pods can die during a backup after running out of memory, which we've seen before.
Maybe we can use the events analyzer suggested by Diamon in #911 to surface any OOMKilled events to the support bundle user.
I saw some OOMKilled restarts while working on https://github.com/replicatedhq/kots/pull/4031, related to the version bump in troubleshoot. The kotsadm pod's history had a restart, and those event analyzers would've saved some time:
  kotsadm:
    Container ID:   containerd://41bbe4aa9c2b520813a426a2179828d35bffc20f98a565bc57e691668b173570
    Image:          ttl.sh/automated-6110233908/kotsadm:24h
    Image ID:       ttl.sh/automated-6110233908/kotsadm@sha256:ee9a7d1405496c5253056ee06a028ebd200e6928058d24ee2404bdcb840b3b3c
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 08 Sep 2023 00:57:11 +0530
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 08 Sep 2023 00:53:31 +0530
      Finished:     Fri, 08 Sep 2023 00:57:10 +0530
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     100m
      memory:  100Mi
The OOMKilled restart wasn't apparent upfront.
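To make the request concrete, here is a rough sketch of the check such an analyzer might perform. It is not troubleshoot's analyzer code or API; it is a standalone Python illustration that assumes the pod list is available as JSON in the shape produced by `kubectl get pods -o json`. The function name `find_oomkilled`, the sample pod name, and the namespace are hypothetical; the container values mirror the kotsadm output above, with the timestamps converted to UTC.

```python
import json

def find_oomkilled(pod_list):
    """Return one record per container whose current or last state is an OOMKilled termination."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            # A container that was killed and restarted reports the kill under
            # "lastState"; one that has not restarted yet reports it under "state".
            for field in ("state", "lastState"):
                terminated = (cs.get(field) or {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    findings.append({
                        "namespace": meta.get("namespace"),
                        "pod": meta.get("name"),
                        "container": cs.get("name"),
                        "state_field": field,
                        "exit_code": terminated.get("exitCode"),
                        "finished_at": terminated.get("finishedAt"),
                        "restart_count": cs.get("restartCount", 0),
                    })
    return findings

# Sample input. The pod name and namespace are hypothetical; the container
# status mirrors the kotsadm container shown in this issue.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {"name": "kotsadm-example", "namespace": "default"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "kotsadm",
                        "ready": True,
                        "restartCount": 1,
                        "state": {"running": {"startedAt": "2023-09-07T19:27:11Z"}},
                        "lastState": {
                            "terminated": {
                                "reason": "OOMKilled",
                                "exitCode": 137,
                                "startedAt": "2023-09-07T19:23:31Z",
                                "finishedAt": "2023-09-07T19:27:10Z",
                            }
                        },
                    }
                ]
            },
        }
    ]
}

if __name__ == "__main__":
    print(json.dumps(find_oomkilled(SAMPLE_POD_LIST), indent=2))
```

Against a live cluster, the same function could be fed the parsed output of `kubectl get pods -A -o json`. Checking both `state` and `lastState` matters because a pod that is Running and Ready, like kotsadm here, only shows the kill in its last state.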
https://app.shortcut.com/replicated/story/109475/add-oomkilled-to-analyzers-in-default-spec