Kubernetes and Talos services crashing under memory pressure

Open maxheyer opened this issue 2 years ago • 2 comments

Bug Report

Description

One of our worker nodes occasionally crashes: both kubelet and apid go down. Since apid also crashes, we have not yet been able to collect any logs. Restarting the node resolves the problem.

Logs

We have not been able to retrieve any yet, but the node comes under DiskPressure and MemoryPressure. We are in the process of implementing some form of log collection and will provide logs asap.

Environment

  • Talos version: v1.5.4
  • Kubernetes version: v1.28.3
  • Platform: QEMU KVM / Proxmox

maxheyer avatar Jan 01 '24 01:01 maxheyer


Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Wed, 06 Dec 2023 02:47:10 +0100   Wed, 06 Dec 2023 02:47:10 +0100   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Tue, 09 Jan 2024 14:38:57 +0100   Tue, 09 Jan 2024 14:33:51 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 09 Jan 2024 14:38:57 +0100   Tue, 09 Jan 2024 14:33:51 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 09 Jan 2024 14:38:57 +0100   Tue, 09 Jan 2024 14:33:51 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 09 Jan 2024 14:38:57 +0100   Tue, 09 Jan 2024 14:33:51 +0100   KubeletReady                 kubelet is posting ready status



Node conditions on crash.

maxheyer avatar Jan 09 '24 13:01 maxheyer

Talos services have some cgroup reservation, so it would be good to see the logs from around the crash, as the cause might be something else.

btw the conditions look good
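
For anyone who wants to check this mechanically rather than by eye, here is a minimal Python sketch. It assumes the node object has been obtained as JSON (for example from `kubectl get node <name> -o json`) and parsed into a dict. The function name `unhealthy_conditions` and the `SAMPLE_NODE` data are illustrative only; the sample mirrors the statuses posted above and is not real output from the affected node.

```python
# Conditions for which a status of "True" (or "Unknown") signals a problem.
# For "Ready" the logic is inverted: "True" means healthy.
PROBLEM_WHEN_TRUE = {
    "NetworkUnavailable",
    "MemoryPressure",
    "DiskPressure",
    "PIDPressure",
}


def unhealthy_conditions(node):
    """Return the condition types that indicate a problem on the node."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond["type"], cond["status"]
        if ctype == "Ready":
            if status != "True":
                problems.append(ctype)
        elif ctype in PROBLEM_WHEN_TRUE and status != "False":
            problems.append(ctype)
    return problems


# Hypothetical sample mirroring the conditions table in this thread.
SAMPLE_NODE = {
    "status": {
        "conditions": [
            {"type": "NetworkUnavailable", "status": "False", "reason": "CiliumIsUp"},
            {"type": "MemoryPressure", "status": "False", "reason": "KubeletHasSufficientMemory"},
            {"type": "DiskPressure", "status": "False", "reason": "KubeletHasNoDiskPressure"},
            {"type": "PIDPressure", "status": "False", "reason": "KubeletHasSufficientPID"},
            {"type": "Ready", "status": "True", "reason": "KubeletReady"},
        ]
    }
}

if __name__ == "__main__":
    print(unhealthy_conditions(SAMPLE_NODE))
```

An empty result means every condition reports healthy, which matches the table above. It does not rule out a crash that happened earlier and recovered, so the logs are still needed.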

smira avatar Jan 11 '24 12:01 smira