
kubelet - PLEG is not healthy - flapping between ready and notReady

Open dombisza opened this issue 1 year ago • 3 comments

What happened: The kubelet keeps flapping between NodeReady and NodeNotReady because of the following error:

"Skipping pod synchronization" err="PLEG is not healthy: pleg was last seen active 3m11.637412161s ago; threshold is 3m0s"
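To see how far past the threshold each occurrence is, the message can be parsed from the kubelet log. The following is a minimal sketch, assuming the message keeps the exact wording shown above and that durations use Go's duration notation (for example `3m11.637412161s`). The helper names `go_duration_seconds` and `pleg_overrun` are made up for illustration and are not part of any kubelet tooling.

```python
import re

# Seconds per Go duration unit (the subset that appears in kubelet messages).
_UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9,
}
# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
# Assumed message shape, copied from the error line in this report.
_PLEG = re.compile(r'pleg was last seen active (\S+) ago; threshold is ([^\s"]+)')


def go_duration_seconds(text):
    """Convert a Go-style duration such as '3m11.637412161s' to seconds."""
    tokens = _TOKEN.findall(text)
    # Reject input that has anything other than number+unit pairs in it.
    if not tokens or "".join(num + unit for num, unit in tokens) != text:
        raise ValueError(f"not a Go duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in tokens)


def pleg_overrun(line):
    """Return (last_seen, threshold, overrun) in seconds, or None if no match."""
    match = _PLEG.search(line)
    if match is None:
        return None
    last_seen = go_duration_seconds(match.group(1))
    threshold = go_duration_seconds(match.group(2))
    return last_seen, threshold, last_seen - threshold
```

Running `pleg_overrun` over every matching kubelet log line would show whether the overrun stays small (just past 3m0s) or keeps growing until dockerd is restarted.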

The node is not under high load or resource pressure, and docker is responsive:

 07:31:31 up 8 days, 15:01,  1 user,  load average: 0.76, 0.77, 0.73
time docker ps
real	0m0.040s
user	0m0.022s
sys	0m0.016s
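A single `time docker ps` is only one point sample, so an intermittent stall could fall between manual checks. The sketch below times a command repeatedly and records each duration. It is a rough aid only: the function name `sample_latency` and its parameters are hypothetical, and treating `docker ps` latency as a stand-in for what the kubelet sees from the runtime is an assumption.

```python
import subprocess
import time


def sample_latency(cmd, samples=5, interval=1.0, timeout=30.0):
    """Run cmd `samples` times and return the wall-clock duration of each run.

    A run that exceeds `timeout` seconds is recorded as float('inf').
    """
    durations = []
    for i in range(samples):
        start = time.monotonic()
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
                check=False,  # a non-zero exit still counts as a timed sample
            )
            elapsed = time.monotonic() - start
        except subprocess.TimeoutExpired:
            elapsed = float("inf")
        durations.append(elapsed)
        # Pause between samples, but not after the last one.
        if i < samples - 1:
            time.sleep(interval)
    return durations
```

For example, `max(sample_latency(["docker", "ps"], samples=600, interval=1.0))` would report the slowest of roughly ten minutes of samples, which can then be compared against the timestamps of the PLEG errors.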

Restarting dockerd stops the flapping, but without a restart the issue keeps recurring. I have checked the kubelet and docker logs, but nothing in them suggests a cause.

Container runtime versions:

$ containerd -v
containerd github.com/containerd/containerd 1.6.6 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1

$ runc -v
runc version 1.1.4
commit: 5fd4c4d144137e991c4acebb2146ab1483a97925
spec: 1.0.2-dev
go: go1.18.6
libseccomp: 2.4.1

$ docker -v
Docker version 20.10.17, build 100c701

Note: this is on GovCloud, so our AMI was made FIPS-ready, which may also play into the issue.

What you expected to happen:

I would expect PLEG and dockerd to recover from the error without restarting dockerd.

How to reproduce it (as minimally and precisely as possible):

I have not found a way to reproduce it yet; it seems to occur randomly.

Anything else we need to know?:

Environment:

  • AWS Region: us-gov-west-1
  • Instance Type(s): r5.xlarge
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): "eks.10"
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): "1.22"
  • AMI Version: v20230217
  • Kernel (e.g. uname -a): 5.4.228-132.418.amzn2.x86_64
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-0b23a4a7e969b46f0"
BUILD_TIME="Fri Feb 17 21:59:24 UTC 2023"
BUILD_KERNEL="5.4.228-132.418.amzn2.x86_64"
ARCH="x86_64"

Any help is appreciated.

dombisza · Mar 24 '23 07:03