Kubelet doesn't start on nodes
Hello folks,
I need some help with the containerd runtime on worker nodes running kubelet version 1.21.
Note on the image: I am using the EKS-optimized AMI from:
https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html
I created my cluster with the file below:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eks-xxx
  region: us-east-1
  version: "1.21"
  tags:
    managed: eksctl
    owner: eks-xxx

vpc:
  id: "vpc-xxxx"
  securityGroup: "sg-xxx"
  subnets:
    private:
      us-east-1a:
        id: "subnet-xxxx"
      us-east-1b:
        id: "subnet-xxxx"
      us-east-1c:
        id: "subnet-xxxx"
    public:
      us-east-1a:
        id: "subnet-yyyyy"
      us-east-1b:
        id: "subnet-yyyyy"
      us-east-1c:
        id: "subnet-yyyyy"

nodeGroups:
  - name: gp01
    volumeSize: 50
    volumeType: gp2
    ami: ami-0193ebf9573ebc9f7
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh eks-xxx --container-runtime containerd
    privateNetworking: true
    minSize: 1
    maxSize: 2
    instancesDistribution:
      maxPrice: 0.095000
      instanceTypes:
        - "t4g.xlarge"
        - "t3.xlarge"
        - "t2.xlarge"
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 4
    securityGroups:
      withShared: true
      withLocal: true
    iam:
      attachPolicyARNs:
        - arn:aws:iam::983054731403:policy/eks-xxx-xxx
      withAddonPolicies:
        ALBIngress: true
        AutoScaler: true
        CertManager: true
        CloudWatch: true
        EBS: true
        EFS: true
        ExternalDNS: true
        ImageBuilder: true # ECR
        XRay: true
    ssh:
      publicKeyPath: ./keys/eks-xxx

cloudWatch:
  clusterLogging:
    enableTypes: ["audit", "authenticator"]
Everything went well until the node group creation step: the nodes are created but never attach to the cluster. After logging into a node, I found the following kubelet logs.
Command to see logs:
sudo cat /var/log/messages
Logs:
Sep 9 20:39:33 ip-10-53-2-218 systemd: Started Session c320 of user root.
Sep 9 20:39:33 ip-10-53-2-218 containerd: time="2021-09-09T20:39:33.807930482Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{},XXX_unrecognized:[],}"
Sep 9 20:39:33 ip-10-53-2-218 containerd: time="2021-09-09T20:39:33.821960041Z" level=info msg="ImageUpdate event &ImageUpdate{Name:sha256:106a8e54d5eb3f70fcd1ed46255bdf232b3f169e89e68e13e4e67b25f59c1315,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
Sep 9 20:39:33 ip-10-53-2-218 containerd: time="2021-09-09T20:39:33.822587955Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1: resolving |#033[32m#033[0m--------------------------------------|
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: elapsed: 0.1 s total: 0.0 B (0.0 B/s)
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1: resolved |#033[32m++++++++++++++++++++++++++++++++++++++#033[0m|
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: index-sha256:1cb4ab85a3480446f9243178395e6bee7350f0d71296daeb6a9fdd221e23aea6: done |#033[32m++++++++++++++++++++++++++++++++++++++#033[0m|
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: manifest-sha256:234b8785dd78afc0fbb27edad009e7eb253e5685fb7387d4f0145f65c00873ac: done |#033[32m++++++++++++++++++++++++++++++++++++++#033[0m|
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: layer-sha256:41d8806bd3d23e1ffb7e9825fa56a0c2e851dfeeb405477ab1d6bc3a34bc0da2: done |#033[32m++++++++++++++++++++++++++++++++++++++#033[0m|
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: config-sha256:106a8e54d5eb3f70fcd1ed46255bdf232b3f169e89e68e13e4e67b25f59c1315: done |#033[32m++++++++++++++++++++++++++++++++++++++#033[0m|
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: elapsed: 0.2 s total: 0.0 B (0.0 B/s)
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: unpacking linux/amd64 sha256:1cb4ab85a3480446f9243178395e6bee7350f0d71296daeb6a9fdd221e23aea6...
Sep 9 20:39:33 ip-10-53-2-218 pull-sandbox-image.sh: done
Sep 9 20:39:33 ip-10-53-2-218 systemd: Started pull sandbox image defined in containerd config.toml.
Sep 9 20:39:33 ip-10-53-2-218 systemd: Failed to load environment files: No such file or directory
Sep 9 20:39:33 ip-10-53-2-218 systemd: kubelet.service failed to run 'start-pre' task: No such file or directory
Sep 9 20:39:33 ip-10-53-2-218 systemd: Failed to start Kubernetes Kubelet.
Sep 9 20:39:33 ip-10-53-2-218 systemd: Unit kubelet.service entered failed state.
Sep 9 20:39:33 ip-10-53-2-218 systemd: kubelet.service failed.
Sep 9 20:39:33 ip-10-53-2-218 systemd: Removed slice User Slice of root.
Sep 9 20:39:38 ip-10-53-2-218 systemd: kubelet.service holdoff time over, scheduling restart.
Sep 9 20:39:38 ip-10-53-2-218 systemd: Stopped Kubernetes Kubelet.
Sep 9 20:39:38 ip-10-53-2-218 systemd: Starting pull sandbox image defined in containerd config.toml...
Sep 9 20:39:39 ip-10-53-2-218 systemd: Created slice User Slice of root.
As a test, I tried creating an empty containerd-config.toml, but that produced other errors.
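(For reference, the sandbox-image unit in the logs pulls the image defined in containerd's config.toml, so an empty file presumably loses that setting. A hedged way to check which config still defines the sandbox image; the second path is only assumed from where the pull script lives:
grep -n "sandbox_image" /etc/containerd/config.toml /etc/eks/containerd/containerd-config.toml
systemctl status sandbox-image -l)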
Thanks all.
Hi,
Could you check the status of the sandbox-image, containerd, and kubelet services? Essentially:
- systemctl status sandbox-image -l
- systemctl status containerd -l
- systemctl status kubelet -l
- Also check if there are any errors in /var/log/cloud-init-output.log
Thanks,
Hello @ravisinha0506, thanks for the answer!
systemctl status sandbox-image -l
$ sudo systemctl status sandbox-image -l
● sandbox-image.service - pull sandbox image defined in containerd config.toml
Loaded: loaded (/etc/systemd/system/sandbox-image.service; enabled; vendor preset: disabled)
Active: activating (start) since sex 2021-09-10 18:22:01 UTC; 247ms ago
Main PID: 8511 (bash)
Tasks: 2
Memory: 12.0M
CGroup: /system.slice/sandbox-image.service
├─8511 bash /etc/eks/containerd/pull-sandbox-image.sh
└─8545 /usr/bin/python2 -s /usr/bin/aws ecr get-login-password --region us-east-1
set 10 18:22:01 ip-10-53-2-43.ec2.internal systemd[1]: Starting pull sandbox image defined in containerd config.toml...
systemctl status containerd -l
$ sudo systemctl status containerd -l
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
Active: active (running) since sex 2021-09-10 18:16:19 UTC; 6min ago
Docs: https://containerd.io
Main PID: 3010 (containerd)
Tasks: 12
Memory: 70.1M
CGroup: /system.slice/containerd.service
└─3010 /usr/bin/containerd
set 10 18:22:27 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:27.202994749Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
set 10 18:22:33 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:33.262510241Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{},XXX_unrecognized:[],}"
set 10 18:22:33 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:33.273228096Z" level=info msg="ImageUpdate event &ImageUpdate{Name:sha256:106a8e54d5eb3f70fcd1ed46255bdf232b3f169e89e68e13e4e67b25f59c1315,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
set 10 18:22:33 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:33.273757170Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
set 10 18:22:39 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:39.888685985Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{},XXX_unrecognized:[],}"
set 10 18:22:39 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:39.895833426Z" level=info msg="ImageUpdate event &ImageUpdate{Name:sha256:106a8e54d5eb3f70fcd1ed46255bdf232b3f169e89e68e13e4e67b25f59c1315,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
set 10 18:22:39 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:39.896594971Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
set 10 18:22:47 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:47.481357165Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{},XXX_unrecognized:[],}"
set 10 18:22:47 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:47.486029158Z" level=info msg="ImageUpdate event &ImageUpdate{Name:sha256:106a8e54d5eb3f70fcd1ed46255bdf232b3f169e89e68e13e4e67b25f59c1315,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
set 10 18:22:47 ip-10-53-2-43.ec2.internal containerd[3010]: time="2021-09-10T18:22:47.487458911Z" level=info msg="ImageUpdate event &ImageUpdate{Name:602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"
systemctl status kubelet -l
$ sudo systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-eksctl.al2.conf, 10-kubelet-args.conf
Active: activating (auto-restart) (Result: resources) since sex 2021-09-10 18:24:54 UTC; 2s ago
Docs: https://github.com/kubernetes/kubernetes
set 10 18:24:54 ip-10-53-2-43.ec2.internal systemd[1]: Failed to load environment files: No such file or directory
set 10 18:24:54 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed to run 'start-pre' task: No such file or directory
set 10 18:24:54 ip-10-53-2-43.ec2.internal systemd[1]: Failed to start Kubernetes Kubelet.
set 10 18:24:54 ip-10-53-2-43.ec2.internal systemd[1]: Unit kubelet.service entered failed state.
set 10 18:24:54 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed.
cat /var/log/cloud-init-output.log
$ sudo cat /var/log/cloud-init-output.log
Cloud-init v. 19.3-44.amzn2 running 'init-local' at Fri, 10 Sep 2021 18:16:01 +0000. Up 12.41 seconds.
Cloud-init v. 19.3-44.amzn2 running 'init' at Fri, 10 Sep 2021 18:16:03 +0000. Up 14.18 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: | eth0 | True | 10.53.2.43 | 255.255.255.0 | global | 0a:33:ea:bb:98:59 |
ci-info: | eth0 | True | fe80::833:eaff:febb:9859/64 | . | link | 0a:33:ea:bb:98:59 |
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | host | . |
ci-info: | lo | True | ::1/128 | . | host | . |
ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
ci-info: +++++++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++++++
ci-info: +-------+-----------------+-----------+-----------------+-----------+-------+
ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
ci-info: +-------+-----------------+-----------+-----------------+-----------+-------+
ci-info: | 0 | 0.0.0.0 | 10.53.2.1 | 0.0.0.0 | eth0 | UG |
ci-info: | 1 | 10.53.2.0 | 0.0.0.0 | 255.255.255.0 | eth0 | U |
ci-info: | 2 | 169.254.169.254 | 0.0.0.0 | 255.255.255.255 | eth0 | UH |
ci-info: +-------+-----------------+-----------+-----------------+-----------+-------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | 9 | fe80::/64 | :: | eth0 | U |
ci-info: | 11 | local | :: | eth0 | U |
ci-info: | 12 | multicast | :: | eth0 | U |
ci-info: +-------+-------------+---------+-----------+-------+
Cloud-init v. 19.3-44.amzn2 running 'modules:config' at Fri, 10 Sep 2021 18:16:05 +0000. Up 16.35 seconds.
Loaded plugins: priorities, update-motd, versionlock
No packages needed for security; 0 packages available
No packages marked for update
Cloud-init v. 19.3-44.amzn2 running 'modules:final' at Fri, 10 Sep 2021 18:16:12 +0000. Up 23.79 seconds.
Created symlink from /etc/systemd/system/multi-user.target.wants/containerd.service to /usr/lib/systemd/system/containerd.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/sandbox-image.service to /etc/systemd/system/sandbox-image.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /etc/systemd/system/kubelet.service.
Job for kubelet.service failed because a configured resource limit was exceeded. See "systemctl status kubelet.service" and "journalctl -xe" for details.
Exited with error on line 447
Sep 10 18:16:24 cloud-init[2779]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
Sep 10 18:16:24 cloud-init[2779]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Sep 10 18:16:24 cloud-init[2779]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Cloud-init v. 19.3-44.amzn2 finished at Fri, 10 Sep 2021 18:16:24 +0000. Datasource DataSourceEc2. Up 35.27 seconds
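For what it's worth, the "Exited with error on line 447" above seems to come from the user-data scripts that cloud-init ran (the runcmd warnings right after it point at the same step). A hedged way to inspect and re-run that step by hand, using the path from the warning and the same bootstrap flags as in my config:
$ sudo cat /var/lib/cloud/instance/scripts/runcmd
$ sudo bash -x /etc/eks/bootstrap.sh eks-xxx --container-runtime containerd 2>&1 | tail -n 40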
sudo journalctl -u kubelet.service
$ sudo journalctl -u kubelet.service
-- Logs begin at sex 2021-09-10 18:15:50 UTC, end at sex 2021-09-10 18:35:30 UTC. --
set 10 18:16:24 ip-10-53-2-43.ec2.internal systemd[1]: Failed to load environment files: No such file or directory
set 10 18:16:24 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed to run 'start-pre' task: No such file or directory
set 10 18:16:24 ip-10-53-2-43.ec2.internal systemd[1]: Failed to start Kubernetes Kubelet.
set 10 18:16:24 ip-10-53-2-43.ec2.internal systemd[1]: Unit kubelet.service entered failed state.
set 10 18:16:24 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed.
set 10 18:16:29 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service holdoff time over, scheduling restart.
set 10 18:16:29 ip-10-53-2-43.ec2.internal systemd[1]: Stopped Kubernetes Kubelet.
set 10 18:16:30 ip-10-53-2-43.ec2.internal systemd[1]: Failed to load environment files: No such file or directory
set 10 18:16:30 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed to run 'start-pre' task: No such file or directory
set 10 18:16:30 ip-10-53-2-43.ec2.internal systemd[1]: Failed to start Kubernetes Kubelet.
set 10 18:16:30 ip-10-53-2-43.ec2.internal systemd[1]: Unit kubelet.service entered failed state.
set 10 18:16:30 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed.
set 10 18:16:35 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service holdoff time over, scheduling restart.
set 10 18:16:35 ip-10-53-2-43.ec2.internal systemd[1]: Stopped Kubernetes Kubelet.
set 10 18:16:36 ip-10-53-2-43.ec2.internal systemd[1]: Failed to load environment files: No such file or directory
set 10 18:16:36 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed to run 'start-pre' task: No such file or directory
set 10 18:16:36 ip-10-53-2-43.ec2.internal systemd[1]: Failed to start Kubernetes Kubelet.
set 10 18:16:36 ip-10-53-2-43.ec2.internal systemd[1]: Unit kubelet.service entered failed state.
set 10 18:16:36 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed.
set 10 18:16:42 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service holdoff time over, scheduling restart.
set 10 18:16:42 ip-10-53-2-43.ec2.internal systemd[1]: Stopped Kubernetes Kubelet.
set 10 18:16:43 ip-10-53-2-43.ec2.internal systemd[1]: Failed to load environment files: No such file or directory
set 10 18:16:43 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed to run 'start-pre' task: No such file or directory
set 10 18:16:43 ip-10-53-2-43.ec2.internal systemd[1]: Failed to start Kubernetes Kubelet.
set 10 18:16:43 ip-10-53-2-43.ec2.internal systemd[1]: Unit kubelet.service entered failed state.
set 10 18:16:43 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed.
set 10 18:16:48 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service holdoff time over, scheduling restart.
set 10 18:16:48 ip-10-53-2-43.ec2.internal systemd[1]: Stopped Kubernetes Kubelet.
set 10 18:16:49 ip-10-53-2-43.ec2.internal systemd[1]: Failed to load environment files: No such file or directory
set 10 18:16:49 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed to run 'start-pre' task: No such file or directory
set 10 18:16:49 ip-10-53-2-43.ec2.internal systemd[1]: Failed to start Kubernetes Kubelet.
set 10 18:16:49 ip-10-53-2-43.ec2.internal systemd[1]: Unit kubelet.service entered failed state.
set 10 18:16:49 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed.
I want to use an x86 image. I asked for help in the eksctl issues and followed the steps they gave me, but eksctl uses the arm64 image by default. If you want more information, it is here:
https://github.com/weaveworks/eksctl/issues/4206
Thanks again!
@wh1sssss, is this still an issue? It looks like kubelet is not coming up due to this:
set 10 18:16:24 ip-10-53-2-43.ec2.internal systemd[1]: Failed to load environment files: No such file or directory
set 10 18:16:24 ip-10-53-2-43.ec2.internal systemd[1]: kubelet.service failed to run 'start-pre' task: No such file or directory
If this is still an active issue, could you share the EKS AMI on which this is happening?
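In the meantime, a hedged way to find out which environment file the kubelet unit expects and whether bootstrap actually created it (paths assume the standard EKS-optimized AL2 layout):
- systemctl cat kubelet
- grep -Rn "EnvironmentFile" /etc/systemd/system/kubelet.service.d/
- ls -l /etc/eks/ /var/lib/kubelet/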
@ravisinha0506 Hello, it's still an issue!
I use the image ami-0193ebf9573ebc9f7.
I need to run amd64 EC2 instances with the containerd CRI on EKS 1.21, but I can't get it to work; for now I'm using amd64 with the Docker shim.
Thanks for the answer.
@wh1sssss you have three different instance classes in your node group, which might be causing your issues. I'd drop the t2, as it's not a Nitro-based instance type, and then for testing purposes use either a t3 (x86) or a t4g (arm), as in the sketch below.
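For example, a minimal sketch of the node-group fragment with a single x86 instance class (values other than instanceTypes and spotInstancePools are copied from your original config; spotInstancePools is reduced to match the single type, and fields like volumeSize are omitted for brevity):
nodeGroups:
  - name: gp01
    ami: ami-0193ebf9573ebc9f7   # the x86_64 EKS-optimized AMI from the original post
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh eks-xxx --container-runtime containerd
    privateNetworking: true
    minSize: 1
    maxSize: 2
    instancesDistribution:
      maxPrice: 0.095000
      instanceTypes:
        - "t3.xlarge"            # single Nitro-based x86 instance class
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 1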