[Bug] Taints not added to unmanaged nodegroups
What were you trying to accomplish?
Trying to get node taints applied to a nodegroup's nodes when creating a cluster with eksctl from a given config file.
What happened?
No error messages, but the nodes don't get their taints applied as specified in the eksctl config file. Here's example output of kubectl describe node <node-with-taints>:
Name: ip-192-168-40-218.ec2.internal
Roles: <none>
Labels: alpha.eksctl.io/cluster-name=test-101
alpha.eksctl.io/instance-id=i-07b21d711244c4aa7
alpha.eksctl.io/nodegroup-name=cx-ws-spot
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1b
k8s.io/cloud-provider-aws=41de372bac448d47ffa9f20aea78c914
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-192-168-40-218.ec2.internal
kubernetes.io/os=linux
lifecycle=Ec2Spot
node-lifecycle=spot
node.kubernetes.io/instance-type=t3.medium
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/zone=us-east-1b
workload=true
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 06 Aug 2022 15:44:24 +0000
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: ip-192-168-40-218.ec2.internal
AcquireTime: <unset>
RenewTime: Sat, 06 Aug 2022 16:02:36 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sat, 06 Aug 2022 15:58:09 +0000 Sat, 06 Aug 2022 15:44:24 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sat, 06 Aug 2022 15:58:09 +0000 Sat, 06 Aug 2022 15:44:24 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sat, 06 Aug 2022 15:58:09 +0000 Sat, 06 Aug 2022 15:44:24 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sat, 06 Aug 2022 15:58:09 +0000 Sat, 06 Aug 2022 15:45:05 +0000 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.40.218
ExternalIP: 54.173.68.41
Hostname: ip-192-168-40-218.ec2.internal
InternalDNS: ip-192-168-40-218.ec2.internal
ExternalDNS: ec2-54-173-68-41.compute-1.amazonaws.com
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 52416492Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3965408Ki
pods: 17
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 1930m
ephemeral-storage: 47233297124
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3410400Ki
pods: 17
System Info:
Machine ID: ec2ea9682a389d7dfd4c14818a566165
System UUID: ec2ea968-2a38-9d7d-fd4c-14818a566165
Boot ID: d5b503c2-db88-454b-90d8-5cb7e6036e0b
Kernel Version: 5.4.196-108.356.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.13
Kubelet Version: v1.22.9-eks-810597c
Kube-Proxy Version: v1.22.9-eks-810597c
ProviderID: aws:///us-east-1b/i-07b21d711244c4aa7
Non-terminated Pods: (10 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default autoscaler-c74855657-jqsl5 100m (5%) 0 (0%) 100Mi (3%) 0 (0%) 16m
default operator-controller-manager-6c86f7585b-4qx54 100m (5%) 300m (15%) 80Mi (2%) 100Mi (3%) 14m
istio-system ingressgateway-operator-67f6dfdb6-z8m89 100m (5%) 1 (51%) 128Mi (3%) 1Gi (30%) 16m
kube-system aws-node-868jc 25m (1%) 0 (0%) 0 (0%) 0 (0%) 18m
kube-system cluster-autoscaler-c5894b4bf-hqdzj 100m (5%) 300m (15%) 400Mi (12%) 0 (0%) 16m
kube-system k8s-neuron-scheduler-669cb77c6d-4g4hv 50m (2%) 0 (0%) 100Mi (3%) 0 (0%) 15m
kube-system kube-proxy-8nm6v 100m (5%) 0 (0%) 0 (0%) 0 (0%) 17m
logging event-exporter-54f465679d-4jqwq 20m (1%) 0 (0%) 50Mi (1%) 0 (0%) 16m
logging fluent-bit-gs5rv 100m (5%) 150m (7%) 150Mi (4%) 150Mi (4%) 16m
prometheus node-exporter-9tbkl 50m (2%) 270m (13%) 200Mi (6%) 220Mi (6%) 15m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 745m (38%) 2020m (104%)
memory 1208Mi (36%) 1494Mi (44%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 17m kube-proxy
Normal Starting 17m kube-proxy
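For a quicker check across all nodes, the taints (or their absence) can also be queried directly. A minimal sketch using kubectl custom columns (any equivalent jsonpath query works):

kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'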
How to reproduce it?
The following eksctl config file was used:
addons:
- name: vpc-cni
  version: 1.11.0
apiVersion: eksctl.io/v1alpha5
availabilityZones:
- us-east-1a
- us-east-1b
kind: ClusterConfig
metadata:
  name: test-101
  region: us-east-1
  tags:
    cortex.dev/cluster-name: test-101
  version: '1.22'
nodeGroups:
- ami: ami-0d5cbb67678bc879c
  desiredCapacity: 2
  iam:
    attachPolicyARNs:
    - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
    - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
    - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
    - arn:aws:iam::<account-id>:policy/cortex-test-101-us-east-1
    - arn:aws:iam::aws:policy/AmazonS3FullAccess
    withAddonPolicies:
      autoScaler: true
  instanceType: t3.medium
  kubeletExtraConfig:
    evictionHard:
      memory.available: 200Mi
      nodefs.available: 5%
    kubeReserved:
      cpu: 150m
      ephemeral-storage: 1Gi
      memory: 300Mi
    kubeReservedCgroup: /kube-reserved
    registryPullQPS: 10
    systemReserved:
      cpu: 150m
      ephemeral-storage: 1Gi
      memory: 300Mi
  labels:
    operator: 'true'
  maxSize: 25
  minSize: 2
  name: cx-operator
  overrideBootstrapCommand: '#!/bin/bash

    source /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh

    /etc/eks/bootstrap.sh test-101 --container-runtime dockerd --kubelet-extra-args
    "--node-labels=${NODE_LABELS}"'
  preBootstrapCommands:
  - sudo yum install -y ipvsadm
  - sudo modprobe ip_vs
  - sudo modprobe ip_vs_rr
  - sudo modprobe ip_vs_lc
  - sudo modprobe ip_vs_wrr
  - sudo modprobe ip_vs_sh
  - sudo modprobe nf_conntrack_ipv4
  privateNetworking: false
  volumeIOPS: 3000
  volumeSize: 20
  volumeThroughput: 125
  volumeType: gp3
- ami: ami-0d5cbb67678bc879c
  desiredCapacity: 1
  iam:
    attachPolicyARNs:
    - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
    - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
    - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
    - arn:aws:iam::<account-id>:policy/cortex-test-101-us-east-1
    - arn:aws:iam::aws:policy/AmazonS3FullAccess
    withAddonPolicies:
      autoScaler: true
  instanceType: t3.medium
  kubeletExtraConfig:
    evictionHard:
      memory.available: 200Mi
      nodefs.available: 5%
    kubeReserved:
      cpu: 150m
      ephemeral-storage: 1Gi
      memory: 300Mi
    kubeReservedCgroup: /kube-reserved
    registryPullQPS: 10
    systemReserved:
      cpu: 150m
      ephemeral-storage: 1Gi
      memory: 300Mi
  labels:
    prometheus: 'true'
  maxSize: 1
  minSize: 1
  name: cx-prometheus
  overrideBootstrapCommand: '#!/bin/bash

    source /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh

    /etc/eks/bootstrap.sh test-101 --container-runtime dockerd --kubelet-extra-args
    "--node-labels=${NODE_LABELS}"'
  preBootstrapCommands:
  - sudo yum install -y ipvsadm
  - sudo modprobe ip_vs
  - sudo modprobe ip_vs_rr
  - sudo modprobe ip_vs_lc
  - sudo modprobe ip_vs_wrr
  - sudo modprobe ip_vs_sh
  - sudo modprobe nf_conntrack_ipv4
  privateNetworking: false
  taints:
  - effect: NoSchedule
    key: prometheus
    value: 'true'
  volumeIOPS: 3000
  volumeSize: 20
  volumeThroughput: 125
  volumeType: gp3
- ami: ami-0d5cbb67678bc879c
  asgSuspendProcesses:
  - AZRebalance
  desiredCapacity: 1
  iam:
    attachPolicyARNs:
    - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
    - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
    - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
    - arn:aws:iam::<account-id>:policy/cortex-test-101-us-east-1
    - arn:aws:iam::aws:policy/AmazonS3FullAccess
    withAddonPolicies:
      autoScaler: true
  instanceType: mixed
  instancesDistribution:
    instanceTypes:
    - t3.medium
    maxPrice: 0.0416
    onDemandBaseCapacity: 0
    onDemandPercentageAboveBaseCapacity: 0
    spotInstancePools: 1
  kubeletExtraConfig:
    evictionHard:
      memory.available: 200Mi
      nodefs.available: 5%
    kubeReserved:
      cpu: 150m
      ephemeral-storage: 1Gi
      memory: 300Mi
    kubeReservedCgroup: /kube-reserved
    registryPullQPS: 10
    systemReserved:
      cpu: 150m
      ephemeral-storage: 1Gi
      memory: 300Mi
  labels:
    lifecycle: Ec2Spot
    workload: 'true'
  maxSize: 16
  minSize: 1
  name: cx-ws-spot
  overrideBootstrapCommand: '#!/bin/bash

    source /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh

    /etc/eks/bootstrap.sh test-101 --container-runtime dockerd --kubelet-extra-args
    "--node-labels=${NODE_LABELS}"'
  preBootstrapCommands:
  - sudo yum install -y ipvsadm
  - sudo modprobe ip_vs
  - sudo modprobe ip_vs_rr
  - sudo modprobe ip_vs_lc
  - sudo modprobe ip_vs_wrr
  - sudo modprobe ip_vs_sh
  - sudo modprobe nf_conntrack_ipv4
  privateNetworking: false
  tags:
    k8s.io/cluster-autoscaler/enabled: 'true'
    k8s.io/cluster-autoscaler/node-template/label/workload: 'true'
  taints:
  - effect: NoSchedule
    key: workload
    value: 'true'
  volumeIOPS: 3000
  volumeSize: 50
  volumeThroughput: 125
  volumeType: gp3
vpc:
  nat:
    gateway: Disable
Logs
The following is the output of eksctl create:
2022-08-06 15:22:51 [ℹ] eksctl version 0.107.0
2022-08-06 15:22:51 [ℹ] using region us-east-1
2022-08-06 15:22:51 [ℹ] subnets for us-east-1a - public:192.168.0.0/19 private:192.168.64.0/19
2022-08-06 15:22:51 [ℹ] subnets for us-east-1b - public:192.168.32.0/19 private:192.168.96.0/19
2022-08-06 15:22:51 [ℹ] nodegroup "cx-operator" will use "ami-0d5cbb67678bc879c" [AmazonLinux2/1.22]
2022-08-06 15:22:52 [ℹ] nodegroup "cx-prometheus" will use "ami-0d5cbb67678bc879c" [AmazonLinux2/1.22]
2022-08-06 15:22:52 [ℹ] nodegroup "cx-ws-spot" will use "ami-0d5cbb67678bc879c" [AmazonLinux2/1.22]
2022-08-06 15:22:52 [ℹ] using Kubernetes version 1.22
2022-08-06 15:22:52 [ℹ] creating EKS cluster "test-101" in "us-east-1" region with un-managed nodes
2022-08-06 15:22:52 [ℹ] 3 nodegroups (cx-operator, cx-prometheus, cx-ws-spot) were included (based on the include/exclude rules)
2022-08-06 15:22:52 [ℹ] will create a CloudFormation stack for cluster itself and 3 nodegroup stack(s)
2022-08-06 15:22:52 [ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
2022-08-06 15:22:52 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=test-101'
2022-08-06 15:22:52 [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "test-101" in "us-east-1"
2022-08-06 15:22:52 [ℹ] CloudWatch logging will not be enabled for cluster "test-101" in "us-east-1"
2022-08-06 15:22:52 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-east-1 --cluster=test-101'
2022-08-06 15:22:52 [ℹ]
2 sequential tasks: { create cluster control plane "test-101",
2 sequential sub-tasks: {
2 sequential sub-tasks: {
wait for control plane to become ready,
1 task: { create addons },
},
3 parallel sub-tasks: {
create nodegroup "cx-operator",
create nodegroup "cx-prometheus",
create nodegroup "cx-ws-spot",
},
}
}
2022-08-06 15:22:52 [ℹ] building cluster stack "eksctl-test-101-cluster"
2022-08-06 15:22:53 [ℹ] deploying stack "eksctl-test-101-cluster"
2022-08-06 15:23:23 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:23:53 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:24:53 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:25:54 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:26:54 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:27:55 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:28:55 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:29:56 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:30:56 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:31:56 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:32:57 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:33:57 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:34:58 [ℹ] waiting for CloudFormation stack "eksctl-test-101-cluster"
2022-08-06 15:37:02 [!] OIDC is disabled but policies are required/specified for this addon. Users are responsible for attaching the policies to all nodegroup roles
2022-08-06 15:37:02 [ℹ] creating addon
2022-08-06 15:37:13 [ℹ] addon "vpc-cni" active
2022-08-06 15:37:13 [ℹ] building nodegroup stack "eksctl-test-101-nodegroup-cx-ws-spot"
2022-08-06 15:37:13 [ℹ] building nodegroup stack "eksctl-test-101-nodegroup-cx-prometheus"
2022-08-06 15:37:13 [ℹ] building nodegroup stack "eksctl-test-101-nodegroup-cx-operator"
2022-08-06 15:37:14 [ℹ] deploying stack "eksctl-test-101-nodegroup-cx-ws-spot"
2022-08-06 15:37:14 [ℹ] deploying stack "eksctl-test-101-nodegroup-cx-operator"
2022-08-06 15:37:14 [ℹ] deploying stack "eksctl-test-101-nodegroup-cx-prometheus"
2022-08-06 15:37:14 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-operator"
2022-08-06 15:37:14 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-ws-spot"
2022-08-06 15:37:14 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-prometheus"
2022-08-06 15:37:44 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-ws-spot"
2022-08-06 15:37:44 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-prometheus"
2022-08-06 15:37:44 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-operator"
2022-08-06 15:38:17 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-ws-spot"
2022-08-06 15:38:35 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-operator"
2022-08-06 15:38:40 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-prometheus"
2022-08-06 15:39:24 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-prometheus"
2022-08-06 15:39:41 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-ws-spot"
2022-08-06 15:40:11 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-prometheus"
2022-08-06 15:40:25 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-operator"
2022-08-06 15:40:25 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-ws-spot"
2022-08-06 15:40:43 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-prometheus"
2022-08-06 15:42:17 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-ws-spot"
2022-08-06 15:42:20 [ℹ] waiting for CloudFormation stack "eksctl-test-101-nodegroup-cx-operator"
2022-08-06 15:42:20 [ℹ] waiting for the control plane availability...
2022-08-06 15:42:20 [✔] saved kubeconfig as "/root/.kube/config"
2022-08-06 15:42:20 [ℹ] 1 task: { suspend ASG processes for nodegroup cx-ws-spot }
2022-08-06 15:42:21 [ℹ] suspended ASG processes [AZRebalance] for cx-ws-spot
2022-08-06 15:42:21 [✔] all EKS cluster resources for "test-101" have been created
2022-08-06 15:42:21 [ℹ] adding identity "arn:aws:iam::<account-id>:role/eksctl-test-101-nodegroup-cx-oper-NodeInstanceRole-CL6PDF5V5H8A" to auth ConfigMap
2022-08-06 15:42:22 [ℹ] nodegroup "cx-operator" has 0 node(s)
2022-08-06 15:42:22 [ℹ] waiting for at least 2 node(s) to become ready in "cx-operator"
2022-08-06 15:43:23 [ℹ] nodegroup "cx-operator" has 2 node(s)
2022-08-06 15:43:23 [ℹ] node "ip-192-168-26-230.ec2.internal" is ready
2022-08-06 15:43:23 [ℹ] node "ip-192-168-50-88.ec2.internal" is ready
2022-08-06 15:43:23 [ℹ] adding identity "arn:aws:iam::<account-id>:role/eksctl-test-101-nodegroup-cx-prom-NodeInstanceRole-Q6E7ZFBJJMG5" to auth ConfigMap
2022-08-06 15:43:23 [ℹ] nodegroup "cx-prometheus" has 0 node(s)
2022-08-06 15:43:23 [ℹ] waiting for at least 1 node(s) to become ready in "cx-prometheus"
2022-08-06 15:44:19 [ℹ] nodegroup "cx-prometheus" has 1 node(s)
2022-08-06 15:44:19 [ℹ] node "ip-192-168-59-32.ec2.internal" is ready
2022-08-06 15:44:19 [ℹ] adding identity "arn:aws:iam::<account-id>:role/eksctl-test-101-nodegroup-cx-ws-s-NodeInstanceRole-1SRRMCVPPI6Q9" to auth ConfigMap
2022-08-06 15:44:20 [ℹ] nodegroup "cx-ws-spot" has 0 node(s)
2022-08-06 15:44:20 [ℹ] waiting for at least 1 node(s) to become ready in "cx-ws-spot"
2022-08-06 15:45:05 [ℹ] nodegroup "cx-ws-spot" has 1 node(s)
2022-08-06 15:45:05 [ℹ] node "ip-192-168-40-218.ec2.internal" is ready
2022-08-06 15:45:06 [ℹ] kubectl command should work with "/root/.kube/config", try 'kubectl get nodes'
2022-08-06 15:45:06 [✔] EKS cluster "test-101" in "us-east-1" region is ready
Anything else we need to know?
This is being deployed from Ubuntu. The AMI images have to be pinned to prevent unexpected errors in the future - we've had this happen in the past. The EKS AMI images were last updated on 5 July 2022.
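For reference, one way to resolve the recommended EKS-optimized AMI before pinning it is AWS's public SSM parameter; a sketch for Amazon Linux 2 on Kubernetes 1.22 in us-east-1:

aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.22/amazon-linux-2/recommended/image_id --region us-east-1 --query 'Parameter.Value' --output text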
Versions
The following is the output of the eksctl info command:
eksctl version: 0.107.0
kubectl version: v1.23.6
OS: linux
Quoting the nodegroup config:

overrideBootstrapCommand: '#!/bin/bash

  source /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh

  /etc/eks/bootstrap.sh test-101 --container-runtime dockerd --kubelet-extra-args
  "--node-labels=${NODE_LABELS}"'
When you specify overrideBootstrapCommand, you're responsible for passing any taints to --kubelet-extra-args, just like you passed the labels for the node:
/etc/eks/bootstrap.sh test-101 --container-runtime dockerd --kubelet-extra-args "--node-labels=${NODE_LABELS} --register-with-taints=${NODE_TAINTS}"
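In config-file form, the override for e.g. the cx-ws-spot nodegroup would then look like this (a sketch based on the command above; ${NODE_TAINTS} is assumed to be set by bootstrap.helper.sh alongside ${NODE_LABELS}):

overrideBootstrapCommand: '#!/bin/bash

  source /var/lib/cloud/scripts/eksctl/bootstrap.helper.sh

  /etc/eks/bootstrap.sh test-101 --container-runtime dockerd --kubelet-extra-args
  "--node-labels=${NODE_LABELS} --register-with-taints=${NODE_TAINTS}"'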
Alternatively,
/etc/eks/bootstrap.sh test-101 --container-runtime dockerd --kubelet-extra-args "${KUBELET_EXTRA_ARGS}"
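Either way, the taints end up on the kubelet command line in its key=value:effect syntax; for the cx-ws-spot group above, the registered flag would look like this (a sketch, assuming the helper script renders taints in that standard format):

--register-with-taints=workload=true:NoSchedule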
We'll look into updating the documentation and examples to make this clearer. I'm removing the bug label as this is the expected behaviour.
@cPu1 I see. That's good intel. I'm gonna give this a try then and report the results. And yeah, updating the documentation and the examples would definitely help with this use case in the future. Thanks a lot!
@cPu1 so it did work. Thanks a lot!
No worries!