eks-anywhere
Kubernetes API server does not start if DHCP assigns new IP addresses diff from originals
What happened:
DHCP leases for the cluster expired and new IP addresses were assigned. Afterwards the Kubernetes API server would not start; the error found in the journal was kubelet.go:2451] "Error getting node" err="node \"192.168.3.192\" not found"
What you expected to happen: The cluster to detect the new IP addresses assigned by DHCP and adjust accordingly, so the API server and other services can come up
How to reproduce it (as minimally and precisely as possible): Create a new EKS-A cluster, shut it down, delete the DHCP leases so new IP addresses get assigned, then restart the cluster
Anything else we need to know?:
NODE_IP is hard-coded with the original DHCP IP address in the env file located under /etc/kubernetes/kubelet. By statically assigning the original address via DHCP, I was able to get the cluster working again.
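A quick sketch of how one might confirm the mismatch described above. `pinned_node_ip` is a hypothetical helper, not part of EKS Anywhere tooling, and the report only says the env file lives somewhere under /etc/kubernetes/kubelet, so the exact path is left as a placeholder:

```shell
# pinned_node_ip FILE: print the NODE_IP value recorded in a kubelet env file.
# (Illustrative helper; adapt the file path to your node's actual layout.)
pinned_node_ip() {
  grep -o 'NODE_IP=[0-9.]*' "$1" | cut -d= -f2
}

# On an affected node, compare the pinned address with the current one, e.g.:
#   pinned=$(pinned_node_ip /etc/kubernetes/kubelet/<env-file>)
#   current=$(hostname -I | awk '{print $1}')
#   [ "$pinned" = "$current" ] || echo "mismatch: pinned=$pinned current=$current"
```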
Environment:
- EKS Anywhere Release: v0.10.1
- EKS Distro Release: 1.22
hey @echel0n, I'm sorry to hear you ran into this issue. Did this impact worker nodes as well as control plane nodes, or just control plane nodes? Is the address 192.168.3.192 the value you provided as the control plane endpoint host in your cluster configuration? If you could include a sanitized copy of your cluster config, that would be helpful as well.
I ask because we specifically recommend that the IP address provided to your control plane via the Control Plane Configuration be excluded from the DHCP range. For more information, check out:
- https://anywhere.eks.amazonaws.com/docs/reference/vsphere/vsphere-prereq/#:~:text=Below%20are%20some,existent%20mac%20address.
- https://anywhere.eks.amazonaws.com/docs/reference/clusterspec/vsphere/#controlplaneconfigurationendpointhost-required
hi @danbudris, it happened again. It seems that if any of the control plane or etcd nodes end up with a new DHCP-assigned IP address that differs from the original address assigned at cluster creation, you lose access to the API server and can no longer control the cluster; the worker nodes seem fine. To resolve this, I have to go back and add DHCP static mappings of the original IPs to the MAC addresses of the control plane and etcd nodes, then restart the VMs. After that I have access again. 192.168.3.192 is not the control plane endpoint. Below is a sanitized version of my cluster config, thanks!
```yaml
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: prod
  namespace: default
spec:
  bundlesRef:
    apiVersion: anywhere.eks.amazonaws.com/v1alpha1
    name: bundles-12
    namespace: eksa-system
  clusterNetwork:
    cniConfig:
      cilium: {}
    pods:
      cidrBlocks:
      - 10.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneConfiguration:
    count: 2
    endpoint:
      host: 192.168.3.10
    machineGroupRef:
      kind: VSphereMachineConfig
      name: prod-cp
  datacenterRef:
    kind: VSphereDatacenterConfig
    name: prod
  externalEtcdConfiguration:
    count: 3
    machineGroupRef:
      kind: VSphereMachineConfig
      name: prod-etcd
  kubernetesVersion: "1.22"
  managementCluster:
    name: prod
  workerNodeGroupConfigurations:
  - count: 4
    machineGroupRef:
      kind: VSphereMachineConfig
      name: prod
    name: md-0
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereDatacenterConfig
metadata:
  name: prod
  namespace: default
spec:
  datacenter: Dark Systems Datacenter
  insecure: true
  network: /Dark Systems Datacenter/network/DSwitch-10GB-EKS
  server: vcenter.vsphere.darksystems.ca
  thumbprint: ""
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
  annotations:
    anywhere.eks.amazonaws.com/control-plane: "true"
  name: prod-cp
  namespace: default
spec:
  datastore: /Dark Systems Datacenter/datastore/vsanDatastore
  diskGiB: 25
  folder: /Dark Systems Datacenter/vm/EKS Anywhere
  memoryMiB: 8192
  numCPUs: 2
  osFamily: bottlerocket
  resourcePool: /Dark Systems Datacenter/host/Cluster/Resources
  template: /Dark Systems Datacenter/vm/Templates/bottlerocket-vmware-k8s-1.22-x86_64-1.8.0-a6233c22
  users:
  - name: ec2-user
    sshAuthorizedKeys:
    - ""
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
  annotations:
    anywhere.eks.amazonaws.com/etcd: "true"
  name: prod-etcd
  namespace: default
spec:
  datastore: /Dark Systems Datacenter/datastore/vsanDatastore
  diskGiB: 25
  folder: /Dark Systems Datacenter/vm/EKS Anywhere
  memoryMiB: 8192
  numCPUs: 2
  osFamily: bottlerocket
  resourcePool: /Dark Systems Datacenter/host/Cluster/Resources
  template: /Dark Systems Datacenter/vm/Templates/bottlerocket-vmware-k8s-1.22-x86_64-1.8.0-a6233c22
  users:
  - name: ec2-user
    sshAuthorizedKeys:
    - ""
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereMachineConfig
metadata:
  name: prod
  namespace: default
spec:
  datastore: /Dark Systems Datacenter/datastore/vsanDatastore
  diskGiB: 50
  folder: /Dark Systems Datacenter/vm/EKS Anywhere
  memoryMiB: 16384
  numCPUs: 16
  osFamily: bottlerocket
  resourcePool: /Dark Systems Datacenter/host/Cluster/Resources
  template: /Dark Systems Datacenter/vm/Templates/bottlerocket-vmware-k8s-1.22-x86_64-1.8.0-a6233c22
  users:
  - name: ec2-user
    sshAuthorizedKeys:
    - ""
```
Any update on this?
When this happens in our Ubuntu EKS-A cluster, I just change the IP to match the new one in the manifest file /etc/kubernetes/manifests/kube-apiserver.yaml on the control plane node, then delete the pod, and that solves the problem.
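A sketch of the manual fix described above, assuming a standard kubeadm static-pod layout; the addresses and the `rewrite_manifest_ip` helper are illustrative placeholders, not EKS Anywhere tooling:

```shell
# rewrite_manifest_ip OLD NEW FILE: replace every occurrence of OLD with NEW.
# (Hypothetical helper for illustration.)
rewrite_manifest_ip() {
  sed -i "s/$1/$2/g" "$3"
}

# On the affected control plane node (placeholder addresses):
#   sudo rewrite_manifest_ip 192.168.3.192 192.168.3.201 \
#       /etc/kubernetes/manifests/kube-apiserver.yaml
# kubelet watches the static-pod manifest directory; deleting the mirror pod,
# as described in the comment above, brings the API server back up with the
# new address:
#   kubectl -n kube-system delete pod kube-apiserver-"$(hostname)"
```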