AmazonVPC CNI broken in Kops
/kind bug
1. What kops version are you running? The command kops version will display
this information.
1.29.2
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
1.29.7
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
```shell
export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX
export AWS_REGION=eu-central-1
export KOPS_STATE_STORE=s3://example-kops-state-store

kops create -f kops.yaml
kops update cluster --name test.example.com --yes --admin
```
5. What happened after the commands executed?
The cluster is created, but the Amazon VPC CNI is broken: Pod->Pod and Pod->Service traffic fails with i/o timeout. The cluster runs a mixed topology: control plane and workload nodes in private subnets, gateway nodes in a public subnet.
```
NAMESPACE     NAME                                           READY   STATUS             RESTARTS        AGE
default       test678aa                                      1/1     Running            0               6m7s
kube-system   aws-cloud-controller-manager-2sk5s             1/1     Running            0               11m
kube-system   aws-node-8jw57                                 2/2     Running            0               8m49s
kube-system   aws-node-nftng                                 2/2     Running            0               9m17s
kube-system   aws-node-phkdf                                 2/2     Running            0               11m
kube-system   aws-node-termination-handler-5b988d67cd-2hjlb  0/1     CrashLoopBackOff   6 (45s ago)     11m
kube-system   coredns-78ccb5b8c5-4rq4c                       1/1     Running            0               8m16s
kube-system   coredns-78ccb5b8c5-gmzx5                       0/1     Running            4 (60s ago)     11m
kube-system   coredns-autoscaler-55c99b49b7-pffqc            1/1     Running            0               11m
kube-system   ebs-csi-controller-65676964b6-7vx7d            5/6     CrashLoopBackOff   9 (15s ago)     11m
kube-system   ebs-csi-node-4ldz5                             3/3     Running            7 (2m51s ago)   10m
kube-system   ebs-csi-node-5rl4v                             2/3     CrashLoopBackOff   6 (50s ago)     9m17s
kube-system   ebs-csi-node-zddbk                             2/3     CrashLoopBackOff   6 (41s ago)     8m49s
kube-system   etcd-manager-events-i-0fe4d8007f51c493b        1/1     Running            0               10m
kube-system   etcd-manager-main-i-0fe4d8007f51c493b          1/1     Running            0               9m44s
kube-system   kops-controller-7cbpn                          1/1     Running            0               11m
kube-system   kube-apiserver-i-0fe4d8007f51c493b             2/2     Running            2 (11m ago)     10m
kube-system   kube-controller-manager-i-0fe4d8007f51c493b    1/1     Running            3 (11m ago)     10m
kube-system   kube-proxy-i-001e89332beaa4ab7                 1/1     Running            0               9m17s
kube-system   kube-proxy-i-0182e11c841a4f31b                 1/1     Running            0               8m49s
kube-system   kube-proxy-i-0fe4d8007f51c493b                 1/1     Running            0               10m
kube-system   kube-scheduler-i-0fe4d8007f51c493b             1/1     Running            0               10m
```
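The Pod->Pod timeout can be demonstrated with a minimal client/server pair. This is a sketch; the pod names and images are illustrative and not taken from the cluster above:

```yaml
# Hypothetical repro: a server pod and a client pod; schedule them on
# different nodes to exercise cross-node routing through the VPC CNI.
apiVersion: v1
kind: Pod
metadata:
  name: cni-test-server
spec:
  containers:
  - name: nginx
    image: nginx:1.25
---
apiVersion: v1
kind: Pod
metadata:
  name: cni-test-client
spec:
  containers:
  - name: client
    image: busybox:1.36
    command: ["sleep", "3600"]
```

Once both pods are Running, `kubectl exec cni-test-client -- wget -qO- -T 5 <server pod IP>` times out on the affected cluster instead of returning the nginx index page.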
6. What did you expect to happen?
A working CNI: Pod->Pod and Pod->Service traffic succeeds.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: test.example.com
spec:
  api:
    loadBalancer:
      class: Network
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://example-kops-state-store/test.example.com
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-eu-central-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-eu-central-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    useServiceAccountExternalPermissions: true
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.29.7
  networkCIDR: 172.20.0.0/16
  networking:
    amazonvpc: {}
  nonMasqueradeCIDR: 172.20.0.0/16
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://example-kops-oidc-store/test.example.com/discovery/test.example.com
    enableAWSOIDCProvider: true
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.0.0/19
    name: eu-central-1a-public
    type: Public
    zone: eu-central-1a
  - cidr: 172.20.32.0/19
    name: eu-central-1b-public
    type: Public
    zone: eu-central-1b
  - cidr: 172.20.64.0/19
    name: eu-central-1a-private
    type: Private
    zone: eu-central-1a
  - cidr: 172.20.96.0/19
    name: eu-central-1b-private
    type: Private
    zone: eu-central-1b
  - cidr: 172.20.128.0/19
    name: eu-central-1a-Utility
    type: Utility
    zone: eu-central-1a
  - cidr: 172.20.160.0/19
    name: eu-central-1b-Utility
    type: Utility
    zone: eu-central-1b
  topology:
    dns:
      type: None
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: control-plane-eu-central-1a
spec:
  image: 137112412989/al2023-ami-2023.5.20240722.0-kernel-6.1-arm64
  machineType: t4g.medium
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-central-1a-private
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: workload-eu-central-1a
spec:
  image: 137112412989/al2023-ami-2023.5.20240722.0-kernel-6.1-x86_64
  machineType: t3a.xlarge
  maxSize: 3
  minSize: 1
  role: Node
  subnets:
  - eu-central-1a-private
  nodeLabels:
    role: "workload"
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: test.example.com
  name: gateway-eu-central-1a
spec:
  image: 137112412989/al2023-ami-2023.5.20240722.0-kernel-6.1-x86_64
  machineType: t3a.xlarge
  maxSize: 3
  minSize: 1
  role: Node
  subnets:
  - eu-central-1a-public
  nodeLabels:
    role: "gateway"
  taints:
  - node.com/type=gateway:NoSchedule
```
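As a sanity check on the subnet layout in the manifest (illustrative only, not part of kops): the six /19 subnets should be pairwise disjoint and fall inside the 172.20.0.0/16 networkCIDR, which the Python standard library's ipaddress module confirms:

```python
import ipaddress

# Subnet CIDRs copied from the cluster manifest above.
vpc = ipaddress.ip_network("172.20.0.0/16")
subnets = [ipaddress.ip_network(c) for c in (
    "172.20.0.0/19", "172.20.32.0/19",     # public
    "172.20.64.0/19", "172.20.96.0/19",    # private
    "172.20.128.0/19", "172.20.160.0/19",  # utility
)]

# Every subnet must sit inside the VPC CIDR...
assert all(s.subnet_of(vpc) for s in subnets)
# ...and no two subnets may overlap.
assert not any(a.overlaps(b)
               for i, a in enumerate(subnets)
               for b in subnets[i + 1:])
print("subnet layout OK")
```

So the subnet carving itself looks consistent, which points the i/o timeouts at the CNI rather than at overlapping CIDRs.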
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else we need to know?