kops k8s cluster deploy on Debian 11 not working as expected (coredns, ebs-csi)
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.24.1 (git-v1.24.1)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
Client Version: v1.24.3
Kustomize Version: v4.5.4
Server Version: v1.24.3
3. What cloud provider are you using? AWS
4. What commands did you run? What is the simplest way to reproduce this issue? kops create cluster operations.k8s.local --node-count 1 --networking amazonvpc --zones eu-west-1a,eu-west-1b,eu-west-1c --master-size c5.xlarge --node-size c5.xlarge --dry-run -o yaml > operations.k8s.yaml
Then edit operations.k8s.yaml and set image: 136693071363/debian-11-amd64-20220816-1109 in the spec of every InstanceGroup, and apply:
kops create -f operations.k8s.yaml
kops update cluster --name operations.k8s.local --yes
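For reference, the whole reproduction can be run as one shell session. This is a sketch, assuming AWS credentials and KOPS_STATE_STORE (pointing at the s3://figshare-kops bucket from the manifest below) are already exported; the sed one-liner is a hypothetical stand-in for editing the image field by hand:

kops create cluster operations.k8s.local \
  --node-count 1 \
  --networking amazonvpc \
  --zones eu-west-1a,eu-west-1b,eu-west-1c \
  --master-size c5.xlarge \
  --node-size c5.xlarge \
  --dry-run -o yaml > operations.k8s.yaml

# Point every InstanceGroup at the Debian 11 image (same effect as the manual edit above)
sed -i 's|^  image: .*|  image: 136693071363/debian-11-amd64-20220816-1109|' operations.k8s.yaml

kops create -f operations.k8s.yaml
kops update cluster --name operations.k8s.local --yes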
5. What happened after the commands executed?
admin@i-0061efbbb5a6c5ff8:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-cloud-controller-manager-9mmq2 1/1 Running 0 4m25s
kube-system aws-node-lvbqr 1/1 Running 0 4m25s
kube-system aws-node-xr78k 1/1 Running 0 3m20s
kube-system coredns-autoscaler-865477f6c7-v4dqj 1/1 Running 0 4m24s
kube-system coredns-d48868b66-jx7lv 0/1 Running 1 (40s ago) 4m24s
kube-system dns-controller-7467bcd6ff-fl6jk 1/1 Running 0 4m24s
kube-system ebs-csi-controller-5c9b6f6b6-g5zbm 4/5 CrashLoopBackOff 4 (28s ago) 4m24s
kube-system ebs-csi-node-5tzn7 2/3 CrashLoopBackOff 4 (28s ago) 4m25s
kube-system ebs-csi-node-ssq8p 3/3 Running 4 (6s ago) 3m20s
kube-system etcd-manager-events-i-0061efbbb5a6c5ff8 1/1 Running 0 3m35s
kube-system etcd-manager-main-i-0061efbbb5a6c5ff8 1/1 Running 0 3m39s
kube-system kops-controller-x7rgp 1/1 Running 0 4m25s
kube-system kube-apiserver-i-0061efbbb5a6c5ff8 2/2 Running 0 3m21s
kube-system kube-controller-manager-i-0061efbbb5a6c5ff8 1/1 Running 2 (4m55s ago) 3m7s
kube-system kube-proxy-i-0061efbbb5a6c5ff8 1/1 Running 0 3m57s
kube-system kube-proxy-i-0a5b036b8f7263fda 1/1 Running 0 2m35s
kube-system kube-scheduler-i-0061efbbb5a6c5ff8 1/1 Running 0 3m24s
6. What did you expect to happen?
coredns and ebs-csi running
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2022-08-22T08:04:46Z"
  name: operations.k8s.local
spec:
  api:
    loadBalancer:
      class: Classic
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://figshare-kops/operations.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.24.3
  masterPublicName: api.operations.k8s.local
  networkCIDR: 172.20.0.0/16
  networking:
    amazonvpc: {}
  nonMasqueradeCIDR: 172.20.0.0/16
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  sshKeyName: duduta-operations
  subnets:
  - cidr: 172.20.32.0/19
    name: eu-west-1a
    type: Public
    zone: eu-west-1a
  - cidr: 172.20.64.0/19
    name: eu-west-1b
    type: Public
    zone: eu-west-1b
  - cidr: 172.20.96.0/19
    name: eu-west-1c
    type: Public
    zone: eu-west-1c
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2022-08-22T08:04:49Z"
  labels:
    kops.k8s.io/cluster: operations.k8s.local
  name: master-eu-west-1a
spec:
  image: 136693071363/debian-11-amd64-20220816-1109
  instanceMetadata:
    httpPutResponseHopLimit: 3
    httpTokens: required
  machineType: c5.xlarge
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - eu-west-1a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2022-08-22T08:04:49Z"
  labels:
    kops.k8s.io/cluster: operations.k8s.local
  name: nodes-eu-west-1a
spec:
  image: 136693071363/debian-11-amd64-20220816-1109
  instanceMetadata:
    httpPutResponseHopLimit: 1
    httpTokens: required
  machineType: c5.xlarge
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-1a
  role: Node
  subnets:
  - eu-west-1a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2022-08-22T08:04:49Z"
  labels:
    kops.k8s.io/cluster: operations.k8s.local
  name: nodes-eu-west-1b
spec:
  image: 136693071363/debian-11-amd64-20220816-1109
  instanceMetadata:
    httpPutResponseHopLimit: 1
    httpTokens: required
  machineType: c5.xlarge
  manager: CloudGroup
  maxSize: 0
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-1b
  role: Node
  subnets:
  - eu-west-1b
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2022-08-22T08:04:50Z"
  labels:
    kops.k8s.io/cluster: operations.k8s.local
  name: nodes-eu-west-1c
spec:
  image: 136693071363/debian-11-amd64-20220816-1109
  instanceMetadata:
    httpPutResponseHopLimit: 1
    httpTokens: required
  machineType: c5.xlarge
  manager: CloudGroup
  maxSize: 0
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-1c
  role: Node
  subnets:
  - eu-west-1c
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
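Re-running the cluster update from step 4 with the requested verbosity would look roughly like this (a sketch; the tee target file name is arbitrary):

kops update cluster --name operations.k8s.local --yes -v 10 2>&1 | tee kops-update-v10.log

The logs actually captured are the pod logs of the two failing components, taken from the control-plane node: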
admin@i-0061efbbb5a6c5ff8:~$ kubectl logs coredns-d48868b66-jx7lv -n kube-system
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration MD5 = 35aa07598ca78c83ea20e1faff6dfc16
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[ERROR] plugin/errors: 2 8293112652682877544.1657391515730422967. HINFO: read udp 172.20.41.54:57500->172.20.0.2:53: i/o timeout
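The CoreDNS error above is an upstream timeout toward the VPC resolver at 172.20.0.2. A quick way to tell a CoreDNS problem apart from plain node/pod-to-resolver connectivity is to query that resolver directly, first from the node and then from a throwaway pod (a diagnostic sketch; it assumes dig is installed on the node and that a busybox image can be pulled):

# From the affected node: query the VPC resolver directly
dig @172.20.0.2 ec2.eu-west-1.amazonaws.com +time=2 +tries=1

# Repeat the same query from inside a pod scheduled on that node
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup ec2.eu-west-1.amazonaws.com 172.20.0.2

If the node can resolve but the pod cannot, the problem sits in the pod networking path rather than in CoreDNS itself.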
admin@i-0061efbbb5a6c5ff8:~$ kubectl logs ebs-csi-node-ssq8p -n kube-system
Defaulted container "ebs-plugin" out of: ebs-plugin, node-driver-registrar, liveness-probe
I0822 08:11:46.901136 1 metadata.go:85] retrieving instance data from ec2 metadata
W0822 08:11:53.180678 1 metadata.go:88] ec2 metadata is not available
I0822 08:11:53.180696 1 metadata.go:96] retrieving instance data from kubernetes api
I0822 08:11:53.181435 1 metadata.go:101] kubernetes api is available
panic: error getting Node i-0a5b036b8f7263fda: Get "https://172.20.0.1:443/api/v1/nodes/i-0a5b036b8f7263fda": dial tcp 172.20.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc00003afa0)
/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:86 +0x269
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver({0xc000517f30, 0x8, 0x55})
/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:95 +0x38e
main.main()
/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x365
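The ebs-csi panic shows two separate failures: first the EC2 instance metadata service (IMDS) is reported unavailable, then the fallback to the in-cluster API endpoint 172.20.0.1:443 times out. Both can be probed directly from the affected node (a diagnostic sketch; IMDSv2 is used because the InstanceGroups set httpTokens: required):

# 1. Is IMDSv2 reachable from the node at all?
TOKEN=$(curl -sS -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -sS -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

# 2. Is the in-cluster API service IP reachable? Any HTTP response (even 401/403)
#    means the connection works; a timeout matches the panic above.
curl -skm 5 https://172.20.0.1:443/version

Repeating the same two checks from inside a pod on that node (e.g. via kubectl exec into one of the running kube-system pods) shows whether the failure is specific to the pod network path.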
9. Anything else we need to know?
Same behavior with Ubuntu 22.04 (Jammy); works fine with Ubuntu 20.04 (Focal).
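Since Focal is reported to work, one workaround (not a fix) is to pin the InstanceGroups back to an Ubuntu 20.04 image while the Debian 11 / Ubuntu 22.04 problem is investigated. A sketch, assuming the AWS CLI is configured; the AMI name below is a placeholder and should be replaced with whatever the describe-images query returns:

# Find the most recent official Focal AMI name (Canonical owner ID 099720109477)
aws ec2 describe-images --owners 099720109477 --region eu-west-1 \
  --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*" \
  --query 'sort_by(Images,&CreationDate)[-1].Name' --output text

# Recreate the manifest with that image instead of the Debian 11 one
kops create cluster operations.k8s.local \
  --node-count 1 \
  --networking amazonvpc \
  --zones eu-west-1a,eu-west-1b,eu-west-1c \
  --master-size c5.xlarge \
  --node-size c5.xlarge \
  --image "099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-PLACEHOLDER" \
  --dry-run -o yaml > operations.k8s.yaml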
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Ubuntu 22.04 is a known issue; see #14140. The same issue could be happening on Debian 11 as well.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Also happened on Ubuntu 20.04.5 LTS: EKS version v1.23.7, kernel version 5.15.0-1022-aws, container runtime containerd://1.5.9.
k logs pod/ebs-csi-node-gf77x -n kube-system
I0301 16:38:59.305809 1 node.go:98] regionFromSession Node service
I0301 16:38:59.305873 1 metadata.go:85] retrieving instance data from ec2 metadata
W0301 16:39:05.584601 1 metadata.go:88] ec2 metadata is not available
I0301 16:39:05.584617 1 metadata.go:96] retrieving instance data from kubernetes api
I0301 16:39:05.585571 1 metadata.go:101] kubernetes api is available
panic: error getting Node ip-172-21-12-4.ec2.internal: Get "https://10.100.0.1:443/api/v1/nodes/ip-172-21-12-4.ec2.internal": dial tcp 10.100.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.newNodeService(0xc00022c1e0)
/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node.go:101 +0x345
github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver.NewDriver({0xc000525f30, 0x8, 0x31c6b50?})
/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/driver.go:95 +0x393
main.main()
/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/cmd/main.go:46 +0x37d
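Both stack traces begin with "ec2 metadata is not available" before the fallback to the Kubernetes API times out. One thing worth checking (a sketch, not a confirmed root cause; the instance ID below is a placeholder for the affected node, and the AWS CLI must be configured for the right account and region) is the instance metadata options, since a hop limit of 1 combined with required IMDSv2 tokens commonly prevents processes behind an extra network hop from reaching IMDS:

# Inspect the metadata options on the affected instance
aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].MetadataOptions' --output table

# If needed, raise the hop limit (a commonly suggested mitigation, not verified here)
aws ec2 modify-instance-metadata-options --instance-id i-0123456789abcdef0 \
  --http-put-response-hop-limit 2 --http-tokens required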
Does anyone understand what the root cause of this issue is on Ubuntu 22.04, and are there known workarounds?