aws-cloud-provider (version 1.27.1) always crashes
What happened:
Kubernetes cluster: 1.27.6
Master node: use the kubeadm_config.yaml below and run `kubeadm join`:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "200Mi"
containerLogMaxFiles: 3
imageGCHighThresholdPercent: 80
imageGCLowThresholdPercent: 75
imageMinimumGCAge: "5m30s"
providerID: "aws"
evictionHard:
  memory.available: "200Mi"
  imagefs.available: "15%"
```
Worker node: run `kubeadm join`.
Cluster info:

```
# kubectl get node
NAME                            STATUS   ROLES           AGE   VERSION
ip-10-142-23-229.ec2.internal   Ready    <none>          36m   v1.27.6
ip-10-142-39-245.ec2.internal   Ready    control-plane   30h   v1.27.6
ip-10-142-42-164.ec2.internal   Ready    control-plane   30h   v1.27.6
ip-10-142-61-198.ec2.internal   Ready    control-plane   30h   v1.27.6
```
```
# kubectl get node ip-10-142-23-229.ec2.internal -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    ...
spec:
  podCIDR: 192.168.8.0/24
  podCIDRs:
  - 192.168.8.0/24
  providerID: aws
```
AWS cloud controller manager crash log:

```
I1115 07:06:42.124716       1 aws.go:861] Setting up informers for Cloud
W1115 07:06:42.124764       1 controllermanager.go:313] "tagging" is disabled
I1115 07:06:42.124773       1 controllermanager.go:317] Starting "cloud-node"
I1115 07:06:42.128849       1 controllermanager.go:336] Started "cloud-node"
I1115 07:06:42.131324       1 controllermanager.go:317] Starting "cloud-node-lifecycle"
I1115 07:06:42.128909       1 node_controller.go:161] Sending events to api server.
I1115 07:06:42.131591       1 node_controller.go:170] Waiting for informer caches to sync
I1115 07:06:42.131945       1 controllermanager.go:336] Started "cloud-node-lifecycle"
I1115 07:06:42.131964       1 controllermanager.go:317] Starting "service"
I1115 07:06:42.132052       1 node_lifecycle_controller.go:113] Sending events to api server
I1115 07:06:42.133178       1 controllermanager.go:336] Started "service"
I1115 07:06:42.133400       1 controllermanager.go:317] Starting "route"
I1115 07:06:42.133409       1 core.go:104] Will not configure cloud provider routes, --configure-cloud-routes: false
W1115 07:06:42.133418       1 controllermanager.go:324] Skipping "route"
I1115 07:06:42.133728       1 controller.go:229] Starting service controller
I1115 07:06:42.133802       1 shared_informer.go:311] Waiting for caches to sync for service
E1115 07:06:42.142644       1 runtime.go:79] Observed a panic: &errors.errorString{s:"unable to calculate an index entry for key \"ip-10-142-23-229.ec2.internal\" on index \"instanceID\": error mapping node \"ip-10-142-23-229.ec2.internal\"'s provider ID \"aws\" to instance ID: Invalid format for AWS instance (aws)"} (unable to calculate an index entry for key "ip-10-142-23-229.ec2.internal" on index "instanceID": error mapping node "ip-10-142-23-229.ec2.internal"'s provider ID "aws" to instance ID: Invalid format for AWS instance (aws))
```
What you expected to happen: aws-cloud-provider should not crash.

How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): 1.27.6
- Cloud provider or hardware configuration: aws
- OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS
- Kernel (e.g. `uname -a`): Linux ip-10-142-1-183 6.2.0-1015-aws #15~22.04.1-Ubuntu SMP Fri Oct
- Install tools:
- Others:
/kind bug
The node.spec.providerID is "aws", but aws-cloud-provider expects something like `providerID: aws:///us-east-1a/i-xxxx`.
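As an illustration only (the instance ID below is a placeholder, not taken from this cluster), a Node spec in the format the AWS cloud provider can parse would look roughly like this:

```yaml
# Sketch of the expected provider ID format: aws:///<availability-zone>/<instance-id>
apiVersion: v1
kind: Node
metadata:
  name: ip-10-142-23-229.ec2.internal
spec:
  providerID: aws:///us-east-1a/i-0123456789abcdef0   # placeholder instance ID
```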
I found that when I start the kubelet with `--cloud-provider=external` on the master and worker nodes, node.spec.providerID looks like `aws:///region/instance-id`, and the AWS cloud controller does not crash.
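For reference, a minimal sketch of how that kubelet flag can be passed through kubeadm; this snippet is an assumption for illustration and omits the discovery section a real JoinConfiguration needs:

```yaml
# Partial kubeadm JoinConfiguration (v1beta3 is the kubeadm config version used by Kubernetes 1.27)
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
nodeRegistration:
  kubeletExtraArgs:
    # Tell the kubelet that an external cloud-controller-manager manages node initialization,
    # so the node ends up with a provider ID in the aws:///<az>/<instance-id> form described above.
    cloud-provider: external
```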
@datavisorhenryzhao Are you still seeing this issue?
The crash has been fixed in https://github.com/kubernetes/cloud-provider-aws/pull/605. I will work on backporting the fix to older versions.
This is resolved across all the active release branches.