cluster-api-provider-aws
[EKS] Error creating clusters with the quick start guide
What steps did you take and what happened:
Created the management cluster using eksctl:
eksctl create cluster --profile aaa --name bbb
All the AWS provider steps, from initialization to generating the workload clusters, executed successfully. However, the workload cluster is stuck in the Provisioning phase.
kubectl get clusters
NAME              PHASE          AGE    VERSION
capi-quickstart   Provisioning   179m
Environment:
Ubuntu VM on WSL on Windows 10: Linux HW0KK13 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Using the AWS EKS provider for hosting clusters.
kubectl get kubeadmcontrolplane
NAME                            CLUSTER           INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE    VERSION
capi-quickstart-control-plane   capi-quickstart                                                                                   139m   v1.23.3
clusterctl version:
clusterctl version: &version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.3", GitCommit:"31146bd17a220ef6214c4c7a21f1aa57380b6b1f", GitTreeState:"clean", BuildDate:"2022-03-08T18:52:05Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}
kubectl logs -n capa-system capa-controller-manager-589ff74bd4-gtnst manager
I0414 23:25:49.696474 1 awsmachine_controller.go:432] "msg"="Reconciling AWSMachine"
I0414 23:25:49.696845 1 awsmachine_controller.go:448] "msg"="Cluster infrastructure is not ready yet"
I0414 23:25:49.697384 1 awsmachine_controller.go:432] "msg"="Reconciling AWSMachine"
I0414 23:25:49.697408 1 awsmachine_controller.go:448] "msg"="Cluster infrastructure is not ready yet"
E0414 23:25:49.734323 1 controller.go:317] controller/awscluster "msg"="Reconciler error" "error"="failed to create new vpc: failed to create vpc: RequestExpired: Request has expired.\n\tstatus code: 400, request id: 49625214-edd8-47d3-bf31-64e29ba9cd0a" "name"="capi-quickstart" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster"
I0414 23:25:49.734681 1 awscluster_controller.go:245] controller/awscluster "msg"="Reconciling AWSCluster" "cluster"="capi-quickstart" "name"="capi-quickstart" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster"
kubectl logs -n capi-kubeadm-bootstrap-system capi-kubeadm-bootstrap-controller-manager-58db4b5555-x9pw9 manager
I0414 21:24:07.942724 1 request.go:665] Waited for 1.02897964s due to client-side throttling, not priority and fairness, request: GET:https://10.100.0.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
I0414 21:24:08.597219 1 logr.go:249] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"="localhost:8080"
I0414 21:24:08.598891 1 logr.go:249] controller-runtime/builder "msg"="skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called" "GVK"={"Group":"bootstrap.cluster.x-k8s.io","Version":"v1beta1","Kind":"KubeadmConfig"}
I0414 21:24:08.599455 1 logr.go:249] controller-runtime/builder "msg"="Registering a validating webhook" "GVK"={"Group":"bootstrap.cluster.x-k8s.io","Version":"v1beta1","Kind":"KubeadmConfig"} "path"="/validate-bootstrap-cluster-x-k8s-io-v1beta1-kubeadmconfig"
I0414 21:24:08.599594 1 server.go:146] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-bootstrap-cluster-x-k8s-io-v1beta1-kubeadmconfig"
I0414 21:24:08.599773 1 server.go:146] controller-runtime/webhook "msg"="Registering webhook" "path"="/convert"
I0414 21:24:08.599824 1 logr.go:249] controller-runtime/builder "msg"="Conversion webhook enabled" "GVK"={"Group":"bootstrap.cluster.x-k8s.io","Version":"v1beta1","Kind":"KubeadmConfig"}
I0414 21:24:08.599846 1 logr.go:249] controller-runtime/builder "msg"="skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called" "GVK"={"Group":"bootstrap.cluster.x-k8s.io","Version":"v1beta1","Kind":"KubeadmConfigTemplate"}
I0414 21:24:08.599885 1 logr.go:249] controller-runtime/builder "msg"="Registering a validating webhook" "GVK"={"Group":"bootstrap.cluster.x-k8s.io","Version":"v1beta1","Kind":"KubeadmConfigTemplate"} "path"="/validate-bootstrap-cluster-x-k8s-io-v1beta1-kubeadmconfigtemplate"
I0414 21:24:08.599973 1 server.go:146] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-bootstrap-cluster-x-k8s-io-v1beta1-kubeadmconfigtemplate"
I0414 21:24:08.600097 1 logr.go:249] controller-runtime/builder "msg"="Conversion webhook enabled" "GVK"={"Group":"bootstrap.cluster.x-k8s.io","Version":"v1beta1","Kind":"KubeadmConfigTemplate"}
I0414 21:24:08.600233 1 logr.go:249] setup "msg"="starting manager" "version"="v1.1.3"
I0414 21:24:08.600307 1 server.go:214] controller-runtime/webhook/webhooks "msg"="Starting webhook server"
I0414 21:24:08.600434 1 internal.go:362] "msg"="Starting server" "addr"={"IP":"::","Port":9440,"Zone":""} "kind"="health probe"
I0414 21:24:08.600500 1 internal.go:362] "msg"="Starting server" "addr"={"IP":"127.0.0.1","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics"
I0414 21:24:08.600506 1 logr.go:249] controller-runtime/certwatcher "msg"="Updated current TLS certificate"
I0414 21:24:08.600609 1 logr.go:249] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=9443
I0414 21:24:08.600667 1 leaderelection.go:248] attempting to acquire leader lease capi-kubeadm-bootstrap-system/kubeadm-bootstrap-manager-leader-election-capi...
I0414 21:24:08.600853 1 logr.go:249] controller-runtime/certwatcher "msg"="Starting certificate watcher"
I0414 21:24:08.618438 1 leaderelection.go:258] successfully acquired lease capi-kubeadm-bootstrap-system/kubeadm-bootstrap-manager-leader-election-capi
I0414 21:24:08.618654 1 controller.go:178] controller/kubeadmconfig "msg"="Starting EventSource" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "source"="kind source: *v1beta1.KubeadmConfig"
I0414 21:24:08.618676 1 controller.go:178] controller/kubeadmconfig "msg"="Starting EventSource" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "source"="kind source: *v1beta1.Machine"
I0414 21:24:08.618695 1 controller.go:178] controller/kubeadmconfig "msg"="Starting EventSource" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "source"="kind source: *v1beta1.Cluster"
I0414 21:24:08.618708 1 controller.go:186] controller/kubeadmconfig "msg"="Starting Controller" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig"
I0414 21:24:08.719961 1 controller.go:220] controller/kubeadmconfig "msg"="Starting workers" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "worker count"=10
I0414 21:40:26.996210 1 kubeadmconfig_controller.go:236] controller/kubeadmconfig "msg"="Cluster infrastructure is not ready, waiting" "kind"="Machine" "name"="capi-quickstart-md-0-b9ddd959b-kqcmd" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "version"="84528"
I0414 21:40:27.081743 1 kubeadmconfig_controller.go:236] controller/kubeadmconfig "msg"="Cluster infrastructure is not ready, waiting" "kind"="Machine" "name"="capi-quickstart-md-0-b9ddd959b-kqcmd" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "version"="84545"
kubectl get nodes
NAME                                           STATUS   ROLES   AGE   VERSION
ip-192-168-11-254.us-east-2.compute.internal   Ready            12h   v1.21.5-eks-9017834
ip-192-168-36-148.us-east-2.compute.internal   Ready            12h   v1.21.5-eks-9017834
clusterctl describe cluster capi-quickstart
NAME                                                                 READY  SEVERITY  REASON                           SINCE  MESSAGE
Cluster/capi-quickstart                                              False  Warning   VpcReconciliationFailed          3m21s  0 of 7 completed
├─ClusterInfrastructure - AWSCluster/capi-quickstart                 False  Warning   VpcReconciliationFailed          3m21s  0 of 7 completed
├─ControlPlane - KubeadmControlPlane/capi-quickstart-control-plane
└─Workers
  └─MachineDeployment/capi-quickstart-md-0                           False  Warning   WaitingForAvailableMachines      3m21s  Minimum availability requires 3 replicas, current 0 available
    └─3 Machines...                                                  False  Info      WaitingForClusterInfrastructure  3m21s  See capi-quickstart-md-0-b9ddd959b-md84g, capi-quickstart-md-0-b9ddd959b-q7dm4, ...
kubectl get cluster capi-quickstart -o yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cluster.x-k8s.io/v1beta1","kind":"Cluster","metadata":{"annotations":{},"name":"capi-quickstart","namespace":"default"},"spec":{"clusterNetwork":{"pods":{"cidrBlocks":["192.168.0.0/16"]}},"controlPlaneRef":{"apiVersion":"controlplane.cluster.x-k8s.io/v1beta1","kind":"KubeadmControlPlane","name":"capi-quickstart-control-plane"},"infrastructureRef":{"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"AWSCluster","name":"capi-quickstart"}}}
  creationTimestamp: "2022-04-15T14:32:18Z"
  finalizers:
  - cluster.cluster.x-k8s.io
  generation: 1
  name: capi-quickstart
  namespace: default
  resourceVersion: "5630"
  uid: a7223ded-6a1d-43b3-9708-441c66386fe3
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneEndpoint:
    host: ""
    port: 0
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: capi-quickstart-control-plane
    namespace: default
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    name: capi-quickstart
    namespace: default
status:
  conditions:
  - lastTransitionTime: "2022-04-15T14:32:20Z"
    message: 0 of 7 completed
    reason: VpcReconciliationFailed
    severity: Warning
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-04-15T14:32:20Z"
    message: Waiting for control plane provider to indicate the control plane has been initialized
    reason: WaitingForControlPlaneProviderInitialized
    severity: Info
    status: "False"
    type: ControlPlaneInitialized
  - lastTransitionTime: "2022-04-15T14:32:20Z"
    reason: WaitingForControlPlane
    severity: Info
    status: "False"
    type: ControlPlaneReady
  - lastTransitionTime: "2022-04-15T14:32:20Z"
    message: 0 of 7 completed
    reason: VpcReconciliationFailed
    severity: Warning
    status: "False"
    type: InfrastructureReady
  observedGeneration: 1
  phase: Provisioning
What did you expect to happen:
Workload clusters get created.
Anything else you would like to add:
Cluster-api version:
Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:58:47Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-eks-0d102a7", GitCommit:"eb09fc479c1b2bfcc35c47416efb36f1b9052d58", GitTreeState:"clean", BuildDate:"2022-02-17T16:36:28Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.21) exceeds the supported minor version skew of +/-1
Environment:
OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
/kind bug

cluster-info-dump.txt
@soodr: This issue is currently awaiting triage.
If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks like the pod is failing to make AWS SDK calls.
Have you already attached the required permissions to the control plane instances (assuming CAPA controllers are on the EKS control-plane nodes)? https://cluster-api-aws.sigs.k8s.io/topics/eks/prerequisites.html
Yes, I followed the steps here: https://cluster-api.sigs.k8s.io/user/quick-start.html, specifically the command
clusterawsadm bootstrap iam create-cloudformation-stack
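For reference, a minimal sketch of the relevant quick-start bootstrap sequence; the region and profile values below are placeholders, not taken from the original report:

# Placeholders: adjust region/profile to your environment.
export AWS_REGION=us-east-2
export AWS_PROFILE=aaa

# Create the IAM resources CAPA needs via a CloudFormation stack.
clusterawsadm bootstrap iam create-cloudformation-stack

# Encode the current AWS credentials as a base64 profile for the CAPA controller.
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

# Install the AWS infrastructure provider into the management cluster.
clusterctl init --infrastructure aws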
@soodr I had this issue too; I solved it by updating the base64-encoded credentials in the capa-manager-bootstrap-credentials secret (mine had expired, as we use SSO/MFA). A sketch of that refresh follows.
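A minimal sketch of the refresh, assuming the CAPA defaults for the secret name, key, and namespace, and that clusterawsadm can read the renewed credentials from your active AWS profile:

# Re-encode the renewed credentials from the active AWS profile.
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

# Patch the bootstrap secret with the new value ("credentials" is the default key).
kubectl patch secret capa-manager-bootstrap-credentials -n capa-system \
  --type merge -p "{\"data\":{\"credentials\":\"${AWS_B64ENCODED_CREDENTIALS}\"}}"

# Restart the controller so it picks up the new credentials.
kubectl rollout restart deployment capa-controller-manager -n capa-system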
Hello! We are also using a "non-static creds" way to connect to AWS (a user does an ADFS connection to a first AWS account, and a role is assumed in a second AWS account to create the infrastructure), and I suspect we have the same kind of problem with Cluster API.
I am interested in the "clean" way to use Cluster API in such environments (aside from updating the CAPA credentials manually when the session expires). Is there such a way, or do we have to use "static" AWS access_key/secret_key for Cluster API to work without issues?
I came across this issue even though I have static credentials.
- KinD v0.12.0 on macOS to run the management cluster
- clusterctl 1.2.0
- clusterawsadm 1.4.1
Do you see the same error in the issue? Because this looks like a temporary credential issue:
E0414 23:25:49.734323 1 controller.go:317] controller/awscluster "msg"="Reconciler error" "error"="failed to create new vpc: failed to create vpc: RequestExpired: Request has expired.\n\tstatus code: 400, request id: 49625214-edd8-47d3-bf31-64e29ba9cd0a" "name"="capi-quickstart" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSCluster"
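One way to check whether the credentials handed to CAPA were temporary is to look for a session token; this is a sketch, and the profile name aaa is only the placeholder from the original report:

# Temporary (STS) credentials carry a session token; static IAM user keys do not.
aws configure get aws_session_token --profile aaa && echo "temporary credentials in use"

# Confirm the credentials are currently valid at all.
aws sts get-caller-identity --profile aaa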
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Hey @sedefsavas, I refreshed my credentials, but even after that I see this error. Do we need to reinitialize the AWS infra via clusterctl? It seems the capa-controller isn't picking up the new credentials. Could you share how we could ascertain this?
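One way to ascertain this is to decode what the controller is actually mounting; a sketch, assuming the default CAPA secret name and namespace:

# Decode the credentials currently stored in the bootstrap secret and
# compare them with your refreshed AWS profile.
kubectl get secret capa-manager-bootstrap-credentials -n capa-system \
  -o jsonpath='{.data.credentials}' | base64 -d

# A full clusterctl re-init should not be required; patching the secret and
# restarting capa-controller-manager is usually enough.
kubectl rollout restart deployment capa-controller-manager -n capa-system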
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this: /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.