cluster-api-provider-gcp

Unable to create GCP cluster by following the steps in the quick start guide

Open mmlk09 opened this issue 3 years ago • 6 comments

What steps did you take and what happened: Followed the GCP instructions in the quick start guide: https://cluster-api.sigs.k8s.io/user/quick-start.html

What did you expect to happen: A GCP cluster to be created in the specified region and project.

Anything else you would like to add: The control plane VM is created and visible in the GCP console, but the steps after this do not seem to proceed to complete the cluster creation process.

The following error is seen in the capg-controller-manager logs:

E0613 12:27:18.384087 1 gcpmachine_controller.go:231] controller/gcpmachine "msg"="Error reconciling instance resources" "error"="failed to retrieve bootstrap data: error retrieving bootstrap data: linked Machine's bootstrap.dataSecretName is nil" "name"="gke-capi-md-0-97jwk" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="GCPMachine"
E0613 12:27:18.385544 1 controller.go:317] controller/gcpmachine "msg"="Reconciler error" "error"="failed to retrieve bootstrap data: error retrieving bootstrap data: linked Machine's bootstrap.dataSecretName is nil" "name"="gke-capi-md-0-97jwk" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="GCPMachine"

capi-kubeadm-control-plane-controller-manager Logs:

I0613 12:43:01.304022 1 controller.go:251] controller/kubeadmcontrolplane "msg"="Reconcile KubeadmControlPlane" "cluster"="gke-capi" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
E0613 12:43:21.499751 1 controller.go:188] controller/kubeadmcontrolplane "msg"="Failed to update KubeadmControlPlane Status" "error"="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/gke-capi": context deadline exceeded" "cluster"="gke-capi" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
E0613 12:43:21.500754 1 controller.go:317] controller/kubeadmcontrolplane "msg"="Reconciler error" "error"="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "default/gke-capi": context deadline exceeded" "name"="gke-capi-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"

Environment:

- Cluster-api version: clusterctl version: &version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.4", GitCommit:"1c3a1526f101d4b07d2eec757fe75e8701cf6212", GitTreeState:"clean", BuildDate:"2022-06-03T17:11:09Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/amd64"}

- Minikube/KIND version: kind v0.12.0 go1.17.8 linux/amd64

- Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:26:19Z", GoVersion:"go1.18.2", Compiler:"gc", Platform:"linux/amd64"} Kustomize Version: v4.5.4 Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-03-06T21:32:53Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

- OS (e.g. from /etc/os-release): NAME="Ubuntu" VERSION="20.04 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04 LTS" VERSION_ID="20.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=focal UBUNTU_CODENAME=focal

mmlk09 · Jun 13 '22

New error in capi-kubeadm-control-plane-controller-manager:

I0615 15:32:47.847933 1 kubeadmconfig_controller.go:236] controller/kubeadmconfig "msg"="Cluster infrastructure is not ready, waiting" "kind"="Machine" "name"="gke-capi-md-0-7fbbd576bd-j56dm" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "version"="1377"
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:46929: EOF
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:7688: EOF
I0615 15:34:32.121838 1 control_plane_init_mutex.go:99] init-locker "msg"="Attempting to acquire the lock" "cluster-name"="gke-capi" "configmap-name"="gke-capi-lock" "machine-name"="gke-capi-control-plane-xvrld" "namespace"="default"
I0615 15:34:32.125356 1 kubeadmconfig_controller.go:380] controller/kubeadmconfig "msg"="Creating BootstrapData for the init control plane" "kind"="Machine" "name"="gke-capi-control-plane-xvrld" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "version"="1856"
I0615 15:34:32.125793 1 kubeadmconfig_controller.go:872] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "ControlPlaneEndpoint"="34.149.221.102:443"
I0615 15:34:32.125835 1 kubeadmconfig_controller.go:878] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "ClusterName"="gke-capi"
I0615 15:34:32.125851 1 kubeadmconfig_controller.go:897] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "PodSubnet"="192.168.0.0/16"
I0615 15:34:32.125866 1 kubeadmconfig_controller.go:904] controller/kubeadmconfig "msg"="Altering ClusterConfiguration" "name"="gke-capi-control-plane-n4xtb" "namespace"="default" "reconciler group"="bootstrap.cluster.x-k8s.io" "reconciler kind"="KubeadmConfig" "KubernetesVersion"="v1.23.0"
2022/06/15 15:34:32 http: TLS handshake error from 10.244.0.1:54614: EOF

mmlk09 · Jun 15 '22

The nodes will not be provisioned before the control plane is ready, and the control plane will not announce itself as ready before a CNI plugin has been installed. If you did deploy a CNI and the KubeadmControlPlane still refuses to enter the Ready state, another good place to look for control plane bootstrap problems is the serial console output of the control plane VM on GCP; kubelet will typically report more problems there than you can see in the cluster-api controller logs.
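For reference, the serial console output can be pulled with gcloud; a minimal sketch, where the instance name, zone, and project are placeholders for your own values:

```bash
# Placeholder instance name, zone, and project; substitute the control plane VM
# that CAPG created in your environment.
gcloud compute instances get-serial-port-output <control-plane-instance-name> \
  --zone=<zone> \
  --project=<project>
```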

itspngu · Jun 18 '22

The following error is showing on the GCP VM serial console; how do I fix this?

gke-capi-control-plane-bzjmd login:
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 74.
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 19 04:46:52 gke-capi-control-plane-bzjmd kubelet[1831]: E0619 04:46:52.221960 1831 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:46:52 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 75.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 19 04:47:02 gke-capi-control-plane-bzjmd kubelet[1838]: E0619 04:47:02.471678 1838 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:47:02 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 76.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 19 04:47:12 gke-capi-control-plane-bzjmd kubelet[1845]: E0619 04:47:12.720046 1845 server.go:206] "Failed to load kubelet config file" err="failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory" path="/var/lib/kubelet/config.yaml"
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 04:47:12 gke-capi-control-plane-bzjmd systemd[1]: kubelet.service: Failed with result 'exit-code'.

mmlk09 · Jun 19 '22

kubelet.service: Scheduled restart job, restart counter is at 75.

You will likely find the reason it fails to start earlier in the logs. I remember seeing kubelet complain about a missing /var/lib/kubelet/config.yaml, and it ended up being due to CNI problems.
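If you want to dig into that on the VM itself, here is a sketch of what to check, assuming an Ubuntu-based image where cloud-init drives the bootstrap (kubeadm init is what normally writes /var/lib/kubelet/config.yaml, so if the file is missing, init likely failed or never ran):

```bash
# On the control plane VM, via SSH or the serial console:

# cloud-init output usually shows whether and why `kubeadm init` failed
sudo cat /var/log/cloud-init-output.log

# recent kubelet logs, to see what it reports besides the missing config file
sudo journalctl -u kubelet --no-pager -n 50
```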

PS: If you post code or log messages on GitHub, it's a lot easier for everyone to read them if you format them as code: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks#fenced-code-blocks

itspngu · Jun 19 '22

Having exactly the same issue: getting the /var/lib/kubelet/config.yaml: no such file or directory error from the first control plane VM.

zkl94 · Jun 23 '22

I'm having the same issue. I'm also unable to access the kube-apiserver via the capg-managed LB because the health check is failing (targeting port 6443), which in turn is failing because the kube-apiserver is not running on the VM. I'm not sure whether the kube-apiserver should be up at this stage of bootstrapping.
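One way to confirm the health check picture from the GCP side is to ask the load balancer for backend health; a sketch, assuming the backend service CAPG created for the API server is a global one (the name below is a placeholder):

```bash
# Find the backend service CAPG created for the API server, then query its health.
gcloud compute backend-services list
gcloud compute backend-services get-health <capg-apiserver-backend-service> --global
```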

For context my team is trying to implement support for MachinePools via MIGs (issue here), but we can't start development until we have the current master state working. Could we get some assistance?

harveyxia · Aug 30 '22

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Nov 28 '22

Just in case this helps someone else: I ran into this issue, and after debugging I found out that CAPG needs a Cloud NAT in the project (I haven't had time to track down the root cause yet). Once I created it manually, the control-plane node started successfully and, after that, the other control-plane nodes and workers were instantiated.
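In case it saves someone a trip through the console, a minimal sketch of creating that Cloud NAT with gcloud; the router/NAT names, region, and network below are placeholders and must match the VPC CAPG is using for the cluster:

```bash
# Placeholder names, region, and network; adjust to your environment.
gcloud compute routers create capi-router \
  --network=<cluster-network> \
  --region=<region>

gcloud compute routers nats create capi-nat \
  --router=capi-router \
  --region=<region> \
  --auto-allocate-nat-external-ips \
  --nat-all-subnet-ip-ranges
```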

stg-0 · Dec 13 '22

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Jan 27 '23

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot · Feb 26 '23

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Feb 26 '23