cluster-api-provider-rke2

CAPD Control Plane machines fail because they have no IP Address available

Status: Open • ron1 opened this issue 1 year ago • 4 comments

What happened: The CAPD control plane machine gets stuck in the Provisioning phase and fails because it has no IP address available.

What did you expect to happen: CAPD Control Plane machine provisions successfully.

How to reproduce it: Execute the following steps to provision the CAPD cluster:

```shell
# Create a kind management cluster; the host Docker socket is mounted into the
# node so that the Docker infrastructure provider (CAPD) can create containers.
cat > kind-cluster-with-extramounts.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: capi-test
nodes:
- role: control-plane
  image: kindest/node:v1.24.15
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock
EOF

kind create cluster --config kind-cluster-with-extramounts.yaml

# Install Cluster API with the RKE2 bootstrap and control plane providers
# and the Docker infrastructure provider.
clusterctl init --bootstrap rke2 --control-plane rke2 --infrastructure docker

# Template variables for the sample manifest.
export CABPR_NAMESPACE=example
export CLUSTER_NAME=capd-rke2-test
export CABPR_CP_REPLICAS=1
export CABPR_WK_REPLICAS=1
export KUBERNETES_VERSION=v1.24.15

# Fetch the v0.2.3 Docker sample, substitute the variables, and apply it.
export YAML_URL=https://raw.githubusercontent.com/rancher-sandbox/cluster-api-provider-rke2/v0.2.3/samples/docker/online-default/rke2-sample.yaml

curl -sL "${YAML_URL}" > rke2-sample.yaml
cat rke2-sample.yaml | clusterctl generate yaml > rke2-docker-example.yaml

kubectl apply -f rke2-docker-example.yaml
```
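Because `clusterctl generate yaml` reads template variables from environment variables, a quick guard before that step can rule out a missing or empty variable as a cause. This is a minimal sketch, assuming a POSIX shell with `printenv` available; `require_vars` is a hypothetical helper, not part of clusterctl or the RKE2 provider.

```shell
# Hypothetical helper (not part of clusterctl or the RKE2 provider): report any
# listed variable that is missing or empty in the exported environment. Only
# exported variables are visible to child processes such as clusterctl, so the
# check uses printenv rather than the current shell's variables.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the variable names exported in the reproduction steps:
#   require_vars CABPR_NAMESPACE CLUSTER_NAME CABPR_CP_REPLICAS \
#     CABPR_WK_REPLICAS KUBERNETES_VERSION &&
#     cat rke2-sample.yaml | clusterctl generate yaml > rke2-docker-example.yaml
```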

The CAPD control plane machine remains stuck in the Provisioning phase, as shown below:

```
$ kubectl get machine -A
NAMESPACE   NAME                                 CLUSTER          NODENAME   PROVIDERID   PHASE          AGE   VERSION
example     capd-rke2-test-control-plane-kd59v   capd-rke2-test                           Provisioning   23m   v1.24.15+rke2r1
example     worker-md-0-lt6dw-lqpml              capd-rke2-test                           Pending        23m   v1.24.15
$
```
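In the `kubectl get machine -A` output, NODENAME and PROVIDERID are blank while the machines provision, so splitting rows on whitespace would shift PHASE into the wrong column. The following is a minimal sketch that instead slices each row at the character offset of its header word, assuming kubectl's default left-aligned table layout; `machines_not_running` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get machine -A` table output on stdin and
# print "NAME PHASE" for every Machine whose phase is not Running.
machines_not_running() {
  awk '
    # Return the cell of column col in the current row: the text starting at
    # the header offset, up to the first space (empty if the cell is blank).
    function field(col,   s) {
      s = substr($0, start[col])
      sub(/ .*/, "", s)
      return s
    }
    NR == 1 {
      # Record the 1-based starting offset of every header word.
      rest = $0
      offset = 0
      while (match(rest, /[^ ]+/)) {
        start[substr(rest, RSTART, RLENGTH)] = offset + RSTART
        offset += RSTART + RLENGTH - 1
        rest = substr(rest, RSTART + RLENGTH)
      }
      next
    }
    NF > 0 && field("PHASE") != "Running" {
      print field("NAME"), field("PHASE")
    }
  '
}
```

For example, `kubectl get machine -A | machines_not_running` would list both machines from the table above with their Provisioning and Pending phases.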

Anything else you would like to add: The following errors repeat consistently in the rke2controlplane_controller log:

```
I0125 19:03:19.102640       1 rke2controlplane_controller.go:387]  "msg"="Reconcile RKE2 Control Plane" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
E0125 19:03:29.172494       1 rke2controlplane_controller.go:698]  "msg"="Unable to initialize workload cluster" "error"="failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://www.xx.y.z:6443/api/v1?timeout=30s\": EOF" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
E0125 19:03:29.173086       1 rke2controlplane_controller.go:463]  "msg"="failed to reconcile Control Plane conditions" "error"="failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://www.xx.y.z:6443/api/v1?timeout=30s\": EOF" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
E0125 19:03:29.195876       1 rke2controlplane_controller.go:153]  "msg"="Failed to update RKE2ControlPlane Status" "error"="some Control Plane machines exist and are ready but they have no IP Address available" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "cluster"="capd-rke2-test" "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
E0125 19:03:29.196841       1 controller.go:324]  "msg"="Reconciler error" "error"="[failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://www.xx.y.z:6443/api/v1?timeout=30s\": EOF, some Control Plane machines exist and are ready but they have no IP Address available]" "RKE2ControlPlane"={"name":"capd-rke2-test-control-plane","namespace":"example"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="capd-rke2-test-control-plane" "namespace"="example" "reconcileID"="ee66ae3a-4fa2-4fc2-a9a4-ea8f6dab1f4c"
```
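To confirm which errors repeat and how often, the structured `"error"="..."` values can be pulled out of the controller log and counted. This is a minimal sketch, assuming the klog key="value" format shown above, where quotes inside a value are escaped as `\"`. `count_errors` is a hypothetical helper, and the namespace and deployment names in the usage comment are assumptions about a default `clusterctl init` install.

```shell
# Hypothetical helper: list each distinct "error" value in structured klog
# output (read from stdin), with its number of occurrences, most frequent first.
count_errors() {
  # Capture the value after "error"=" up to the first unescaped quote;
  # escaped pairs such as \" are consumed by the \\. alternative.
  sed -nE 's/.*"error"="(([^"\\]|\\.)*)".*/\1/p' |
    sed 's/\\"/"/g' |   # unescape \" for display
    sort | uniq -c | sort -rn
}

# Usage (namespace and deployment name are assumptions, adjust as needed):
#   kubectl logs -n rke2-control-plane-system \
#     deploy/rke2-control-plane-controller-manager | count_errors
```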

Environment:

  • rke provider version: 0.2.3
  • OS (e.g. from /etc/os-release): RHEL 8.9

ron1, Jan 25 '24 19:01