Unable to enable Azure CAPIProvider

Open mantis-toboggan-md opened this issue 1 year ago • 4 comments

What steps did you take and what happened?

The Azure CAPIProvider fails to enable correctly in the 0.4.0 CAPI UI. The status changes from Provisioning to Ready but eventually becomes Unavailable. The capz-controller-manager pod is in CrashLoopBackOff with the following error:

"failed to get informer from cache" err="failed to get API group resources: unable to retrieve the complete list of server APIs: bootstrap.cluster.x-k8s.io/v1beta1: the server could not find the requested resource" logger="controller-runtime.source.EventHandler"

What did you expect to happen?

I would expect the capz-controller-manager pod to be running and the Azure CAPIProvider resource to be in the Ready state.

How to reproduce it?

No response

Rancher Turtles version

No response

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug

mantis-toboggan-md avatar Apr 16 '24 17:04 mantis-toboggan-md

Hi @mantis-toboggan-md, thanks for reporting this. I was able to reproduce the issue with the following configuration:

  • Rancher v2.8.2
  • Rancher Turtles v0.7.0
  • Rancher Turtles UI v0.4.0

This looks like it may be related to the missing bootstrap.cluster.x-k8s.io resources. These custom resources are generally provided by Kubeadm but, since Turtles uses RKE2 for bootstrap and control plane provisioning, CAPRKE2 provides them instead.

For some reason CAPZ does not detect the API resource provided by RKE2 but, after installing Kubeadm and then retrying the CAPZ installation, the changes apply successfully.

I applied this YAML file before installing CAPZ:

---
apiVersion: v1
kind: Namespace
metadata:
  name: capi-kubeadm-bootstrap-system
---
apiVersion: turtles-capi.cattle.io/v1alpha1
kind: CAPIProvider
metadata:
  name: kubeadm-bootstrap
  namespace: capi-kubeadm-bootstrap-system
spec:
  name: kubeadm
  type: bootstrap
  version: v1.4.6
  configSecret:
    name: variables
---
apiVersion: v1
kind: Namespace
metadata:
  name: capi-kubeadm-control-plane-system
---
apiVersion: turtles-capi.cattle.io/v1alpha1
kind: CAPIProvider
metadata:
  name: kubeadm-control-plane
  namespace: capi-kubeadm-control-plane-system
spec:
  name: kubeadm
  type: controlPlane
  version: v1.4.6
  configSecret:
    name: variables

After that, the Azure provider installed successfully via the Rancher UI and the controller did not report any errors.

The custom resource that the logs report as missing should be available via the RKE2 provider, so we need to investigate this further before proposing a solution.

salasberryfin avatar May 09 '24 06:05 salasberryfin

Could it be because CAPRKE2 does not provide bootstrap.cluster.x-k8s.io/v1beta1, but only v1alpha1?
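
One way to check this would be to list the API versions the management cluster actually serves and look for the exact group/version from the error. This is only a minimal sketch: it assumes kubectl is pointed at the management cluster, and `has_api_version` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or Turtles): succeeds if the exact
# group/version passed as $1 appears, as a whole line, in the
# `kubectl api-versions` output supplied on stdin.
has_api_version() {
  grep -qxF -- "$1"
}

# Usage against a live cluster (assumes kubectl access to the management cluster):
#   kubectl api-versions | has_api_version bootstrap.cluster.x-k8s.io/v1beta1 \
#     && echo "served" || echo "not served"
```

If only bootstrap.cluster.x-k8s.io/v1alpha1 is listed, that would be consistent with the "could not find the requested resource" error for v1beta1.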

furkatgofurov7 avatar May 09 '24 10:05 furkatgofurov7

Opened a new upstream issue https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/4854 to track the fix in CAPZ. Once the community accepts this proposal, we'll submit a PR that removes the dependency on Kubeadm when enabling MachinePools.

salasberryfin avatar May 15 '24 07:05 salasberryfin

Upstream PR: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/4868

salasberryfin avatar May 22 '24 08:05 salasberryfin

done

kkaempf avatar Jul 30 '24 10:07 kkaempf