Upgrade from 0.11.0 to 0.12.0 failed

Open shanduur opened this issue 10 months ago • 2 comments

What steps did you take and what happened:

Helm install failed for release capi-operator-system/capi-operator with chart [email protected]:
cannot patch "addonproviders.operator.cluster.x-k8s.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "addonproviders.operator.cluster.x-k8s.io" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block &&
cannot patch "bootstrapproviders.operator.cluster.x-k8s.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "bootstrapproviders.operator.cluster.x-k8s.io" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block &&
cannot patch "controlplaneproviders.operator.cluster.x-k8s.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "controlplaneproviders.operator.cluster.x-k8s.io" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block &&
cannot patch "coreproviders.operator.cluster.x-k8s.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "coreproviders.operator.cluster.x-k8s.io" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block &&
cannot patch "infrastructureproviders.operator.cluster.x-k8s.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "infrastructureproviders.operator.cluster.x-k8s.io" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block &&
cannot patch "ipamproviders.operator.cluster.x-k8s.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "ipamproviders.operator.cluster.x-k8s.io" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block &&
cannot patch "runtimeextensionproviders.operator.cluster.x-k8s.io" with kind CustomResourceDefinition: CustomResourceDefinition.apiextensions.k8s.io "runtimeextensionproviders.operator.cluster.x-k8s.io" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block

What did you expect to happen: Installation succeeded

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-operator version: 0.11.0 -> 0.12.0 -> 0.16.0
  • Cluster-api version:
  • Minikube/KIND version: N/A
  • Kubernetes version: (use kubectl version): v1.31.5
  • OS (e.g. from /etc/os-release): Debian

/kind bug /area artifacts

shanduur avatar Feb 01 '25 05:02 shanduur

Upgrading to 0.16.0 still shows the same error. Removing the release and installing 0.16.0 worked fine, but this looks like an operational nightmare, as it required me to remove all CRDs to complete the installation.

shanduur avatar Feb 01 '25 06:02 shanduur

/triage needs-information

Thanks for reporting the issue.

The cause is that our CRDs carry an invalid CA bundle (rejected starting from k8s v1.31 specifically) that needs to be removed. A more detailed explanation is in this Slack thread: https://kubernetes.slack.com/archives/C0EG7JC6T/p1722441161968339

We removed the CA bundle in https://github.com/kubernetes-sigs/cluster-api-operator/pull/591 but did not backport it to older releases (sorry for that); the change shipped in the https://github.com/kubernetes-sigs/cluster-api-operator/releases/tag/v0.14.0 release and onwards. So, if you had installed an operator release >=v0.14.0 from the beginning and then upgraded to newer releases of the operator, you would not have seen it 😄 .

Since we can't backport the changes to the v0.11/v0.12/v0.13 release series anymore, simply because we no longer support those older branches (we currently maintain the latest and latest - 1 release series, v0.16 & v0.15), can we close this issue?

furkatgofurov7 avatar Feb 06 '25 20:02 furkatgofurov7

I'm seeing this issue even without an upgrade: any change to the bootstrap or controlplane provider causes the reconciler to fail health checks with

E0416 18:03:29.102031       1 controller.go:316] "Reconciler error" err="action failed after 10 attempts: failed to patch provider object: CustomResourceDefinition.apiextensions.k8s.io \"kubeadmconfigs.bootstrap.cluster.x-k8s.io\" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block" controller="bootstrapprovider" controllerGroup="operator.cluster.x-k8s.io" controllerKind="BootstrapProvider" BootstrapProvider="capi-operator-system/kubeadm" namespace="capi-operator-system" name="kubeadm" reconcileID="125a61de-77af-4432-a178-a21250110b2b"

What's more entertaining is that the CRD kubeadmconfigs.bootstrap.cluster.x-k8s.io has a valid value in

  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        caBundle: **long_base64_cert not ending with Cg==**

bootstrapproviders.operator.cluster.x-k8s.io, however, has a caBundle ending with Cg== for some reason. All components are on their latest versions.
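For readers following along: the `Cg==` value and the `[]byte{0xa}` in the error message appear to be the same thing, since `Cg==` is the base64 encoding of a single newline byte. The sketch below is only an illustration of that decoding; `describe_ca_bundle` and `PEM_MARKER` are hypothetical names introduced here, not part of the operator or of Kubernetes, and the PEM check is a simple substring test rather than real certificate parsing.

```python
import base64

# Marker that opens a PEM-encoded certificate block.
PEM_MARKER = b"-----BEGIN CERTIFICATE-----"


def describe_ca_bundle(b64_value: str) -> str:
    """Classify a base64 caBundle value as it appears in a CRD manifest."""
    raw = base64.b64decode(b64_value)
    if raw.strip() == b"":
        # Only whitespace: nothing a PEM parser could load.
        return f"empty placeholder ({len(raw)} byte(s): {list(raw)})"
    if PEM_MARKER in raw:
        return "contains a PEM certificate block"
    return "non-empty but not PEM"


print(describe_ca_bundle("Cg=="))  # empty placeholder (1 byte(s): [10])
```

Byte 10 is `0x0a`, a lone line feed, which matches the "unable to parse bytes as PEM block" complaint.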

v1.6.2 providers and capi operator v0.18.1

aliaksei-imi avatar Apr 16 '25 18:04 aliaksei-imi

Not sure this is the same as the original issue but I just got the same error trying to upgrade from v0.18.1 to v0.19.0 (Kubernetes v1.32.2). In my case the error is for the packet infrastructure provider:

~ k get infrastructureproviders.operator.cluster.x-k8s.io packet -o json | jq .status
{
  "conditions": [
    {
      "lastTransitionTime": "2025-04-28T19:13:34Z",
      "status": "True",
      "type": "PreflightCheckPassed"
    },
    {
      "lastTransitionTime": "2025-04-28T19:16:00Z",
      "message": "action failed after 10 attempts: failed to patch provider object: CustomResourceDefinition.apiextensions.k8s.io \"packetclusters.infrastructure.cluster.x-k8s.io\" is invalid: spec.conversion.webhookClientConfig.caBundle: Invalid value: []byte{0xa}: unable to load root certificates: unable to parse bytes as PEM block",
      "reason": "Install failed",
      "severity": "Warning",
      "status": "False",
      "type": "ProviderInstalled"
    }
  ],
  "observedGeneration": 1
}

Looking at the packetclusters.infrastructure.cluster.x-k8s.io CRD, it has a valid caBundle. This was all installed last week using cluster-api-operator v0.18.1.

I'm a little confused by the field path in the error message, spec.conversion.webhookClientConfig.caBundle, as that should be spec.conversion.webhook.clientConfig.caBundle (clientConfig sits under a separate webhook field in apiextensions.k8s.io/v1).
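Because the path in the error differs from the path stored in a v1 CRD, it can help to check the stored objects directly. The following is a minimal sketch, assuming the input is the parsed JSON list produced by `kubectl get crd -o json`; the helper names `conversion_ca_bundle` and `crds_with_blank_bundle` and the sample data are hypothetical. It reads only the v1 path spec.conversion.webhook.clientConfig.caBundle and reports CRDs whose bundle decodes to nothing but whitespace. It does not explain why the error uses the other path, and it is a diagnostic, not a confirmed workaround.

```python
import base64


def conversion_ca_bundle(crd: dict):
    """Return the base64 caBundle at spec.conversion.webhook.clientConfig, or None."""
    conversion = crd.get("spec", {}).get("conversion") or {}
    webhook = conversion.get("webhook") or {}
    client_config = webhook.get("clientConfig") or {}
    return client_config.get("caBundle")


def crds_with_blank_bundle(crd_list: dict) -> list:
    """Names of CRDs whose conversion caBundle decodes to whitespace only."""
    names = []
    for crd in crd_list.get("items", []):
        bundle = conversion_ca_bundle(crd)
        if bundle is not None and base64.b64decode(bundle).strip() == b"":
            names.append(crd["metadata"]["name"])
    return names


# Hypothetical sample shaped like `kubectl get crd -o json` output.
sample = {
    "items": [
        {
            "metadata": {"name": "example-a.operator.cluster.x-k8s.io"},
            "spec": {"conversion": {"strategy": "Webhook",
                                    "webhook": {"clientConfig": {"caBundle": "Cg=="}}}},
        },
        {
            "metadata": {"name": "example-b.operator.cluster.x-k8s.io"},
            "spec": {"conversion": {"strategy": "None"}},
        },
    ]
}

print(crds_with_blank_bundle(sample))
```

With real data, the dictionary could be loaded with `json.load(sys.stdin)` from the piped kubectl output instead of the inline sample.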

Anyone know a workaround for this issue?

tenyo avatar Apr 28 '25 20:04 tenyo

Actually, it looks like that packetcluster CRD uses the old caBundle format: https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/main/config/crd/patches/webhook_in_packetclusters.yaml#L15

tenyo avatar Apr 28 '25 22:04 tenyo

Same issue with awsclustercontrolleridentities.infrastructure.cluster.x-k8s.io, operator is complaining about caBundle but it is complaining about spec.conversion.webhookClientConfig.caBundle, not spec.conversion.webhook.clientConfig.caBundle as it is in the actual CRD.

bianchi2 avatar Jul 22 '25 22:07 bianchi2

This was fixed in CAPA in this commit: https://github.com/kubernetes-sigs/cluster-api-provider-aws/commit/678efb17e3b8f939bb592a60e46241a8728c7921

sl1pm4t avatar Jul 25 '25 01:07 sl1pm4t

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 23 '25 01:10 k8s-triage-robot

Closing it now, please re-open if needed.

furkatgofurov7 avatar Oct 30 '25 10:10 furkatgofurov7

Same issue with awsclustercontrolleridentities.infrastructure.cluster.x-k8s.io, operator is complaining about caBundle but it is complaining about spec.conversion.webhookClientConfig.caBundle, not spec.conversion.webhook.clientConfig.caBundle as it is in the actual CRD.

@bianchi2 did you find a solution for this?

arnabmaji avatar Nov 15 '25 17:11 arnabmaji