cluster-api-operator
cert-manager updates and version management
User Story
As an operator I would like the CAPI operator to manage updates to cert-manager for me so it's kept in sync with my providers' dependencies.
Detailed Description
`cert-manager` support was recently introduced in https://github.com/kubernetes-sigs/cluster-api-operator/pull/144. This greatly simplifies the initial CAPI bootstrap, making it almost a one-command task.
I understand that managing cert-manager was initially declared a non-goal and left to the user. However, I agree that the path started by #144 is the right one, given that most users want a simple, streamlined experience and prefer to leave dependency management to the CAPI tooling. I believe this is especially true for users coming from `clusterctl`, where cert-manager is managed automatically (installation and updates). On this new path, I think there are a few places where the experience can be improved:
- cert-manager CRDs. Currently, our chart includes cert-manager as a subchart and ships the upstream cert-manager CRDs in the `crds` folder (see the sketch after this list). This works great for the initial installation, but any future cert-manager update that changes the CRDs will fail, since Helm does not upgrade resources placed under `crds`.
- Coupled cert-manager and operator updates. If core CAPI starts requiring a new version of cert-manager, cert-manager will need to be updated before the providers are. Assuming we handle this by releasing a new chart version with a bumped cert-manager dependency, the operator then needs to be updated before the providers. This kind of coupling is not ideal, since the operator aims to be independent of the versions of the providers it installs. Moreover, it requires users to keep track of the upstream CAPI dependencies and cross-check them against our chart.
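For illustration, the current layout looks roughly like the sketch below; the dependency name, repository, version, and condition flag are assumptions for the example, not copied from the real chart:

```yaml
# Chart.yaml (sketch only): cert-manager pinned as a subchart, so bumping it
# means cutting and installing a new release of the operator chart itself.
apiVersion: v2
name: cluster-api-operator
version: 0.0.0-example            # placeholder chart version
dependencies:
  - name: cert-manager
    repository: https://charts.jetstack.io
    version: v1.13.2              # assumed pin, for illustration only
    condition: cert-manager.enabled
```

The `crds/` directory of a chart is special: Helm applies its contents once at install time and never upgrades or deletes them, which is why CRD changes in newer cert-manager releases are not picked up by `helm upgrade`.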
Ideally, we would handle both scenarios automatically by updating cert-manager to the version required by the core provider. This removes the need for users to track cert-manager versions, run manual CRD upgrades, or check the version required upstream, and ultimately streamlines the CAPI management experience in the same way `clusterctl` does. However, this presents some challenges:
- Mapping a cert-manager version to a core provider version. Currently there is no easy way (let alone an API-friendly way) to know which cert-manager version is pinned by a particular CAPI version.
- The operator webhooks depend on cert-manager. The operator itself depends on cert-manager, so it cannot wait until the first CoreProvider is created before installing it. I believe the solution here is to keep something similar to what the chart does today: install a default version, but only during `helm install`, and then let the operator update that version based on the selected CoreProvider.
- cert-manager updates in air-gapped environments. We would need to implement something similar to what is done today with ConfigMaps for providers (a hypothetical sketch follows this list). Although in this case it might be better for users to manage cert-manager themselves, since they would need to keep track of the versions either way.
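For the air-gapped point above, here is a minimal sketch of what a cert-manager components ConfigMap could look like, loosely modeled on the provider fetchConfig ConfigMaps. The label key, data key, and the very idea of the operator consuming such a ConfigMap for cert-manager are assumptions for discussion, not existing behavior:

```yaml
# Hypothetical ConfigMap mirroring the upstream cert-manager manifests into an
# air-gapped cluster; the operator does not support anything like this today.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cert-manager-v1.14.2
  namespace: capi-operator-system
  labels:
    # Hypothetical selector the operator could use to find the right version.
    operator.cluster.x-k8s.io/cert-manager-version: v1.14.2
data:
  # Full upstream cert-manager.yaml, copied in ahead of time.
  components: |
    # ... contents of cert-manager.yaml ...
```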
Alternatively, if we decide that moving this responsibility into the operator code is too much, and/or we want to address these issues separately, we could manage cert-manager CRD updates using Helm hooks and a Kubernetes `Job`, similar to what I propose in https://github.com/kubernetes-sigs/cluster-api-operator/issues/188.
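To make that alternative concrete, a pre-upgrade hook `Job` could re-apply the upstream CRDs before Helm upgrades the cert-manager subchart. This is a sketch of the idea only: the image and CRD URL are placeholders, the version would need to be templated from the chart, and the ServiceAccount/RBAC needed to update CRDs is omitted:

```yaml
# Sketch of a Helm pre-upgrade hook that re-applies the cert-manager CRDs,
# working around Helm's "install crds/ once, never upgrade" behavior.
apiVersion: batch/v1
kind: Job
metadata:
  name: cert-manager-crd-upgrade
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      # serviceAccountName: <account allowed to update CRDs>  (RBAC omitted)
      containers:
        - name: apply-crds
          image: bitnami/kubectl:latest   # placeholder image
          command:
            - kubectl
            - apply
            - -f
            # Placeholder version; in a real chart this would be templated from
            # the pinned cert-manager dependency version.
            - https://github.com/cert-manager/cert-manager/releases/download/v1.14.2/cert-manager.crds.yaml
```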
Anything else you would like to add:
This is not a full-fledged proposal and there are definitely some gaps in the design, but I am looking for feedback before investing more time in it.
/kind feature
This issue is currently awaiting triage.
If CAPI Operator contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.
The `triage/accepted` label can be added by org members by writing `/triage accepted` in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
At a minimum, when you deploy a fresh install of CAPI right now and specify that you want cert-manager installed, it should install the version matched to that CAPI release. It does not today. It seems this script is used, with values from CI perhaps, to deploy cert-manager 1.13.2, while the current CAPI release uses cert-manager 1.14.2.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:

> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.