cluster-api-operator
cert-manager updates and version management
User Story
As an operator I would like the CAPI operator to manage updates to cert-manager for me so it's kept in sync with my providers' dependencies.
Detailed Description
`cert-manager` support was recently introduced in https://github.com/kubernetes-sigs/cluster-api-operator/pull/144. This greatly simplifies the initial CAPI bootstrap, making it almost a one-command task.
I understand that managing cert-manager was initially declared a non-goal and left to the user. However, I agree that the path started by #144 is the right one, given that most users want a simple, streamlined experience and prefer to leave dependency management to the CAPI tooling. I believe this is especially true for users coming from `clusterctl`, where cert-manager is managed automatically (installation and updates). On this new path, I think there are a few places where the experience can be improved:
- cert-manager CRDs. Currently, our chart includes cert-manager as a subchart and ships the upstream cert-manager CRDs in the `crds` folder (see the sketch after this list). This works great for the initial installation, but any future cert-manager update that changes the CRDs will fail, since Helm does not upgrade resources placed under `crds`.
- Coupled cert-manager and operator updates. If core CAPI starts requiring a new version of cert-manager, cert-manager will need to be updated before the providers are. Assuming we handle this by releasing a new chart version with a bumped cert-manager dependency, the operator then needs to be updated before the providers. This kind of coupling is not ideal, since the operator aims to be independent of the versions of the providers it installs. Moreover, it requires users to keep track of the upstream CAPI dependencies and cross-check them against our chart.
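For illustration, the current layout looks roughly like the sketch below; the dependency name, repository, version, and condition flag are assumptions for the example, not copied from the real chart:

```yaml
# Chart.yaml (sketch only): cert-manager pinned as a subchart, so bumping it
# means cutting and installing a new release of the operator chart itself.
apiVersion: v2
name: cluster-api-operator
version: 0.0.0-example            # placeholder chart version
dependencies:
  - name: cert-manager
    repository: https://charts.jetstack.io
    version: v1.13.2              # assumed pin, for illustration only
    condition: cert-manager.enabled
```

The `crds/` directory of a chart is special: Helm applies its contents once at install time and never upgrades or deletes them, which is why CRD changes in newer cert-manager releases are not picked up by `helm upgrade`.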
Ideally, we would handle both scenarios automatically by updating cert-manager to the version required by the core provider. This removes the need for users to track cert-manager versions, run manual CRD upgrades, or check the version required upstream, and ultimately streamlines the CAPI management experience in the same way `clusterctl` does. However, this presents some challenges:
- Mapping a cert-manager version to a core provider version. Currently there is no easy way (let alone an API-friendly way) to know which cert-manager version is pinned by a particular CAPI version.
- The operator webhooks depend on cert-manager. The operator itself depends on cert-manager, so it cannot wait until the first CoreProvider is created before installing it. I believe the solution here is to keep something similar to what the chart does today: install a default version, but only during `helm install`, and then let the operator update that version based on the selected CoreProvider.
- cert-manager updates in air-gapped environments. We would need to implement something similar to what is done today with ConfigMaps for providers (a hypothetical sketch follows this list). Although in this case it might be better for users to manage cert-manager themselves, since they would need to keep track of the versions either way.
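For the air-gapped point above, here is a minimal sketch of what a cert-manager components ConfigMap could look like, loosely modeled on the provider fetchConfig ConfigMaps. The label key, data key, and the very idea of the operator consuming such a ConfigMap for cert-manager are assumptions for discussion, not existing behavior:

```yaml
# Hypothetical ConfigMap mirroring the upstream cert-manager manifests into an
# air-gapped cluster; the operator does not support anything like this today.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cert-manager-v1.14.2
  namespace: capi-operator-system
  labels:
    # Hypothetical selector the operator could use to find the right version.
    operator.cluster.x-k8s.io/cert-manager-version: v1.14.2
data:
  # Full upstream cert-manager.yaml, copied in ahead of time.
  components: |
    # ... contents of cert-manager.yaml ...
```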
Alternatively, if we decide that moving this responsibility into the operator code is too much, and/or we want to address these issues separately, we could manage cert-manager CRD updates using Helm hooks and a Kubernetes `Job`, similar to what I propose in https://github.com/kubernetes-sigs/cluster-api-operator/issues/188.
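To make that alternative concrete, a pre-upgrade hook `Job` could re-apply the upstream CRDs before Helm upgrades the cert-manager subchart. This is a sketch of the idea only: the image and CRD URL are placeholders, the version would need to be templated from the chart, and the ServiceAccount/RBAC needed to update CRDs is omitted:

```yaml
# Sketch of a Helm pre-upgrade hook that re-applies the cert-manager CRDs,
# working around Helm's "install crds/ once, never upgrade" behavior.
apiVersion: batch/v1
kind: Job
metadata:
  name: cert-manager-crd-upgrade
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      # serviceAccountName: <account allowed to update CRDs>  (RBAC omitted)
      containers:
        - name: apply-crds
          image: bitnami/kubectl:latest   # placeholder image
          command:
            - kubectl
            - apply
            - -f
            # Placeholder version; in a real chart this would be templated from
            # the pinned cert-manager dependency version.
            - https://github.com/cert-manager/cert-manager/releases/download/v1.14.2/cert-manager.crds.yaml
```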
Anything else you would like to add:
This is not a full-fledged proposal and there are definitely some gaps in the design, but I am looking for feedback before investing more time in it.
/kind feature
This issue is currently awaiting triage.
If CAPI Operator contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.
The `triage/accepted` label can be added by org members by writing `/triage accepted` in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
At a minimum, when you deploy a fresh install of CAPI right now and specify that you want cert-manager installed, it should install the version matched to that CAPI release. It does not today. It seems this script is used, with values from CI perhaps, to deploy cert-manager 1.13.2, while the current CAPI release uses cert-manager 1.14.2.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:

> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.