The plain provisioner cannot roll out Bundles containing CRDs and instances of those CRDs
When deploying the plain provisioner using make run KIND_CLUSTER_NAME=kind and creating the following BundleDeployment resource:
apiVersion: core.rukpak.io/v1alpha1
kind: BundleDeployment
metadata:
  name: olm-v0.20.0
spec:
  template:
    spec:
      source:
        type: image
        image:
          ref: quay.io/tflannag/olm-plain-bundle:olm-v0.20.0-combined
      provisionerClassName: core-rukpak-io-plain
  provisionerClassName: core-rukpak-io-plain
The provisioner fails to roll out that Bundle, with the BundleDeployment reporting a failed state:
$ k get bundledeployments olm-v0.20.0 -o yaml
apiVersion: core.rukpak.io/v1alpha1
kind: BundleDeployment
metadata:
  ...
spec:
  provisionerClassName: core-rukpak-io-plain
  template:
    metadata: {}
    spec:
      provisionerClassName: core-rukpak-io-plain
      source:
        image:
          ref: quay.io/tflannag/olm-plain-bundle:olm-v0.20.0-combined
        type: image
status:
  conditions:
  - lastTransitionTime: "2022-11-22T23:07:58Z"
    message: Successfully unpacked the olm-v0.20.0-wz4fw2 Bundle
    reason: UnpackSuccessful
    status: "True"
    type: HasValidBundle
  - lastTransitionTime: "2022-11-22T23:08:08Z"
    message: |-
      required resource not found: error while running post render on files: [resource mapping not found for name: "operatorhubio-catalog" namespace: "olm" from "": no matches for kind "CatalogSource" in version "operators.coreos.com/v1alpha1"
      ensure CRDs are installed first, resource mapping not found for name: "packageserver" namespace: "olm" from "": no matches for kind "ClusterServiceVersion" in version "operators.coreos.com/v1alpha1"
      ensure CRDs are installed first, resource mapping not found for name: "cluster" namespace: "" from "": no matches for kind "OLMConfig" in version "operators.coreos.com/v1"
      ensure CRDs are installed first, resource mapping not found for name: "global-operators" namespace: "operators" from "": no matches for kind "OperatorGroup" in version "operators.coreos.com/v1"
      ensure CRDs are installed first, resource mapping not found for name: "olm-operators" namespace: "olm" from "": no matches for kind "OperatorGroup" in version "operators.coreos.com/v1"
      ensure CRDs are installed first]
    reason: InstallFailed
    status: "False"
    type: Installed
Poked around the helm chart library and I'm fairly sure I know how to address this, so I'm going to take a stab at it and assign myself.
If your idea is to separate the CRDs out into Helm's separate CRD handling -- think again.
I'm pretty sure Helm will ignore those CRDs during upgrades. :)
I was planning on hacking around this tbh with a set that tracks CRDs <-> CRs when we're building up the release chart, and if a Bundle contains CRDs and CRs, put the CRD in the chrt.Files array. It sounds like that's a bad idea though given we'll still land in problematic territory for the upgrade case?
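Roughly what I had in mind, sketched below. This is only an illustration of the idea, not code that exists in the converter today: the toYAML helper is hypothetical and the chart metadata is made up.

package convert

import (
	"fmt"

	"helm.sh/helm/v3/pkg/chart"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// buildChart turns the plain bundle's objects into a release chart. CRDs that
// have a corresponding CR in the same bundle go into chrt.Files under crds/
// (which is where helm looks for chart CRDs); everything else stays a template.
func buildChart(objs []unstructured.Unstructured) (*chart.Chart, error) {
	// Track the group/kind of every non-CRD object in the bundle.
	crKinds := map[string]bool{}
	for _, obj := range objs {
		gvk := obj.GroupVersionKind()
		if gvk.Kind != "CustomResourceDefinition" {
			crKinds[gvk.Group+"/"+gvk.Kind] = true
		}
	}

	chrt := &chart.Chart{Metadata: &chart.Metadata{APIVersion: chart.APIVersionV2, Name: "bundle", Version: "0.0.1"}}
	for i, obj := range objs {
		data, err := toYAML(obj) // hypothetical serialization helper
		if err != nil {
			return nil, err
		}
		if obj.GroupVersionKind().Kind == "CustomResourceDefinition" {
			// Does the bundle also ship a CR served by this CRD?
			group, _, _ := unstructured.NestedString(obj.Object, "spec", "group")
			kind, _, _ := unstructured.NestedString(obj.Object, "spec", "names", "kind")
			if crKinds[group+"/"+kind] {
				chrt.Files = append(chrt.Files, &chart.File{Name: fmt.Sprintf("crds/manifest-%d.yaml", i), Data: data})
				continue
			}
		}
		chrt.Templates = append(chrt.Templates, &chart.File{Name: fmt.Sprintf("templates/manifest-%d.yaml", i), Data: data})
	}
	return chrt, nil
}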
I'm pretty sure Helm will ignore those CRDs during upgrades. :)
I haven't played around with this locally, but the language in https://helm.sh/docs/topics/charts/#limitations-on-crds seems to confirm this.
0.4-0.5 candidate
During grooming, some context for this issue was missing. This may get pushed back.
I've tried a number of solutions to work around this issue, and based on the following findings I don't think this problem will be worth addressing until we potentially remove the helm code that's being used under the hood.
The following outlines the solutions I attempted and the problems I found with each:
1. When building the helm chart from a plain bundle, place the CRDs into helm's crds/ folder.
Problem #1: The returned install manifest will not include the CRDs by default.
Solution: We must take additional actions to set owner references and controller watches on the CRD objects. However, when we later compare the installed manifest with a dry-run upgrade to determine whether an upgrade is needed, we will not be able to detect changes to the CRDs. Additionally, we will not have any saved data for CRDs to roll back to in case of a failed upgrade.
Sub-solution: Tell helm to include the CRDs in the install manifest. This gives us back the owner references and controller watches for free, but creates another, larger problem. When we do a dry-run upgrade to detect changes, we will always detect a change and ask for an upgrade, because the dry-run will never include CRDs due to the aforementioned helm limitations around updating CRDs in the crds/ folder. This creates a never-ending cycle of updates. The sub-sub-solution is to remove the CRDs from the install manifest before our comparison (see the first sketch below this list), but then we run into the same problems as the original solution to #1 above.
Problem #2: We can no longer upgrade the CRDs in a plain bundle due to helm's limitation. When using the crds/ folder in helm there's no way around this unless helm itself changes.
2. Extract the CRDs from the helm chart and install them as a separate helm chart.
Problem: This separates the chart into two and requires that we manage the lifecycle of both separately. It also requires a change to the helm-operator-plugins API in order to expose rollback actions. Normally, when an upgrade fails, the action client will automatically perform a rollback of the helm manifest. Modifying the API to change how we handle rollbacks would impact downstream users. Performing the upgrade through multiple steps also has the potential to put us into interesting scenarios, especially when non-CRD resources are potentially invalidated during an upgrade or rollback of CRDs.
3. Extract the CRDs from the helm chart and install them ourselves with client-go.
Problem: We're now getting into messy territory where we're using a hybrid install and upgrade approach that mixes two different solutions together (see the second sketch below this list). We lose the install -> fail -> uninstall and upgrade -> fail -> rollback features for CRDs that we previously had for free. In order to detect updates, run upgrades, and perform rollbacks we must now re-implement helm functionality to some extent - and at that point we're no longer getting much benefit from using helm in the first place.
Solution: We strip out helm completely from the backend installation and upgrade, implementing our own solution that is free from helm's limitations.
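For what it's worth, the "strip CRDs out of the manifest before comparing" step from #1 would look roughly like the sketch below. It assumes both the stored release manifest and the dry-run manifest are multi-document YAML strings; nothing here exists in the codebase.

package compare

import (
	"sort"
	"strings"

	"helm.sh/helm/v3/pkg/releaseutil"
	"sigs.k8s.io/yaml"
)

// stripCRDs drops CustomResourceDefinition documents from a rendered release
// manifest so that the stored manifest and a dry-run upgrade manifest can be
// compared without CRDs permanently showing up as a difference.
func stripCRDs(manifest string) (string, error) {
	docs := releaseutil.SplitManifests(manifest)

	// Sort the synthetic manifest keys so both sides of the comparison are
	// reassembled in the same order.
	keys := make([]string, 0, len(docs))
	for k := range docs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var kept []string
	for _, k := range keys {
		var meta struct {
			Kind string `json:"kind"`
		}
		if err := yaml.Unmarshal([]byte(docs[k]), &meta); err != nil {
			return "", err
		}
		if meta.Kind == "CustomResourceDefinition" {
			continue
		}
		kept = append(kept, docs[k])
	}
	return strings.Join(kept, "\n---\n"), nil
}

The catch, as described above, is that once CRDs are excluded from the comparison we are back to being unable to detect CRD changes or roll them back.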
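And #3 would start with something like the create-or-update loop below, using the apiextensions clientset (again just a sketch, assuming the CRDs have already been extracted from the chart as typed objects). Even this naive version shows how much of helm's behavior we'd have to re-implement around it: diffing, conflict handling, and rollback.

package install

import (
	"context"

	apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// applyCRDs creates or updates each CRD directly with client-go, bypassing helm.
func applyCRDs(ctx context.Context, c apiextclient.Interface, crds []*apiextv1.CustomResourceDefinition) error {
	for _, crd := range crds {
		existing, err := c.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, crd.Name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			if _, err := c.ApiextensionsV1().CustomResourceDefinitions().Create(ctx, crd, metav1.CreateOptions{}); err != nil {
				return err
			}
			continue
		}
		if err != nil {
			return err
		}
		// Naive update: carry over the resourceVersion and overwrite. A real
		// implementation would need conflict retries, ownership tracking, and
		// some notion of rollback if the rest of the upgrade fails - none of
		// which we get for free once we're outside of helm.
		crd.ResourceVersion = existing.ResourceVersion
		if _, err := c.ApiextensionsV1().CustomResourceDefinitions().Update(ctx, crd, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}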
From these findings I think the longer term solution is to do these installs and upgrades ourselves. However, since the situation that causes this problem in the first place is pretty niche, I don't think this is something we should tackle while we're working towards OLM v1 Milestone 1. That being said, I'm now also a bit concerned about helm bundles, since any clients that create charts using helm's crds/ folder are going to run into the "no CRD upgrades" limitation and rukpak won't be able to handle that.