serving-operator
Remove custom resource cleanup logic.
Problem
Today, every time a resource from an older minor version X needs to be deleted, the Operator is updated in release X+1 with custom code that cleans up those resources.
Generally, as long as the Operator is only upgraded one minor version at a time, we can remove that code after the next minor release. However, because of https://github.com/knative/serving-operator/issues/3, we cannot guarantee that the cleanup is no longer needed, or that the user upgrades by only one minor release at a time (though that is highly recommended).
I propose we begin implementing parts of the Knative Operator rethink.
Specifically:
- Move resource cleanup into minor-version-tagged directory paths (post-upgrade only, for now).
- Vendor these directories into the Operator.
- When performing an upgrade, identify the minor version that is currently installed (X) and the version being upgraded to (Y).
- Upgrade to Y.
- Run all post-upgrade Jobs for the minor versions after X, up to and including Y.
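A minimal sketch of the "which post-upgrade Jobs do we run?" step, assuming each vendored directory is named after the minor release it belongs to (e.g. `0.11`, `0.12`). The directory layout and the names `postUpgradeDirs`, `parseMinor` and `minorVersion` are hypothetical, not existing Operator code. X itself is excluded, on the assumption that its Jobs already ran when X was installed.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// minorVersion holds the major and minor components of a release
// (hypothetical helper type for this sketch).
type minorVersion struct{ Major, Minor int }

// less reports whether a is an older minor version than b.
func (a minorVersion) less(b minorVersion) bool {
	if a.Major != b.Major {
		return a.Major < b.Major
	}
	return a.Minor < b.Minor
}

// parseMinor parses "0.12", "v0.12" or "v0.12.3", ignoring any patch part.
func parseMinor(s string) (minorVersion, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) < 2 {
		return minorVersion{}, fmt.Errorf("invalid version %q", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return minorVersion{}, fmt.Errorf("invalid major in %q: %w", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return minorVersion{}, fmt.Errorf("invalid minor in %q: %w", s, err)
	}
	return minorVersion{major, minor}, nil
}

// postUpgradeDirs returns the vendored minor-version directories whose
// post-upgrade Jobs must run when going from current (X) to target (Y):
// every version v with X < v <= Y, oldest first.
func postUpgradeDirs(current, target string, available []string) ([]string, error) {
	x, err := parseMinor(current)
	if err != nil {
		return nil, err
	}
	y, err := parseMinor(target)
	if err != nil {
		return nil, err
	}
	type entry struct {
		v    minorVersion
		name string
	}
	var selected []entry
	for _, dir := range available {
		v, err := parseMinor(dir)
		if err != nil {
			return nil, err
		}
		if x.less(v) && !y.less(v) { // X < v && v <= Y
			selected = append(selected, entry{v, dir})
		}
	}
	sort.Slice(selected, func(i, j int) bool { return selected[i].v.less(selected[j].v) })
	names := make([]string, len(selected))
	for i, e := range selected {
		names[i] = e.name
	}
	return names, nil
}

func main() {
	// Hypothetical vendored directories, one per minor release, in any order.
	available := []string{"0.13", "0.11", "0.10", "0.12"}
	dirs, err := postUpgradeDirs("v0.10.1", "v0.12.0", available)
	if err != nil {
		panic(err)
	}
	fmt.Println(dirs) // [0.11 0.12]
}
```

Sorting the selected directories keeps the Jobs running oldest-first even if the vendored paths are listed in arbitrary order.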
Later, to fix https://github.com/knative/serving-operator/issues/3, we can vendor all minor releases older than X into Operator X and step through the upgrade one minor version at a time.
Persona:
Which persona is this feature for?
This is largely an improvement in code structure rather than a feature. It therefore primarily targets the Contributors persona, and it improves maintainability by reducing the number of version-specific changes made to the Operator codebase.
Exit Criteria
No custom deletion logic exists in the Operator.
Time Estimate (optional):
How many developer-days do you think this may take to resolve?
2 developer-weeks or so, I'm hoping :)
Some comments from Slack:
- Why Job definitions rather than vendoring in the Go code?
- Could we have a single image that knows how to perform upgrades in general?
- Would supporting a rolling window of X minor versions in Operator Y alleviate the concerns about 'one Job to upgrade them all'?
I agree these are worth thinking about, so I'm copying them here.
What about a single Kubernetes Job per component that takes a "to" (and maybe a "from") release as arguments? Within the Job's code, a framework (similar to sharedinformer) would hold a map[string]func(…) error; contributors would write a func(…) error and register it in that map.
After some discussion in Slack, we've converged on this pattern for writing a package that can handle safe upgrades without needing a new Job per release.
I will try to formalize this a bit (potentially in a doc, maybe in code...) and follow up :)