omni
[feature] Enable automatic updates
Problem Description
It would be cool if Omni was able to automatically update:
- The Kubernetes version of a cluster
- The Talos version of nodes in a cluster
Solution
No response
Alternative Solutions
No response
Notes
No response
That could be cool, but I wouldn't have peace of mind knowing that my cluster can do unattended upgrades :sweat_smile: If that feature gets enough demand we can think about doing that.
I totally understand that! I am thinking it should be opt-in, and that upgrading the Kubernetes version and the Talos version should both support rollbacks before this is implemented. That way the cluster could upgrade and, in case of issues, roll back and require an admin to do the upgrade manually :-)
I was thinking about this too. Here are some of my thoughts.
- Upgrades would need a scheduling mechanism (e.g. a downtime window) so they can be controlled.
- Upgrades should have a skew or release schedule so you can upgrade at your own pace. CoreOS did this with alpha, beta, stable channels and GKE has rapid, regular, and stable release channels. I like this idea because you can opt your dev/staging clusters in to rapid and your production clusters will update automatically later.
- Omni needs automated notifications. If an update stage takes longer than XX amount of time it should have a way to notify an administrator. Same thing if steps fail (e.g. pre-upgrade checks) or manual work needs to be done (e.g. bootstrap manifest updates).
- We'll want a way to gate specific upgrades. For example, maybe you should always automatically update patch versions but not minor versions (for both k8s and Talos).
We may need to build release channels and notifications before we can do the rest of this, but maybe limiting upgrades to patch versions and adding maintenance windows would be good enough.
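To make the "patch versions plus maintenance windows" idea concrete, here is a minimal sketch of what such a gate could look like. None of this is an existing Omni API: `MaintenanceWindow`, `is_patch_upgrade`, and `may_auto_upgrade` are hypothetical names, and it assumes plain `vMAJOR.MINOR.PATCH` version strings (no pre-release suffixes) and a window that does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time


def parse_version(v: str) -> tuple[int, int, int]:
    """Parse 'v1.7.4' or '1.7.4' into (major, minor, patch)."""
    major, minor, patch = v.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_patch_upgrade(current: str, candidate: str) -> bool:
    """True only if major.minor match and the patch number increases."""
    cur, cand = parse_version(current), parse_version(candidate)
    return cur[:2] == cand[:2] and cand[2] > cur[2]


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical weekly window; weekday 0 is Monday, 6 is Sunday."""
    weekday: int
    start: time
    end: time

    def contains(self, now: datetime) -> bool:
        return now.weekday() == self.weekday and self.start <= now.time() < self.end


def may_auto_upgrade(current: str, candidate: str,
                     window: MaintenanceWindow, now: datetime) -> bool:
    """Allow an unattended upgrade only for patch bumps inside the window."""
    return is_patch_upgrade(current, candidate) and window.contains(now)
```

With a Sunday 02:00-04:00 window, a bump from `v1.7.4` to `v1.7.6` would be allowed at 03:00 on a Sunday, while `v1.7.4` to `v1.8.0` would always wait for an admin.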
Before moving to Omni/Talos, I used the Rancher System Upgrade Controller to automatically upgrade my K3s cluster. Maybe instead of building your own thing from scratch, you could do whatever is required to support that?
I have external-to-Omni organizational processes which I need to accommodate.
I'd love to use this feature with my external processes modeled as an explicit approval step. It'd be even better if pending upgrades were queryable (and approvable!) through an API. Approval could perhaps be modeled as a preflight check alongside cluster health and such; it's just a box that needs to be ticked before making changes.
Unattended upgrades could even be phrased in terms of an approval mechanism: "automatically approve stable releases", "automatically approve releases after 7 days", "automatically approve point releases", etc.
I think there's a lot of utility gained by reifying the operator's approval to apply an update, independent of whether that decision is manual (external) or the automatic result of applying some policy.
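As a rough illustration of what I mean, here is a sketch where a pending upgrade always needs an explicit approval record, and that record comes either from a person or from a policy. Everything here is made up for the example (`PendingUpgrade`, `Approval`, `decide`, the policy names); it is not how Omni models upgrades today, and it assumes that any single matching policy is enough to approve.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass(frozen=True)
class PendingUpgrade:
    current: str          # e.g. "v1.7.4"
    target: str           # e.g. "v1.7.6"
    channel: str          # hypothetical release channel, e.g. "stable"
    released_at: datetime


@dataclass(frozen=True)
class Approval:
    approved: bool
    by: str               # who or what made the decision


Policy = Callable[[PendingUpgrade, datetime], bool]


def approve_stable(u: PendingUpgrade, now: datetime) -> bool:
    """'Automatically approve stable releases'."""
    return u.channel == "stable"


def approve_after(days: int) -> Policy:
    """'Automatically approve releases after N days'."""
    def policy(u: PendingUpgrade, now: datetime) -> bool:
        return now - u.released_at >= timedelta(days=days)
    return policy


def approve_point_releases(u: PendingUpgrade, now: datetime) -> bool:
    """'Automatically approve point releases' (same major.minor)."""
    cur = u.current.lstrip("v").split(".")
    tgt = u.target.lstrip("v").split(".")
    return cur[:2] == tgt[:2]


def decide(u: PendingUpgrade, now: datetime, policies: dict[str, Policy],
           manual_approver: Optional[str] = None) -> Approval:
    """Return an approval record; manual approval wins, then the first matching policy."""
    if manual_approver is not None:
        return Approval(True, f"manual:{manual_approver}")
    for name, policy in policies.items():
        if policy(u, now):
            return Approval(True, f"policy:{name}")
    return Approval(False, "pending")
```

The point of the `by` field is that the decision is recorded the same way whether it came from a human or from a rule, so a pending upgrade could be listed and approved through an API just like any other preflight check.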