
[feature] Enable automatic updates

[Open] devantler opened this issue 1 year ago • 5 comments

Problem Description

It would be cool if Omni could automatically update:

  • The Kubernetes version of a cluster
  • The Talos version of nodes in a cluster

Solution

No response

Alternative Solutions

No response

Notes

No response

devantler · May 13 '24 18:05

That could be cool, but I wouldn't have peace of mind knowing that my cluster can do unattended upgrades 😅 If there's enough demand for the feature, we can think about doing it.

Unix4ever · May 14 '24 10:05

I totally understand that! I'm thinking it should be opt-in, and that, before this is implemented, upgrading both the Kubernetes version and the Talos version should support rollbacks. That way the cluster could upgrade itself and, in case of issues, roll back and require an admin to do the upgrade manually :-)
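The opt-in, roll-back-on-failure flow described above could look roughly like the following Python sketch. Everything in it is hypothetical: `Cluster`, `try_auto_upgrade`, and the `apply`/`healthy` callbacks are illustrative names, not Omni APIs, and a real implementation would have to drive Talos and Kubernetes upgrades separately.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical cluster model for illustration; not an Omni resource definition.
@dataclass
class Cluster:
    name: str
    version: str
    auto_upgrade: bool = False          # opt-in: disabled unless an admin enables it
    needs_manual_upgrade: bool = False  # set when an automatic attempt was rolled back
    history: List[str] = field(default_factory=list)


def try_auto_upgrade(
    cluster: Cluster,
    target: str,
    apply: Callable[[Cluster, str], None],
    healthy: Callable[[Cluster], bool],
) -> bool:
    """Upgrade to `target` if opted in; roll back and flag for an admin on failure."""
    if not cluster.auto_upgrade or cluster.needs_manual_upgrade:
        return False  # not opted in, or already waiting for an admin
    previous = cluster.version
    apply(cluster, target)
    if healthy(cluster):
        cluster.history.append(f"{previous} -> {target}")
        return True
    # Health check failed: restore the previous version and hand off to an admin.
    apply(cluster, previous)
    cluster.needs_manual_upgrade = True
    cluster.history.append(f"{previous} -> {target} (rolled back)")
    return False


if __name__ == "__main__":
    def set_version(c: Cluster, v: str) -> None:
        c.version = v  # stand-in for actually performing the upgrade

    dev = Cluster("dev", "v1.30.0", auto_upgrade=True)
    print(try_auto_upgrade(dev, "v1.30.1", set_version, lambda c: False))
    print(dev.version, dev.needs_manual_upgrade)
```

Once an attempt is rolled back, the cluster stays flagged until an admin clears `needs_manual_upgrade`, so a bad release is not retried in a loop.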

devantler · May 14 '24 11:05

I was thinking about this too. Here are some of my thoughts.

  • Upgrades would need a scheduling mechanism (e.g., a downtime window) so they can be controlled.
  • Upgrades should have a skew or release schedule so you can upgrade at your own pace. CoreOS did this with alpha, beta, and stable channels, and GKE has rapid, regular, and stable release channels. I like this idea because you can opt your dev/staging clusters into rapid, and your production clusters will update automatically later.
  • Omni needs automated notifications. If an update stage takes longer than XX amount of time, it should have a way to notify an administrator. The same goes for when steps fail (e.g., pre-upgrade checks) or manual work needs to be done (e.g., bootstrap manifest updates).
  • We'll want a way to gate specific upgrades. For example, maybe you should always automatically update patch versions but not minor versions (for both k8s and Talos).

We may need to build release channels and notifications before we can do the rest of this, but maybe limiting upgrades to patch versions and adding maintenance windows would be good enough.
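A minimal sketch of the "patch versions only, inside a maintenance window" starting point, assuming plain `vMAJOR.MINOR.PATCH` version strings (no pre-release suffixes) and a daily window in cluster-local time. The function names are hypothetical and not part of Omni.

```python
from datetime import datetime, time
from typing import Tuple


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse 'v1.30.2' into (1, 30, 2); pre-release suffixes are not handled."""
    major, minor, patch = v.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_patch_upgrade(current: str, target: str) -> bool:
    """True only if target has the same major.minor and a higher patch number."""
    cur, tgt = parse_version(current), parse_version(target)
    return cur[:2] == tgt[:2] and tgt[2] > cur[2]


def in_maintenance_window(now: datetime, start: time, end: time) -> bool:
    """Check a daily [start, end) window, including windows that cross midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def may_auto_upgrade(current: str, target: str, now: datetime,
                     start: time, end: time) -> bool:
    """Gate: automatic only for patch upgrades, and only inside the window."""
    return is_patch_upgrade(current, target) and in_maintenance_window(now, start, end)


if __name__ == "__main__":
    window = (time(2, 0), time(4, 0))
    print(may_auto_upgrade("v1.7.4", "v1.7.5", datetime(2024, 5, 14, 3, 0), *window))
    print(may_auto_upgrade("v1.7.5", "v1.8.0", datetime(2024, 5, 14, 3, 0), *window))
```

Anything that fails the gate (a minor bump, or a patch release outside the window) simply waits, which is where manual approval or the notifications mentioned above would come in.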

rothgar · May 14 '24 17:05

Before moving to Omni/Talos, I used the Rancher System Upgrade Controller to automatically upgrade my K3s cluster. Maybe, instead of building your own thing from scratch, you could do whatever is required to support it?

kenlasko · May 14 '24 17:05

I have external-to-Omni organizational processes which I need to accommodate.

I'd love to use this feature with my external processes modeled as an explicit approval step. It'd be even better if pending upgrades were queryable (and approvable!) through an API. Approval could perhaps be modeled as a preflight check alongside cluster health and such; it's just a box that needs to be ticked before making changes.

Unattended upgrades could even be phrased in terms of an approval mechanism: "automatically approve stable releases", "automatically approve releases after 7 days", "automatically approve point releases", etc.

I think there's a lot of utility gained by reifying the operator's approval to apply an update, independent of whether that decision is manual (external) or the automatic result of applying some policy.
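Reifying approval could be sketched as follows: a pending upgrade carries an `approved_by` field that is set either by an operator (for example, by an external process calling an API) or by the first matching policy, and approval then becomes one more preflight check next to cluster health. All names here are hypothetical, and treating the policies as "any one is enough" is an assumption; a real design might require several to agree.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


# Hypothetical pending-upgrade record; not an Omni resource.
@dataclass
class PendingUpgrade:
    current: str
    target: str
    channel: str                       # e.g. "stable" or "beta"
    released_at: datetime
    approved_by: Optional[str] = None  # "user:<name>" or "policy:<name>"


# A policy inspects a pending upgrade at a given time and decides whether to approve it.
Policy = Callable[[PendingUpgrade, datetime], bool]


def approve_stable(u: PendingUpgrade, now: datetime) -> bool:
    return u.channel == "stable"


def approve_after(days: int) -> Policy:
    return lambda u, now: now - u.released_at >= timedelta(days=days)


def approve_point_releases(u: PendingUpgrade, now: datetime) -> bool:
    cur = [int(x) for x in u.current.lstrip("v").split(".")]
    tgt = [int(x) for x in u.target.lstrip("v").split(".")]
    return cur[:2] == tgt[:2] and tgt[2] > cur[2]


def approve_manually(u: PendingUpgrade, operator: str) -> None:
    u.approved_by = f"user:{operator}"


def apply_policies(u: PendingUpgrade, now: datetime, policies: Dict[str, Policy]) -> None:
    """Record the first policy that approves; an existing approval is left alone."""
    if u.approved_by is not None:
        return
    for name, policy in policies.items():
        if policy(u, now):
            u.approved_by = f"policy:{name}"
            return


def preflight_checks(u: PendingUpgrade, cluster_healthy: bool) -> List[str]:
    """Return the reasons the upgrade may not start; an empty list means go."""
    failures = []
    if not cluster_healthy:
        failures.append("cluster unhealthy")
    if u.approved_by is None:
        failures.append("upgrade not approved")
    return failures


if __name__ == "__main__":
    now = datetime(2024, 5, 21)
    policies = {"stable": approve_stable, "7-days": approve_after(7)}
    u = PendingUpgrade("v1.7.2", "v1.8.0", "beta", released_at=datetime(2024, 5, 20))
    apply_policies(u, now, policies)
    print(preflight_checks(u, cluster_healthy=True))
    approve_manually(u, "alice")
    print(preflight_checks(u, cluster_healthy=True))
```

Because a human approval and a policy approval land in the same field, the upgrade machinery only needs to know *that* an upgrade is approved, not *how*.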

willglynn · May 14 '24 17:05