Detect offline clusters
This adds a cluster status monitor to the Fleet controller, which checks when each cluster's agent was last seen online. If more than the expected interval has elapsed since then, the cluster is considered offline, and the monitor updates the statuses of that cluster's bundle deployments to reflect it. This in turn triggers status updates to bundles, GitRepos, clusters and cluster groups.
Refers to #594.
Open points:
- how far, and how fine-grained, should bundle deployment status updates for offline clusters go? This currently takes a fairly basic approach: it updates both the `Ready` and `Monitored` conditions and clears modified and non-ready statuses, to prevent outdated messages from appearing in a bundle deployment's display status and further up the chain of status updates (to bundles, then upwards to GitRepos, etc.)
- should we make the monitoring frequency configurable?
- do we have a way to exclude the local/management cluster from agent-last-seen checks?
This needs more discussion around UI/UX.
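The basic status-update approach from the first open point can be sketched as follows. The `condition` and `bundleDeploymentStatus` types, the `setCondition` and `markOffline` helpers and the offline message text are hypothetical simplifications for illustration, not Fleet's actual API types.

```go
package main

import "fmt"

// condition is a hypothetical, simplified status condition.
type condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Message string
}

// bundleDeploymentStatus is a hypothetical, simplified bundle deployment status.
type bundleDeploymentStatus struct {
	Conditions     []condition
	ModifiedStatus []string
	NonReadyStatus []string
}

// setCondition replaces the condition of the same type in place, or appends it.
func setCondition(s *bundleDeploymentStatus, c condition) {
	for i := range s.Conditions {
		if s.Conditions[i].Type == c.Type {
			s.Conditions[i] = c
			return
		}
	}
	s.Conditions = append(s.Conditions, c)
}

// markOffline sets Ready and Monitored to False and clears modified and
// non-ready statuses, so stale details are not propagated further up.
func markOffline(s *bundleDeploymentStatus, clusterName string) {
	msg := fmt.Sprintf("cluster %s is offline", clusterName) // hypothetical message
	setCondition(s, condition{Type: "Ready", Status: "False", Message: msg})
	setCondition(s, condition{Type: "Monitored", Status: "False", Message: msg})
	s.ModifiedStatus = nil
	s.NonReadyStatus = nil
}

func main() {
	s := bundleDeploymentStatus{
		Conditions:     []condition{{Type: "Ready", Status: "True"}},
		ModifiedStatus: []string{"configmap/app modified"},
		NonReadyStatus: []string{"deployment/app not ready"},
	}
	markOffline(&s, "downstream-b")
	fmt.Println(s.Conditions, len(s.ModifiedStatus), len(s.NonReadyStatus))
}
```

Clearing the lists outright, rather than annotating each entry, is what keeps this approach coarse-grained; a finer-grained variant could keep the entries and mark them as possibly outdated instead.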