Detect offline clusters

Open • weyfonk opened this issue 1 year ago • 1 comment

This adds a cluster status monitor to the Fleet controller, which checks when each cluster's agent was last seen online. If more than the expected interval has elapsed since then, the cluster is considered offline, and the monitor updates the statuses of its bundle deployments to reflect that. These changes in turn trigger status updates to bundles, GitRepos, clusters and cluster groups.

Refers to #594.

Open points:

  • How far-reaching, and how fine-grained, should bundle deployment status updates be for offline clusters? The current approach is fairly basic: it updates both the Ready and Monitored conditions and clears the modified and non-ready statuses, so that outdated messages do not appear in a bundle deployment's display status or further up the chain of status updates (to bundles, then to GitRepos, etc.).
  • Should we make the frequency of monitoring configurable?
  • Do we have a way to exclude the local/management cluster from agent-last-seen checks?

weyfonk avatar Oct 07 '24 10:10 weyfonk

This needs more discussion around UI/UX.

weyfonk avatar Oct 25 '24 09:10 weyfonk