Emit single metric for how out of sync the cluster data is
@hakonhall please review. Once merged, we should observe the output of this in practice for a bit before wiring it to anything.
With these changes the cluster controller continuously maintains a global aggregate across all content nodes that represents the number of pending and total buckets per bucket space. This aggregate can be sampled in O(1) time.
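
For reviewers, here is a minimal sketch of the general technique of keeping such an aggregate up to date incrementally, so that sampling is a map lookup rather than a scan over all nodes. The class and method names (`GlobalBucketAggregate`, `updateNode`, `pending`, `total`) are hypothetical and do not mirror the classes in this PR, and node removal is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; names do not mirror the actual cluster controller classes. */
final class GlobalBucketAggregate {
    // Last reported {pending, total} counts, keyed by "nodeIndex/bucketSpace".
    private final Map<String, long[]> lastReported = new HashMap<>();
    // Running {pending, total} sums across all nodes, keyed by bucket space.
    private final Map<String, long[]> global = new HashMap<>();

    /** Replaces one node's counts for a bucket space and adjusts the global sum by the delta. */
    void updateNode(int nodeIndex, String bucketSpace, long pending, long total) {
        long[] previous = lastReported.put(nodeIndex + "/" + bucketSpace, new long[] {pending, total});
        long[] sum = global.computeIfAbsent(bucketSpace, k -> new long[2]);
        sum[0] += pending - (previous == null ? 0 : previous[0]);
        sum[1] += total - (previous == null ? 0 : previous[1]);
    }

    /** Constant-time sample of pending buckets; does not iterate over nodes. */
    long pending(String bucketSpace) {
        long[] sum = global.get(bucketSpace);
        return sum == null ? 0 : sum[0];
    }

    /** Constant-time sample of total buckets; does not iterate over nodes. */
    long total(String bucketSpace) {
        long[] sum = global.get(bucketSpace);
        return sum == null ? 0 : sum[1];
    }
}
```

The cost of each update is independent of cluster size, because only the difference between a node's old and new counts is applied to the running sum.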
An explicit metric, `cluster-buckets-out-of-sync-ratio`, has been added, and the value is also emitted as part of the cluster state REST API. Note: the value is only emitted once statistics have been received from all distributors for a particular cluster state version. Otherwise it could represent a sample taken at an arbitrary point in time between two or more distinct states.
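
The gating rule can be illustrated with a small sketch. It assumes the ratio is pending buckets divided by total buckets, and that an empty cluster reports 0; neither detail is stated above. `OutOfSyncRatioGate` and its methods are hypothetical names, not the API introduced by this PR.

```java
import java.util.HashSet;
import java.util.OptionalDouble;
import java.util.Set;

/** Hypothetical sketch of "only emit once every distributor has reported for the current state version". */
final class OutOfSyncRatioGate {
    private final Set<Integer> expectedDistributors = new HashSet<>();
    private final Set<Integer> reported = new HashSet<>();
    private int stateVersion = -1;

    /** A new cluster state version invalidates all previously received statistics. */
    void onNewClusterState(int version, int... distributorIndexes) {
        stateVersion = version;
        expectedDistributors.clear();
        reported.clear();
        for (int index : distributorIndexes) {
            expectedDistributors.add(index);
        }
    }

    /** Statistics tagged with an older state version are ignored. */
    void onStatsReceived(int distributorIndex, int forStateVersion) {
        if (forStateVersion == stateVersion && expectedDistributors.contains(distributorIndex)) {
            reported.add(distributorIndex);
        }
    }

    /** Empty until all expected distributors have reported for the current version. */
    OptionalDouble ratio(long pendingBuckets, long totalBuckets) {
        if (expectedDistributors.isEmpty() || !reported.containsAll(expectedDistributors)) {
            return OptionalDouble.empty();
        }
        if (totalBuckets == 0) {
            return OptionalDouble.of(0.0); // Assumption: no buckets means nothing is out of sync.
        }
        return OptionalDouble.of((double) pendingBuckets / totalBuckets);
    }
}
```

Returning an empty value instead of a stale or partial ratio means a consumer never sees a number that mixes statistics from different cluster states.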