vespa Emit single metric for how out of sync the cluster data is

Open vekterli opened this issue 9 months ago • 0 comments

@hakonhall please review. Once merged, we should observe the output of this in practice for a bit before wiring it to anything.

With these changes, the cluster controller continuously maintains a global aggregate across all content nodes, representing the number of pending and total buckets per bucket space. This aggregate can be sampled in O(1) time.
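For reviewers who want the idea in miniature: one way to get O(1) sampling is to keep running per-bucket-space sums and adjust them by the delta whenever a node reports new statistics. The sketch below is illustrative Python only, not the cluster controller code from this change; `BucketStats`, `GlobalBucketAggregate`, and all method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketStats:
    """Pending and total bucket counts (hypothetical type for this sketch)."""
    pending: int
    total: int


class GlobalBucketAggregate:
    """Running per-bucket-space sums across all content nodes."""

    def __init__(self):
        self._per_node = {}  # node index -> {bucket space: BucketStats}
        self._pending = {}   # bucket space -> summed pending buckets
        self._total = {}     # bucket space -> summed total buckets

    def remove_node(self, node):
        # Subtract the node's last known contribution, if it had one.
        for space, stats in self._per_node.pop(node, {}).items():
            self._pending[space] -= stats.pending
            self._total[space] -= stats.total

    def update_node(self, node, stats_by_space):
        # Replace the node's previous contribution with the new one.
        self.remove_node(node)
        for space, stats in stats_by_space.items():
            self._pending[space] = self._pending.get(space, 0) + stats.pending
            self._total[space] = self._total.get(space, 0) + stats.total
        self._per_node[node] = dict(stats_by_space)

    def sample(self, space):
        # O(1): two dictionary lookups, no iteration over nodes.
        return BucketStats(self._pending.get(space, 0), self._total.get(space, 0))
```

In this sketch an update costs time proportional to the number of bucket spaces a node reports, while sampling never iterates over nodes.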

An explicit metric cluster-buckets-out-of-sync-ratio has been added, and the value is also exposed through the cluster state REST API. Note: the value is only emitted once statistics have been received from all distributors for a given cluster state version. Otherwise it could represent a sample taken at an arbitrary point in time between two or more distinct states.
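To make the emission rule concrete, here is a small Python sketch of the gating: the ratio is withheld until every distributor has reported for the current cluster state version, and reports tagged with another version are ignored. This is an illustration under assumptions, not the actual implementation. In particular, defining the ratio as pending / total, returning 0.0 when there are no buckets, and the names `OutOfSyncRatioTracker`, `on_new_cluster_state`, and `on_stats` are all assumptions made for this example. For brevity it sums the reports on read rather than using the O(1) aggregate.

```python
class OutOfSyncRatioTracker:
    """Emits a ratio only when all distributors have reported for one state version."""

    def __init__(self, distributors):
        self._expected = set(distributors)
        self._version = None
        self._reports = {}  # distributor index -> (pending, total)

    def on_new_cluster_state(self, version):
        # A new state version invalidates all previously received statistics.
        self._version = version
        self._reports.clear()

    def on_stats(self, distributor, version, pending, total):
        # Ignore reports for other state versions or unknown distributors.
        if version != self._version or distributor not in self._expected:
            return
        self._reports[distributor] = (pending, total)

    def ratio(self):
        # None means "do not emit": the sample would mix distinct states.
        if set(self._reports) != self._expected:
            return None
        pending = sum(p for p, _ in self._reports.values())
        total = sum(t for _, t in self._reports.values())
        # Assumption: an empty cluster counts as fully in sync.
        return pending / total if total > 0 else 0.0
```

A caller would publish the metric only when `ratio()` returns a number and skip the emission when it returns `None`.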

May 06 '24 12:05 vekterli
