openraft Split RaftMetrics into multiple sub metrics

Anyway, it would be desirable to split it into multiple independent metrics, possibly opting out of some completely, since the metrics are communicated far too often, which is very expensive.

Mostly, users will only be interested in leader <-> follower transitions and the like (i.e., very infrequent events), not in every processed request.

- Membership metrics: the node's role (leader, follower, etc.).
- Replication metrics: only a leader has these.
- Log and snapshot metrics.
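A minimal std-only sketch of the proposed three-way split. All type, field, and method names here (`ServerState`, `MembershipMetrics`, `ReplicationMetrics`, `LogSnapshotMetrics`, `NodeMetrics`, `is_consistent`) are hypothetical and chosen for illustration; they are not openraft's actual API. Making the replication part an `Option` encodes "only a leader has these".

```rust
use std::collections::BTreeMap;

// Hypothetical split of one monolithic metrics struct into the three groups
// proposed in this issue. Names are illustrative, not openraft's API.

/// Role of the node; changes rarely (control plane).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerState {
    Leader,
    Follower,
    Candidate,
    Learner,
}

/// Membership metrics: this node's role and the current leader.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MembershipMetrics {
    pub id: u64,
    pub state: ServerState,
    pub current_term: u64,
    pub current_leader: Option<u64>,
}

/// Replication metrics: only meaningful on a leader.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReplicationMetrics {
    /// Follower id -> highest log index known to be replicated to it.
    pub matched: BTreeMap<u64, u64>,
}

/// Log and snapshot metrics: change on almost every request.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LogSnapshotMetrics {
    pub last_log_index: Option<u64>,
    pub last_applied: Option<u64>,
    pub snapshot_index: Option<u64>,
}

/// Combined view; each part could be published on its own channel.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeMetrics {
    pub membership: MembershipMetrics,
    /// `None` unless this node is the leader.
    pub replication: Option<ReplicationMetrics>,
    pub log: LogSnapshotMetrics,
}

impl NodeMetrics {
    /// Replication metrics must be present exactly when the node leads.
    pub fn is_consistent(&self) -> bool {
        (self.membership.state == ServerState::Leader) == self.replication.is_some()
    }
}
```

With the groups separated, a subscriber interested only in role changes would never be woken by the frequently changing log metrics.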

Originally posted by @schreter in https://github.com/datafuselabs/openraft/issues/228#issuecomment-1056580449

Mar 02 '22 09:03 drmingdrmer

Implementation details need to be provided.

Mar 02 '22 09:03 drmingdrmer

Yes, the three metrics make sense.

Regarding publishing them, we can make it depend on whether there is a subscriber. As soon as the first subscriber attaches to a particular metric, i.e., requests the channel receiver, the corresponding broadcast channel is materialized, stored in an `Option<Arc<SomeChannel::Publisher>>` in the Raft core, and filled with the current metrics. Additionally, the Raft core stores a `Weak` reference to the `SomeChannel::Receiver`. The subscribe method would of course have to be async so that it executes synchronized with the Raft core, but that's not an issue.

At the point where we decide whether to publish new metrics, first check whether the channel exists at all. If not, return. Otherwise, check whether the strong reference count of the `SomeChannel::Receiver` has reached zero. If so, drop the channel publisher (i.e., reset the `Option`) and return. Otherwise, publish.

This way, we don't publish metrics that no receiver is interested in.
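The lazy subscribe/publish scheme described above can be sketched as follows. This is a minimal std-only illustration under stated assumptions: `Publisher`, `Receiver`, and `LazyMetrics` are hypothetical stand-ins for `SomeChannel`, the channel is modeled as a shared latest-value slot rather than a real broadcast channel, and `subscribe` is synchronous here, whereas the proposal runs it as an async request executed in the Raft core.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical stand-in for `SomeChannel::Publisher`: writes the latest value.
pub struct Publisher<M> {
    slot: Arc<Mutex<M>>,
}

/// Hypothetical stand-in for `SomeChannel::Receiver`: reads the latest value.
pub struct Receiver<M> {
    slot: Arc<Mutex<M>>,
}

impl<M: Clone> Receiver<M> {
    /// Returns a copy of the most recently published metrics.
    pub fn borrow(&self) -> M {
        self.slot.lock().unwrap().clone()
    }
}

/// Per-metric state kept inside the Raft core.
pub struct LazyMetrics<M> {
    publisher: Option<Arc<Publisher<M>>>,
    receiver: Weak<Receiver<M>>,
    /// Number of actual publishes, to show that unobserved metrics are skipped.
    pub publish_count: u64,
}

impl<M: Clone> LazyMetrics<M> {
    pub fn new() -> Self {
        Self { publisher: None, receiver: Weak::new(), publish_count: 0 }
    }

    /// Subscribe: reuse a live channel or materialize one filled with `current`.
    pub fn subscribe(&mut self, current: &M) -> Arc<Receiver<M>> {
        if let Some(rx) = self.receiver.upgrade() {
            return rx;
        }
        let slot = Arc::new(Mutex::new(current.clone()));
        let rx = Arc::new(Receiver { slot: Arc::clone(&slot) });
        self.publisher = Some(Arc::new(Publisher { slot }));
        self.receiver = Arc::downgrade(&rx);
        rx
    }

    /// Called wherever the core would publish new metrics.
    pub fn maybe_publish(&mut self, current: &M) {
        // No channel: nobody has subscribed.
        if self.publisher.is_none() {
            return;
        }
        // Every receiver handle is gone: drop the publisher (reset the Option).
        if self.receiver.strong_count() == 0 {
            self.publisher = None;
            return;
        }
        if let Some(p) = &self.publisher {
            *p.slot.lock().unwrap() = current.clone();
            self.publish_count += 1;
        }
    }

    /// Whether a channel currently exists.
    pub fn is_materialized(&self) -> bool {
        self.publisher.is_some()
    }
}
```

Because all subscribers share the same `Arc<Receiver<_>>`, the channel is torn down only after the last subscriber drops its handle, and only at the next publish attempt.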

I'm just not sure about the replication statistics. We should probably still use `AtomicU64` values to publish changes. Additionally, a global change flag (an `AtomicBool`) could be set (if not already set) to indicate that a change should be published the next time the Raft loop runs.
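One possible reading of the replication-statistics idea, as a hypothetical std-only sketch: replication tasks store progress in `AtomicU64`s and raise a shared `AtomicBool` change flag, and the Raft loop checks and clears that flag once per iteration. `ReplicationStats`, `record_matched`, and `take_changes` are illustrative names, not openraft's API. For simplicity, this sketch stores the flag unconditionally rather than first checking whether it is already set.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical replication statistics shared by replication tasks and the Raft loop.
pub struct ReplicationStats {
    /// Matched log index per follower slot.
    matched: Vec<AtomicU64>,
    /// Set whenever a value changed since the Raft loop last published.
    changed: AtomicBool,
}

impl ReplicationStats {
    pub fn new(followers: usize) -> Self {
        Self {
            matched: (0..followers).map(|_| AtomicU64::new(0)).collect(),
            changed: AtomicBool::new(false),
        }
    }

    /// Called by a replication task: two atomic stores, no channel send.
    pub fn record_matched(&self, follower: usize, index: u64) {
        self.matched[follower].store(index, Ordering::Relaxed);
        // Release pairs with the Acquire swap in `take_changes`: a loop that
        // observes `true` also observes the store above.
        self.changed.store(true, Ordering::Release);
    }

    /// Called once per Raft loop iteration. Clears the flag and returns a
    /// snapshot to publish only if something changed.
    pub fn take_changes(&self) -> Option<Vec<u64>> {
        if !self.changed.swap(false, Ordering::Acquire) {
            return None;
        }
        Some(self.matched.iter().map(|m| m.load(Ordering::Relaxed)).collect())
    }
}
```

Bursts of replication progress between two loop iterations collapse into a single publish.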

For simplicity, the node-role transition metrics (leader <-> follower, which are seldom updated) might always be published, since a) there will likely be interest in them and b) they won't cause much overhead, as they only concern the "control plane".

In addition to pushing metrics, it would be handy to also have a "pull" mechanism (i.e., an async method) to request each particular metric, since there are use cases where you want to collect data, for example once a minute, to compute throughput. In those cases, it doesn't make sense to publish metrics on every change; just pull the data once a minute and compute the appropriate averages. OTOH, this can also be realized by subscribing to a metric, reading the current value, and dropping the receiver.
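The pull mechanism could look roughly like this sketch: a request carrying a reply channel is queued to the core loop, which answers with its current metrics. It assumes blocking `std::sync::mpsc` channels in place of the async method (and oneshot reply) the proposal has in mind; `CoreMsg`, `LogMetrics`, `run_core`, and `pull_log_metrics` are hypothetical names.

```rust
use std::sync::mpsc;

/// Hypothetical log metrics; `applied` counts applied entries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LogMetrics {
    pub applied: u64,
}

/// Hypothetical messages handled by the Raft core loop.
pub enum CoreMsg {
    /// Stands in for normal request processing.
    Apply(u64),
    /// Pull request; the sender acts as a one-shot reply channel.
    PullLogMetrics(mpsc::Sender<LogMetrics>),
    Shutdown,
}

/// Core loop: updates metrics locally and only hands them out on request.
pub fn run_core(rx: mpsc::Receiver<CoreMsg>) {
    let mut metrics = LogMetrics { applied: 0 };
    for msg in rx {
        match msg {
            CoreMsg::Apply(n) => metrics.applied += n,
            CoreMsg::PullLogMetrics(reply) => {
                // The caller may have given up; ignore a closed reply channel.
                let _ = reply.send(metrics);
            }
            CoreMsg::Shutdown => break,
        }
    }
}

/// Client side: returns `None` if the core has stopped.
pub fn pull_log_metrics(core: &mpsc::Sender<CoreMsg>) -> Option<LogMetrics> {
    let (tx, rx) = mpsc::channel();
    core.send(CoreMsg::PullLogMetrics(tx)).ok()?;
    rx.recv().ok()
}
```

Pulling twice, e.g. a minute apart, and dividing the difference in `applied` by the elapsed time gives the average throughput without publishing anything per request.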

Mar 02 '22 10:03 schreter
