openraft
Split RaftMetrics into multiple sub metrics
Anyway, it would be desirable to split it into multiple independent metrics, possibly opting out of some completely, since the metrics are communicated far too often and publishing them is very expensive.
Mostly, the user will be only interested in leader <-> follower transitions and the like (i.e., very infrequent events), not in each request processed.
- Membership metrics: about leader, follower, etc.
- Replication metrics: only a leader has these metrics.
- Log and snapshot metrics.
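To make the proposed split concrete, here is a minimal sketch of what the three groups could look like as separate types. All type and field names here (`MembershipMetrics`, `ReplicationMetrics`, `LogMetrics`, `NodeMetrics`, `Role`) are hypothetical and chosen for illustration; they are not openraft's actual API.

```rust
use std::collections::BTreeMap;

/// Hypothetical node roles, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Role {
    Leader,
    Follower,
    Candidate,
    Learner,
}

/// Changes rarely: role transitions and membership changes.
#[derive(Clone, Debug, PartialEq, Eq)]
struct MembershipMetrics {
    role: Role,
    current_leader: Option<u64>,
    voters: Vec<u64>,
}

/// Only meaningful on a leader: matched log index per target node id.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct ReplicationMetrics {
    matched: BTreeMap<u64, u64>,
}

/// Changes on nearly every request: log and snapshot progress.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct LogMetrics {
    last_log_index: u64,
    last_applied: u64,
    snapshot_last_index: Option<u64>,
}

/// The three groups kept side by side so each can be published on its own.
struct NodeMetrics {
    membership: MembershipMetrics,
    /// `None` on any node that is not a leader.
    replication: Option<ReplicationMetrics>,
    log: LogMetrics,
}

impl NodeMetrics {
    /// Stepping down touches membership and drops replication metrics;
    /// log metrics are left alone.
    fn become_follower(&mut self, leader: Option<u64>) {
        self.membership.role = Role::Follower;
        self.membership.current_leader = leader;
        self.replication = None;
    }
}
```

With this shape, a role change only touches the membership part (and clears the leader-only replication part), so subscribers interested in transitions never see per-request log updates.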
Originally posted by @schreter in https://github.com/datafuselabs/openraft/issues/228#issuecomment-1056580449
Implementation details need to be provided.
Yes, the three metrics make sense.
Regarding publishing them, we can make it dependent on whether there is a subscriber or not. As soon as the first subscriber attaches to a particular metric, i.e., requests the channel receiver, the appropriate broadcast channel would be materialized, stored in `Option<Arc<SomeChannel::Publisher>>` in the Raft core, and filled with the current metrics. Additionally, store a `Weak` reference to the `SomeChannel::Receiver` in the Raft core. The subscribe method would have to be an async method so that it executes synchronized in the Raft core, of course, but that's not an issue.
At the place where we need to decide whether to publish new metrics, first check whether the channel exists at all. If not, return. Otherwise, check whether the strong reference count on the `SomeChannel::Receiver` has reached zero. If so, just drop the channel publisher (i.e., reset the `Option`) and return. Otherwise, do the publishing.
This way, we don't publish things the receiver is not interested in.
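A minimal sketch of that check, using only the standard library. A shared `Mutex` slot stands in for the real broadcast channel, and `LazyMetrics`, `Receiver`, `subscribe`, and `publish_with` are hypothetical names; in the real thing `subscribe` would be an async call executed inside the Raft core.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical receiver handle; subscribers hold it behind an `Arc`.
struct Receiver<T> {
    slot: Arc<Mutex<T>>,
}

impl<T: Clone> Receiver<T> {
    /// Read the most recently published value.
    fn latest(&self) -> T {
        self.slot.lock().unwrap().clone()
    }
}

/// State kept in the (hypothetical) Raft core for one metrics kind.
struct LazyMetrics<T> {
    /// Stand-in for `Option<Arc<SomeChannel::Publisher>>`.
    publisher: Option<Arc<Mutex<T>>>,
    /// Weak reference to the receiver handed out to subscribers.
    receiver: Weak<Receiver<T>>,
}

impl<T: Clone> LazyMetrics<T> {
    fn new() -> Self {
        Self { publisher: None, receiver: Weak::new() }
    }

    /// Materialize the channel on first subscription, seeded with `current`.
    /// If a live receiver already exists, hand out that one instead.
    fn subscribe(&mut self, current: T) -> Arc<Receiver<T>> {
        if let Some(existing) = self.receiver.upgrade() {
            return existing;
        }
        let slot = Arc::new(Mutex::new(current));
        let receiver = Arc::new(Receiver { slot: Arc::clone(&slot) });
        self.publisher = Some(slot);
        self.receiver = Arc::downgrade(&receiver);
        receiver
    }

    /// Publish only if somebody is listening; `make` is not even called
    /// otherwise. Returns whether a value was published.
    fn publish_with(&mut self, make: impl FnOnce() -> T) -> bool {
        // No channel was ever requested: nothing to do.
        if self.publisher.is_none() {
            return false;
        }
        // All receivers are gone: drop the publisher and stop publishing.
        if self.receiver.strong_count() == 0 {
            self.publisher = None;
            return false;
        }
        if let Some(slot) = &self.publisher {
            *slot.lock().unwrap() = make();
        }
        true
    }
}
```

Passing a closure to `publish_with` means the metrics value is only built when there is a subscriber, which is the point of the whole exercise.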
I'm just not sure about the replication statistics. We should probably still use `AtomicU64` to publish a change. Additionally, some global change flag (`AtomicBool`) could be set (if not set yet) to indicate a change to be published the next time the Raft loop runs.
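Roughly, the atomics idea could look like the sketch below. `ReplicationStats`, `record`, and `take_changes` are hypothetical names, and targets are indexed by position only to keep the example short. `SeqCst` is used everywhere as the conservative choice: with weaker orderings, the "skip the store if the flag is already set" shortcut could let the Raft loop clear the flag and still read a stale value.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Hypothetical replication statistics shared between the replication
/// tasks (writers) and the Raft loop (reader).
struct ReplicationStats {
    /// Matched log index per replication target, indexed by position.
    matched: Vec<AtomicU64>,
    /// Global "something changed" flag.
    changed: AtomicBool,
}

impl ReplicationStats {
    fn new(targets: usize) -> Self {
        Self {
            matched: (0..targets).map(|_| AtomicU64::new(0)).collect(),
            changed: AtomicBool::new(false),
        }
    }

    /// Called by a replication task after progress on `target`.
    fn record(&self, target: usize, index: u64) {
        self.matched[target].store(index, Ordering::SeqCst);
        // Set the flag only if it is not set yet, to avoid needless writes
        // to a cache line shared by all replication tasks.
        if !self.changed.load(Ordering::SeqCst) {
            self.changed.store(true, Ordering::SeqCst);
        }
    }

    /// Called once per Raft loop iteration: returns a snapshot of all
    /// values if anything changed since the last call, otherwise `None`.
    fn take_changes(&self) -> Option<Vec<u64>> {
        if !self.changed.swap(false, Ordering::SeqCst) {
            return None;
        }
        Some(self.matched.iter().map(|m| m.load(Ordering::SeqCst)).collect())
    }
}
```

The Raft loop would call `take_changes()` once per iteration and publish replication metrics only when it returns `Some`.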
For the sake of simplicity, the node type transition metrics (leader <-> follower; which is seldom updated) might be published every time, since a) likely there will be interest for it and b) it won't cause much overhead, since it's just on the "control plane".
In addition to publishing metrics via push, it would also be handy to have a "pull" mechanism (i.e., an async method) to request each particular metric, since there are use cases where you want to collect data, for example, once a minute to compute throughput. In that case it doesn't make sense to have metrics published on every change; just pull the data once a minute and compute the appropriate averages. OTOH, this can also be realized by subscribing to a metric, reading the current value, and dropping the receiver.
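For the once-a-minute use case, the consumer side would be little more than a subtraction between two pulled samples. A small sketch, assuming a hypothetical `Sample` carrying the last applied log index; how the sample is obtained (async pull method, or subscribe, read, drop) is left out:

```rust
use std::time::Duration;

/// Hypothetical pulled sample: the last applied log index at pull time.
#[derive(Clone, Copy, Debug)]
struct Sample {
    last_applied: u64,
}

/// Average number of applied entries per second between two pulls.
/// Returns `None` if no time has elapsed between the samples.
fn throughput_per_sec(prev: Sample, cur: Sample, elapsed: Duration) -> Option<f64> {
    let secs = elapsed.as_secs_f64();
    if secs <= 0.0 {
        return None;
    }
    // `saturating_sub` guards against an index that went backwards,
    // e.g. if the two samples came from different nodes.
    let delta = cur.last_applied.saturating_sub(prev.last_applied);
    Some(delta as f64 / secs)
}
```

For example, going from index 1000 to 1600 over 60 seconds gives an average of 10 entries per second, without any metrics having been pushed in between.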