Heartbeat optimization in multi raft group.
Your question
In a multi raft group service, many nodes in the same server process may all be leaders, so the heartbeats they send to their followers consume CPU and network bandwidth. These leaders could share one heartbeat timer and merge the heartbeat requests bound for the same follower server into a single request, reducing system resource consumption.
Your scenes
As described above.
Your advice
Share the heartbeat timer among the leaders in the same server process, and merge their heartbeat requests to the same follower server into one request, to reduce system resource consumption.
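A minimal sketch of the idea, to make the proposal concrete. It is not SOFAJRaft code: the class `SharedHeartbeatTimer`, its methods, and the endpoint strings are all hypothetical, and the `sender` callback merely stands in for one merged RPC per follower server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical sketch, not SOFAJRaft code: one timer serves every leader in the process.
class SharedHeartbeatTimer {
    // groupId -> follower endpoints of the leader this process hosts for that group
    private final Map<String, List<String>> followersByGroup = new ConcurrentHashMap<>();

    void registerLeader(String groupId, List<String> followerEndpoints) {
        followersByGroup.put(groupId, new ArrayList<>(followerEndpoints));
    }

    void unregisterLeader(String groupId) {
        followersByGroup.remove(groupId);
    }

    // Groups pending heartbeats by destination: endpoint -> sorted group ids.
    Map<String, List<String>> collectBatches() {
        Map<String, List<String>> batches = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : followersByGroup.entrySet()) {
            for (String endpoint : e.getValue()) {
                batches.computeIfAbsent(endpoint, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        batches.values().forEach(Collections::sort);
        return batches;
    }

    // Starts the single shared timer; sender is invoked once per endpoint per tick.
    void start(ScheduledExecutorService scheduler, long periodMs,
               BiConsumer<String, List<String>> sender) {
        scheduler.scheduleAtFixedRate(() -> collectBatches().forEach(sender),
            periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        SharedHeartbeatTimer timer = new SharedHeartbeatTimer();
        timer.registerLeader("group-1", Arrays.asList("10.0.0.2:8081", "10.0.0.3:8081"));
        timer.registerLeader("group-2", Arrays.asList("10.0.0.2:8081"));

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        timer.start(scheduler, 100, (endpoint, groups) ->
            System.out.println("heartbeat to " + endpoint + " for groups " + groups));
        Thread.sleep(350);
        scheduler.shutdownNow();
    }
}
```

With per-group timers, the two leaders above would wake up separately and send two requests to 10.0.0.2:8081 per interval. Here a single tick produces one batch per follower server.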
Environment
- SOFAJRaft version:
- JVM version (e.g. java -version):
- OS version (e.g. uname -a):
- Maven version:
- IDE version:
@killme2008 @fengjiachun https://docs.google.com/presentation/d/10u9QJT9exnSos3uZm-N0arhyso3EObUd86sypQ4Hcqs/edit?usp=sharing This is the design for heartbeat optimization in multi raft group, which is based on sharing the heartbeat timer.
@killme2008 I have a question about the heartbeat timer in a multi raft group service. When heartbeat requests are sent to followers by calling the readLeader method, the leader peer does not use the heartbeat timer, as follows:
// Excerpt from the readLeader path: heartbeats are sent directly, without the timer.
final ReadIndexHeartbeatResponseClosure heartbeatDone = new ReadIndexHeartbeatResponseClosure(closure,
    respBuilder, quorum, peers.size());
// Send heartbeat requests to followers
for (final PeerId peer : peers) {
    if (peer.equals(this.serverId)) {
        continue;
    }
    this.replicatorGroup.sendHeartbeat(peer, heartbeatDone);
}

// Excerpt from Replicator: the heartbeat branch of the send path.
if (isHeartbeat) {
    // Sending a heartbeat request
    this.heartbeatCounter++;
    RpcResponseClosure<AppendEntriesResponse> heartbeatDone;
    // Prefer passed-in closure.
    if (heartBeatClosure != null) {
        heartbeatDone = heartBeatClosure;
    } else {
        heartbeatDone = new RpcResponseClosureAdapter<AppendEntriesResponse>() {
            @Override
            public void run(final Status status) {
                onHeartbeatReturned(Replicator.this.id, status, request, getResponse(), monotonicSendTimeMs);
            }
        };
    }
    this.heartbeatInFly = this.rpcService.appendEntries(this.options.getPeerId().getEndpoint(), request,
        this.options.getElectionTimeoutMs() / 2, heartbeatDone);
}
Sharing heartbeat timers is easy, but merging heartbeat requests is not, and it does not seem to bring a significant benefit (because each heartbeat request still needs to carry its own term, leaderId, prevLogIndex, etc.).
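To illustrate the point, here is a rough sketch of what a merged request might look like. The `MergedHeartbeat` and `GroupHeartbeat` types are hypothetical, not SOFAJRaft classes, and the byte counts are made-up parameters of a toy size model rather than measurements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message shape, not a SOFAJRaft type.
class MergedHeartbeat {
    // Every group still contributes its own Raft state to the merged request.
    static final class GroupHeartbeat {
        final String groupId;
        final long term;
        final String leaderId;
        final long prevLogIndex;

        GroupHeartbeat(String groupId, long term, String leaderId, long prevLogIndex) {
            this.groupId = groupId;
            this.term = term;
            this.leaderId = leaderId;
            this.prevLogIndex = prevLogIndex;
        }
    }

    final String followerEndpoint;
    final List<GroupHeartbeat> groups = new ArrayList<>();

    MergedHeartbeat(String followerEndpoint) {
        this.followerEndpoint = followerEndpoint;
    }

    void add(GroupHeartbeat heartbeat) {
        groups.add(heartbeat);
    }

    // Toy model: each separate RPC pays the fixed overhead plus its group fields.
    static long separateBytes(int groupCount, int rpcOverheadBytes, int perGroupBytes) {
        return (long) groupCount * (rpcOverheadBytes + perGroupBytes);
    }

    // Toy model: the merged RPC pays the fixed overhead once, group fields every time.
    static long mergedBytes(int groupCount, int rpcOverheadBytes, int perGroupBytes) {
        return groupCount == 0 ? 0 : rpcOverheadBytes + (long) groupCount * perGroupBytes;
    }

    public static void main(String[] args) {
        MergedHeartbeat merged = new MergedHeartbeat("10.0.0.2:8081");
        merged.add(new GroupHeartbeat("group-1", 5, "10.0.0.1:8081", 120));
        merged.add(new GroupHeartbeat("group-2", 3, "10.0.0.1:8081", 77));
        System.out.println("groups in one request: " + merged.groups.size());
        // Illustrative numbers only: 100 groups, 40 bytes RPC overhead, 60 bytes per group.
        System.out.println("separate: " + separateBytes(100, 40, 60) + " bytes");
        System.out.println("merged:   " + mergedBytes(100, 40, 60) + " bytes");
    }
}
```

Under this model the payload still grows linearly with the number of groups, and merging only saves the fixed per-RPC overhead, which is why the benefit may be modest compared with the added complexity.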
Can we consider hibernation? See TiKV RFC 23: https://github.com/tikv/rfcs/blob/master/text/2019-03-04-hibernate-raft.md
PR: https://github.com/tikv/tikv/pull/4591
We have two jobs:
- Sharing heartbeat timer
- Hibernate regions
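As a loose sketch of the second task, the leader side of hibernation could be modeled as a small state machine driven by the shared timer. This is a simplification for discussion, not the TiKV algorithm or SOFAJRaft code: the class `HibernatingGroup`, the `maxIdleTicks` threshold, and the `followersCaughtUp` flag are all hypothetical. A real design would presumably also need followers to pause their election timers while the group sleeps, otherwise they would start elections.

```java
// Hypothetical sketch of leader-side hibernation; not TiKV or SOFAJRaft code.
class HibernatingGroup {
    enum State { AWAKE, HIBERNATING }

    private final int maxIdleTicks;
    private State state = State.AWAKE;
    private int idleTicks;

    HibernatingGroup(int maxIdleTicks) {
        this.maxIdleTicks = maxIdleTicks;
    }

    State state() {
        return state;
    }

    // Called when a proposal or any Raft message arrives for this group.
    void onActivity() {
        idleTicks = 0;
        state = State.AWAKE;
    }

    // Called on every shared timer tick; returns true if a heartbeat should be sent.
    boolean onTick(boolean followersCaughtUp) {
        if (state == State.HIBERNATING) {
            return false; // sleeping groups cost nothing per tick
        }
        if (!followersCaughtUp) {
            idleTicks = 0; // replication still in progress, stay awake
            return true;
        }
        idleTicks++;
        if (idleTicks >= maxIdleTicks) {
            state = State.HIBERNATING; // this tick sends the last heartbeat
        }
        return true;
    }

    public static void main(String[] args) {
        HibernatingGroup group = new HibernatingGroup(2);
        for (int tick = 1; tick <= 4; tick++) {
            System.out.println("tick " + tick + ": send=" + group.onTick(true)
                + ", state=" + group.state());
        }
        group.onActivity(); // a new proposal wakes the group up
        System.out.println("after activity: send=" + group.onTick(true));
    }
}
```

Combined with a shared timer, idle groups would then drop out of the per-tick work entirely, while busy groups keep sending heartbeats as before.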