
Differentiate between a "pending" node and a "joining" node.

Open masnax opened this issue 10 months ago • 0 comments

As soon as the existing cluster validates a join token for a newly joining node, it will add a PENDING record into the database for that node. This status will remain until a heartbeat syncs all cluster members, at which point the status will change to the dqlite role.

The problem is that this doesn't communicate anything about where the joiner is in the join process. It may not have started serving its API yet, it may not have opened its database, and it may not be trusted across the cluster.

If we add another cluster status for JOINING, we can maintain that status until the joining node has started serving the public API, it has started the database, and it is trusted across the cluster.
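A rough sketch of how the exit condition for JOINING could look. All names here (`MemberStatus`, `JoinProgress`, `StatusFor`) are hypothetical and not microcluster's actual types; how the three conditions get reported back to the cluster is also left open.

```go
package main

// MemberStatus is a hypothetical status type used only for this sketch.
type MemberStatus string

// StatusJoining is the proposed new status.
const StatusJoining MemberStatus = "JOINING"

// JoinProgress tracks the three conditions named above (hypothetical struct).
type JoinProgress struct {
	APIServing       bool // the joiner is serving its public API
	DatabaseStarted  bool // the joiner has started its database
	TrustedByCluster bool // the joiner is trusted across the cluster
}

// Ready reports whether every condition for leaving JOINING is met.
func (p JoinProgress) Ready() bool {
	return p.APIServing && p.DatabaseStarted && p.TrustedByCluster
}

// StatusFor keeps the member in JOINING until it is ready, and only then
// reports the dqlite role it would otherwise have been assigned.
func StatusFor(p JoinProgress, dqliteRole MemberStatus) MemberStatus {
	if !p.Ready() {
		return StatusJoining
	}
	return dqliteRole
}
```

With this shape, a heartbeat would only replace JOINING with the dqlite role once all three flags are set, so a member that is merely known to the database is never reported as fully joined.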

This will make the OnNewMember hooks more reliable, as we can be sure they only run on nodes that actually have database access and are available to listen for the request to execute the hook.

Additionally, this can help with managing concurrent joins, because we can batch the OnNewMember calls for multiple nodes into a single call once no nodes remain in the JOINING state.
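The batching idea could be sketched roughly like this. `HookBatcher`, `Queue`, and `Flush` are made-up names for illustration, and the sketch assumes the caller already has a map of member names to their current status; it does not reflect how microcluster actually dispatches OnNewMember.

```go
package main

// MemberStatus is a hypothetical status type used only for this sketch.
type MemberStatus string

// StatusJoining is the proposed new status.
const StatusJoining MemberStatus = "JOINING"

// HookBatcher collects members that finished joining so that one
// OnNewMember call can cover all of them (hypothetical helper).
type HookBatcher struct {
	queued []string
}

// Queue records a member whose hook call is still outstanding.
func (b *HookBatcher) Queue(name string) {
	b.queued = append(b.queued, name)
}

// Flush returns the queued members and clears the queue, but only if no
// member in statuses is still JOINING. Otherwise it returns nil and keeps
// the queue intact so the batch can grow.
func (b *HookBatcher) Flush(statuses map[string]MemberStatus) []string {
	for _, status := range statuses {
		if status == StatusJoining {
			return nil
		}
	}

	batch := b.queued
	b.queued = nil

	return batch
}
```

The caller would invoke `Flush` after each heartbeat and run the hook once with the returned names whenever the result is non-empty.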

masnax, Apr 22 '24 23:04