Microcluster itself can be out of sync with the underlying Dqlite’s membership configuration.
I tested clustering with Canonical k8s and embedded etcd.
While doing so, I noticed that Microcluster itself can be out of sync with the underlying Dqlite’s membership configuration. In this scenario we have a three-node cluster where we take down one node by killing its VM process. Then we join a new node and try to remove the previously killed node. We observe diverging cluster state: the Microcluster table shows 3 nodes as voters, Microcluster’s Dqlite has 4 nodes with the killed one marked as a spare, and etcd shows 2 nodes.
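For anyone reproducing this, dqlite's own view of the membership can be dumped with go-dqlite's `client` package. A minimal sketch (the address is a placeholder for any reachable cluster member, and the TLS dial options a real Microcluster deployment would need are omitted):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Placeholder address of any reachable cluster member.
	c, err := client.New(ctx, "10.0.0.1:9001")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Dump the membership as this node sees it: ID, address and role
	// (voter / stand-by / spare).
	nodes, err := c.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		fmt.Printf("id=%d address=%s role=%s\n", n.ID, n.Address, n.Role)
	}
}
```

Comparing this output with the `core_cluster_members` table and with etcd's member list is how the three-way divergence above shows up.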
See also: https://github.com/canonical/go-dqlite/issues/388, https://github.com/canonical/dqlite/issues/799
Hi @louiseschmidtgen, thanks for the report.
Are you aware of https://chat.canonical.com/canonical/pl/tuxr7u5origfiqgzo9tkpfgrqe? A few months ago I started questioning the roles adjustment frequency and hook, and why we even use them in Microcluster, as they seem to maintain an out-of-date copy of information that should rather be sourced from dqlite directly.
Microcluster has this concept of a heartbeat (detached from the actual raft heartbeat happening in dqlite) which is used for a continuous check-in of the cluster members. As part of this, Microcluster also updates the member roles inside the `core_cluster_members` table.
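As a toy illustration of that pattern (this is not Microcluster's actual code): a ticker-driven loop copies dqlite's current roles into a local cache standing in for `core_cluster_members`. Between ticks, or whenever a tick fails, the cache can silently diverge from dqlite:

```go
package example

import (
	"context"
	"time"

	"github.com/canonical/go-dqlite/client"
)

// refreshRoles periodically copies dqlite's current member roles into a
// local cache (standing in for the core_cluster_members table). Between
// ticks, or when a tick fails, the cache can diverge from dqlite.
func refreshRoles(ctx context.Context, c *client.Client, interval time.Duration, cache map[string]client.NodeRole) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			nodes, err := c.Cluster(ctx)
			if err != nil {
				continue // Skip this tick; the cache stays stale until the next one.
			}
			for _, n := range nodes {
				cache[n.Address] = n.Role
			}
		}
	}
}
```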
Overall I don't see why dqlite even has this `WithRolesAdjustmentHook`, as (if you look at the code) it doesn't really do much.
Ultimately I would like Microcluster to query dqlite directly for roles and member status info, but this will likely require more investigation into the impact on Microcluster's current design.
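Concretely, that could look something like the sketch below (assumptions: the node-store path is hypothetical, and TLS dial options are again omitted). `FindLeader` dials members from dqlite's node store until it reaches the leader, whose view of the membership configuration is authoritative:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Hypothetical path to dqlite's node store file; dqlite's app package
	// keeps a cluster.yaml in its data directory.
	store, err := client.NewYamlNodeStore("/var/lib/microcluster/database/cluster.yaml")
	if err != nil {
		log.Fatal(err)
	}

	// Dial members from the store until we reach the leader.
	leader, err := client.FindLeader(ctx, store)
	if err != nil {
		log.Fatal(err)
	}
	defer leader.Close()

	// The leader's answer is the source of truth for roles, so there is
	// no cached copy to fall out of sync.
	nodes, err := leader.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		fmt.Printf("id=%d address=%s role=%s\n", n.ID, n.Address, n.Role)
	}
}
```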
> I noticed that Microcluster itself can be out of sync with the underlying Dqlite’s membership configuration
I hope that the scenario you are observing is an effect of what I have described above.
I'll flag it as a bug.