Microcluster itself can be out of sync with the underlying Dqlite’s membership configuration.
I tested clustering with Canonical k8s and embedded etcd.
While doing so, I noticed that Microcluster itself can be out of sync with the underlying Dqlite’s membership configuration. In this scenario we have a three-node cluster where we take down one node by killing its VM process. Then we join a new node and try to remove the previously killed node. We observe diverging cluster state: the Microcluster table shows 3 nodes as voters, Microcluster’s Dqlite has 4 nodes with the killed one marked as a spare, and etcd shows 2 nodes.
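For anyone reproducing this, dqlite's own view of the membership can be dumped with go-dqlite's `client` package. A minimal sketch (the address is a placeholder for any reachable cluster member, and the TLS dial options a real Microcluster deployment would need are omitted):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Placeholder address of any reachable cluster member.
	c, err := client.New(ctx, "10.0.0.1:9001")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Dump the membership as this node sees it: ID, address and role
	// (voter / stand-by / spare).
	nodes, err := c.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		fmt.Printf("id=%d address=%s role=%s\n", n.ID, n.Address, n.Role)
	}
}
```

Comparing this output with the `core_cluster_members` table and with etcd's member list is how the three-way divergence above shows up.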
See also: https://github.com/canonical/go-dqlite/issues/388, https://github.com/canonical/dqlite/issues/799
Hi @louiseschmidtgen, thanks for the report.
Are you aware of https://chat.canonical.com/canonical/pl/tuxr7u5origfiqgzo9tkpfgrqe? A few months ago I started questioning the roles adjustment frequency and hook, and why we even use them in Microcluster, as they seem to maintain an out-of-date copy of information that should rather be sourced from dqlite directly.
Microcluster has this concept of a heartbeat (detached from the actual raft heartbeat happening in dqlite) which is used for a continuous check-in of the cluster members. As part of this, Microcluster also updates the member roles inside the `core_cluster_members` table.
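As a toy illustration of that pattern (this is not Microcluster's actual code): a ticker-driven loop copies dqlite's current roles into a local cache standing in for `core_cluster_members`. Between ticks, or whenever a tick fails, the cache can silently diverge from dqlite:

```go
package example

import (
	"context"
	"time"

	"github.com/canonical/go-dqlite/client"
)

// refreshRoles periodically copies dqlite's current member roles into a
// local cache (standing in for the core_cluster_members table). Between
// ticks, or when a tick fails, the cache can diverge from dqlite.
func refreshRoles(ctx context.Context, c *client.Client, interval time.Duration, cache map[string]client.NodeRole) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			nodes, err := c.Cluster(ctx)
			if err != nil {
				continue // Skip this tick; the cache stays stale until the next one.
			}
			for _, n := range nodes {
				cache[n.Address] = n.Role
			}
		}
	}
}
```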
Overall I don't see why dqlite even has this `WithRolesAdjustmentHook`, as (if you look at the code) it doesn't really do much.
Ultimately I would like Microcluster to query dqlite directly for roles and member status info, but this will likely require more investigation into the impact on Microcluster's current design.
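Concretely, that could look something like the sketch below (assumptions: the node-store path is hypothetical, and TLS dial options are again omitted). `FindLeader` dials members from dqlite's node store until it reaches the leader, whose view of the membership configuration is authoritative:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Hypothetical path to dqlite's node store file; dqlite's app package
	// keeps a cluster.yaml in its data directory.
	store, err := client.NewYamlNodeStore("/var/lib/microcluster/database/cluster.yaml")
	if err != nil {
		log.Fatal(err)
	}

	// Dial members from the store until we reach the leader.
	leader, err := client.FindLeader(ctx, store)
	if err != nil {
		log.Fatal(err)
	}
	defer leader.Close()

	// The leader's answer is the source of truth for roles, so there is
	// no cached copy to fall out of sync.
	nodes, err := leader.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		fmt.Printf("id=%d address=%s role=%s\n", n.ID, n.Address, n.Role)
	}
}
```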
> I noticed that Microcluster itself can be out of sync with the underlying Dqlite’s membership configuration
I hope that the scenario you are observing is an effect of what I have described above.
I'll flag it as a bug.