fix: add consistency checks across core_cluster_members, truststore, and dqlite

Open louiseschmidtgen opened this issue 2 months ago • 9 comments

Problem

Microcluster can enter inconsistent states in which core_cluster_members (database), the truststore, and the dqlite cluster configuration fall out of sync after a partial failure. This leads to failed operations and difficult recovery scenarios.

Solution

Implements membership consistency validation before critical operations:

  • Validates before operations: checks that all three sources match before joins, removals, and token generation.
  • Clear error messages: shows the differences between sources when an inconsistency is detected, so admins can recover their cluster before the problem compounds.

Changes Made

  • state.go - Core consistency checking logic with CheckMembershipConsistency()
  • cluster.go - Added checks before join/remove operations
  • tokens.go - Added checks before token generation
  • main.sh - Integration test simulating inconsistent state and verifying blocked operations

Testing

./example/test/main.sh membership  # Test membership consistency 

louiseschmidtgen · Oct 17 '25 13:10