Unable to scale from 3 to 2 cluster members without force
Affected version: squid/stable
A three member MicroCeph cluster (each node with one OSD) cannot be scaled down to two members because MicroCeph fails with the following error:
```
root@micro01:~# microceph cluster remove micro03
Error: Failed to execute pre-remove hook on cluster member "micro03": Need at least 3 mon, 1 mds, and 1 mgr besides micro03
```
Also see this conversation for reference: https://chat.canonical.com/canonical/pl/i6pj4hyrejfeiyqqd1psrmo6we.
Reproducer steps:
- Start with a MicroCeph cluster having three members
- Add an OSD on each of the members
-
  ```
  root@micro01:~# microceph cluster remove micro03 # Error is expected
  Error: Failed to execute pre-remove hook on cluster member "micro03": Node micro03 still has disks configured, remove before proceeding
  ```
-
  ```
  root@micro01:~# microceph disk list
  Disks configured in MicroCeph:
  +-----+----------+----------------------------------------------------+
  | OSD | LOCATION | PATH                                               |
  +-----+----------+----------------------------------------------------+
  | 1   | micro02  | /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_disk1 |
  +-----+----------+----------------------------------------------------+
  | 2   | micro03  | /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_disk1 |
  +-----+----------+----------------------------------------------------+
  | 3   | micro01  | /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_disk1 |
  +-----+----------+----------------------------------------------------+
  ```
-
  ```
  root@micro01:~# microceph disk remove 2 --confirm-failure-domain-downgrade # looks like --bypass-safety-checks is missing
  Removing osd.2, timeout 1800s
  Error: failed to remove disk: Failed to send request to target "micro03": cannot remove osd.2 we need at least 3 OSDs, have 3
  ```
-
  ```
  root@micro01:~# microceph disk remove 2 --confirm-failure-domain-downgrade --bypass-safety-checks # Succeeds
  Removing osd.2, timeout 1800s
  ```
-
  ```
  root@micro01:~# microceph cluster remove micro03 # Still failing, no flags left besides --force
  Error: Failed to execute pre-remove hook on cluster member "micro03": Need at least 3 mon, 1 mds, and 1 mgr besides micro03
  ```
The cluster member can only be removed by running `microceph cluster remove <name> --force`.
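Putting the transcript above together, the only sequence that currently succeeds is roughly the following (a sketch based on the output above, not a recommended procedure; OSD 2 is the one located on micro03 in this reproducer):

```
# Remove micro03's OSD, bypassing the "need at least 3 OSDs" safety check
microceph disk remove 2 --confirm-failure-domain-downgrade --bypass-safety-checks

# The pre-remove hook still refuses ("Need at least 3 mon ..."),
# so the member can only be removed with --force
microceph cluster remove micro03 --force
```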
We (cc @mseralessandri) observed this issue while trying to remove the third node from a three-node MicroCloud deployment, which currently does not work because it appears to be blocked by MicroCeph.
The temporary workaround for now is to use the force flag: `microcloud remove <name> --force`.
Thank you for reporting your feedback to us!
The internal ticket has been created: https://warthogs.atlassian.net/browse/CEPH-1386.
This message was autogenerated
This was not caught in the MicroCloud test pipelines because the specific test that scales from 3 to 2 MicroCloud nodes uses the --force flag.
The 2-to-1 node test works fine, according to https://documentation.ubuntu.com/microcloud/latest/microcloud/how-to/remove_machine/#reducing-the-cluster-to-1-machine
An update to this issue: this time I am trying to remove cluster member micro01.
Not sure if it is related to the same underlying issue, but when force-removing the third member, reinstalling the microceph snap on that member, and then trying to add it back to the cluster, I get the following error:

```
Error: System "micro01" failed to join the cluster: Failed to update cluster status of services: Failed to join "MicroCeph" cluster: failed to record mon db entries: failed to record mon host: This "config" entry already exists.
```
Running `microceph.ceph mon remove micro01` beforehand succeeds but does not have any effect (not sure if it is the right command, though).
So it looks like even when force-removing, some traces of the removed cluster member remain in the DB?
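If this really is leftover state in the cluster database, it might be possible to confirm it via the microcluster SQL interface. Note that the `cluster sql` subcommand and the `config` table/column names below are assumptions on my part (based on the `This "config" entry already exists` error), not verified against this MicroCeph version:

```
# Hypothetical: look for stale entries referencing the removed member.
# Subcommand availability and schema (config table, key column) are assumptions.
microceph cluster sql "SELECT * FROM config WHERE key LIKE '%micro01%'"
```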