Unclear failure when enrolling OSD that was part of a previous deployment

Open gboutry opened this issue 10 months ago • 3 comments

A user reported a failure to add disks to microceph on a recent installation. After further investigation, it appears the user did not wipe the disks before re-enrolling them. The error is quite hard to decipher, and I believe microceph could provide a more human-readable one.
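To illustrate the kind of message I have in mind, here is a minimal sketch that condenses the raw `ceph-osd --mkfs` failure into a single line with a hint. It is written in Python only for illustration. The function name, the regular expression, and the hint wording are all assumptions of mine, not existing microceph code, and the hint reflects the suspected cause in this report rather than a confirmed diagnosis.

```python
import re

# Hypothetical hint text; this is not an existing microceph message.
WIPE_HINT = (
    "The disk may still hold data from a previous deployment. "
    "Consider wiping it before enrolling it again."
)

# Matches the failure line seen in the logs of this report, e.g.
# "Failed to run: ceph-osd --mkfs --no-mon-config -i 41: exit status 250".
_MKFS_FAILURE = re.compile(r"ceph-osd --mkfs .*?-i (\d+): exit status (\d+)")


def summarize_osd_error(raw: str) -> str:
    """Return a one-line summary for mkfs failures, or the raw text otherwise."""
    match = _MKFS_FAILURE.search(raw)
    if match is None:
        # Unknown error shape: pass it through unchanged rather than guess.
        return raw
    osd_id, status = match.groups()
    return (
        f"OSD {osd_id}: ceph-osd --mkfs failed (exit status {status}). "
        f"{WIPE_HINT}"
    )
```

The raw bluestore output would still be worth keeping in the debug logs; the summary only replaces what is shown to the user first.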

Logs

Error: Microceph Adding disks /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CTNA0T824204 failed: {'result': "[{'spec': '/dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CTNA0T824204', 'status': 'failure', 'message': 'Error: failed to bootstrap OSD: Failed to run: ceph-osd --mkfs --no-mon-config -i 41: exit status 250 (2025-02-12T15:30:21.681-0700 724b2cacf600 -1 bluestore(/var/lib/ceph/osd/ceph-41) _open_db_and_around failed to load os-type: (2) No such file or directory\n2025-02-12T15:30:21.681-0700 724b2cacf600 -1 bluestore(/var/lib/ceph/osd/ceph-41) mkfs fsck found fatal error: (2) No such file or directory\n2025-02-12T15:30:21.681-0700 724b2cacf600 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory\n2025-02-12T15:30:21.681-0700 724b2cacf600 -1 \x1b[0;31m ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-41: (2) No such file or directory\x1b[0m)\n'}]", 'return-code': 0}
Failed executing cmd: ['microceph', 'disk', 'add', '/dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CTNA0T824204'], error: Error: failed to bootstrap OSD: Failed to run: ceph-osd --mkfs --no-mon-config -i 40: exit status 250 (2025-02-12T15:19:52.940-0700 769ca8ece600 -1 bluestore(/var/lib/ceph/osd/ceph-40) _open_db_and_around failed to load os-type: (2) No such file or directory
2025-02-12T15:19:52.940-0700 769ca8ece600 -1 bluestore(/var/lib/ceph/osd/ceph-40) mkfs fsck found fatal error: (2) No such file or directory
2025-02-12T15:19:52.940-0700 769ca8ece600 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2025-02-12T15:19:52.940-0700 769ca8ece600 -1  ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-40: (2) No such file or directory)

Context

Version: microceph 19.2.0+snap2fbf0bad05 1271 squid/candidate canonical✓ held

Available disks from microceph's point of view:

+------------------+----------+------+------------------------------------------------------------+
|      MODEL       | CAPACITY | TYPE |                            PATH                            |
+------------------+----------+------+------------------------------------------------------------+
| Dell Ent NVMe v2 | 1.46TiB  | scsi | /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CTNA0T824203 |
+------------------+----------+------+------------------------------------------------------------+
| Dell Ent NVMe v2 | 1.46TiB  | scsi | /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CTNA0T824204 |
+------------------+----------+------+------------------------------------------------------------+
| Dell Ent NVMe v2 | 3.49TiB  | scsi | /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CPNA0T710510 |
+------------------+----------+------+------------------------------------------------------------+
| Dell Ent NVMe v2 | 3.49TiB  | scsi | /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CPNA0T710511 |
+------------------+----------+------+------------------------------------------------------------+
| Dell Ent NVMe v2 | 3.49TiB  | scsi | /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CPNA0T710512 |
+------------------+----------+------+------------------------------------------------------------+
| Dell Ent NVMe v2 | 3.49TiB  | scsi | /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CPNA0T710513 |
+------------------+----------+------+------------------------------------------------------------+
| Dell Ent NVMe v2 | 3.49TiB  | scsi | /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CPNA0T710514 |
+------------------+----------+------+------------------------------------------------------------+
| Dell Ent NVMe v2 | 3.49TiB  | scsi | /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CPNA0T710515 |
+------------------+----------+------+------------------------------------------------------------+
$ stat -c "%F" /dev/disk/by-id/scsi-SNVMe_Dell_Ent_NVMe_v2_S6CTNA0T824203
symbolic link
$ stat -c "%F" /dev/sdb
block special file
$ stat -c "%F" /dev/sda
block special file
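Since the paths above are symlinks to block devices, a pre-enrollment check could resolve the link and look for a leftover BlueStore label before calling `ceph-osd --mkfs`. The sketch below is only an illustration in Python. It assumes the BlueStore device label begins with the ASCII string `bluestore block device` at offset 0, and the function name is hypothetical, not part of microceph.

```python
import os

# Assumption: a BlueStore device label starts with this ASCII prefix at offset 0.
BLUESTORE_MAGIC = b"bluestore block device"


def has_bluestore_label(path: str) -> bool:
    """Return True if the device (or file) at `path` starts with the label prefix."""
    # Resolve /dev/disk/by-id/... symlinks to the underlying device node.
    real_path = os.path.realpath(path)
    with open(real_path, "rb") as dev:
        return dev.read(len(BLUESTORE_MAGIC)) == BLUESTORE_MAGIC
```

Reading a real device this way needs sufficient privileges. A positive result would let the tool report "disk appears to belong to a previous Ceph deployment" instead of the raw mkfs output.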

Additional notes

The user reported that this setup was functional on a previous deployment.

gboutry, Feb 12 '25 23:02

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/CEPH-1173.

This message was autogenerated

This is similar to #507 and would be resolved by a better warning/error structure.

UtkarshBhatthere, Feb 14 '25 09:02

@UtkarshBhatthere any news?

gboutry, Jul 01 '25 14:07