
Identifying devices

Open andrewbaxter opened this issue 1 year ago • 6 comments

I'm testing with some loopback devices (2 devices, replicas=2, metadata_replicas_required=2, data_replicas_required=2). What does the last option do? It isn't documented in the manpage. A device failed (losetup -d), and now the array won't mount (probably https://github.com/koverstreet/bcachefs/issues/703).

This isn't about mounting specifically though, my question is more about finding which disk corresponds to which label, etc. Basically, assuming I could mount, how do I figure out which disk is missing and therefore which disk I need to remove?

I'm mounting via UUID because I don't expect device nodes to be stable, so I don't have a list of device nodes for the mount post-format.

  • bcachefs show-super lists devices, but not the device path (I guess it's only looking at superblock info and not scanning devices). The only identifying information is the label (text label + device index?) and the UUID. AFAICT none of these are externally visible, so the device UUID doesn't appear in /dev/disk/*
  • When mounting the identified disks are listed in dmesg by device node (/dev/loop2) but that doesn't seem useful for removing: it could provide a list of present devices but the device node that bcachefs would need to remove the disk can't be determined (it may have been shadowed during this boot). Similarly, with just the device node I can't connect it to labels/UUIDs in the show-super output so I can't determine the missing labels by elimination either.
  • AFAICT there are no other commands for showing information about a disk, that I could e.g. run on all disks to come up with a mapping

Offhand, is there a guide for expected procedures for recovering from a failure? The procedure is unclear at this point and I see lots of SO posts/threads about recovery but with no clear resolution or answers.

andrewbaxter avatar Dec 15 '24 07:12 andrewbaxter

Ah! It looks like the device UUID is available in udev:

ID_FS_UUID_SUB=2ecde329-a682-433d-88a3-68e164347d07
ID_FS_UUID_SUB_ENC=2ecde329-a682-433d-88a3-68e164347d07

It's not linked on my system. Is it used by udev rules on common distros?
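
For reference, a minimal sketch of reading that property for a single device node. It assumes the value comes from `udevadm info --query=property`, and `uuid_sub_from_props` is a hypothetical helper name, not part of any tool:

```shell
# Extract the ID_FS_UUID_SUB value from udev property output on stdin.
# The pattern requires '=' right after the key, so the similarly named
# ID_FS_UUID_SUB_ENC line is not matched.
uuid_sub_from_props() {
    sed -n 's/^ID_FS_UUID_SUB=//p'
}

# Example invocation (needs a real device, so it is left commented out):
# udevadm info --query=property --name=/dev/loop2 | uuid_sub_from_props
```

Running this over every candidate block device would give a device node to device UUID table without adding any udev rules.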

andrewbaxter avatar Dec 15 '24 07:12 andrewbaxter

I see it referenced in 69-md-clustered-confirm-device.rules:

PROGRAM="/usr/bin/blkid -o device -t UUID_SUB=$env{DEVICE_UUID}", ENV{.md.newdevice} = "$result"

ENV{.md.newdevice}!="", RUN+="/usr/bin/mdadm --manage $env{DEVNAME} --cluster-confirm $env{RAID_DISK}:$env{.md.newdevice}"
ENV{.md.newdevice}=="", RUN+="/usr/bin/mdadm --manage $env{DEVNAME} --cluster-confirm $env{RAID_DISK}:missing"

Unfortunately that's pretty specific and doesn't link anything.

Edit: I tried

$ cat 61-disk-uuid-sub.rules 
ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_SUB_ENC}=="?*", SYMLINK+="disk/by-uuid-sub/$env{ID_FS_UUID_SUB_ENC}"

and that seemed to work fine. Then you can correlate device nodes and show-super device output, identify the present nodes, and then remove the missing nodes.
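
A rough sketch of that correlation step, assuming the rule above is installed and has populated /dev/disk/by-uuid-sub (after editing rules, `udevadm control --reload` followed by `udevadm trigger` is usually needed). `list_uuid_sub_links` is a hypothetical helper name:

```shell
# Print "<device UUID> <device node>" for every symlink in the directory
# created by the rule above. The directory defaults to the SYMLINK path
# used in that rule and can be overridden with the first argument.
list_uuid_sub_links() {
    dir="${1:-/dev/disk/by-uuid-sub}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink, including an unmatched glob.
        [ -L "$link" ] || continue
        printf '%s %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}
```

Any device UUID that show-super reports but that is absent from this listing belongs to a device that is not currently present.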

andrewbaxter avatar Dec 15 '24 07:12 andrewbaxter

So I guess in summary:

  1. Is this the expected way of identifying drives?
  2. Could these be proposed as recommended udev rules?

andrewbaxter avatar Dec 15 '24 08:12 andrewbaxter

You can find the mapping between device index and device name in sysfs:

# bcachefs show-super /dev/sda |grep '^Device:'
...
Device:                                    0
Device:                                    1
Device:                                    3

# ls -l /sys/fs/bcachefs/647f0af5*/dev-?/block
lrwxrwxrwx 1 root root 0 Dec 19 14:07 /sys/fs/bcachefs/647f0af5-81b2-4497-b829-382730d87b2c/dev-0/block -> ../../../../devices/pci0000:00/0000:00:11.0/ata2/host1/target1:0:0/1:0:0:0/block/sdc
lrwxrwxrwx 1 root root 0 Dec 19 14:07 /sys/fs/bcachefs/647f0af5-81b2-4497-b829-382730d87b2c/dev-1/block -> ../../../../devices/pci0000:00/0000:00:11.0/ata1/host0/target0:1:0/0:1:0:0/block/sda
lrwxrwxrwx 1 root root 0 Dec 19 14:07 /sys/fs/bcachefs/647f0af5-81b2-4497-b829-382730d87b2c/dev-3/block -> ../../../../devices/pci0000:00/0000:00:11.0/ata1/host0/target0:2:0/0:2:0:0/block/sdb
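
The same lookup can be scripted. This is a sketch that assumes the sysfs layout shown in the listing above (one dev-N directory per member, each with a block symlink); `bcachefs_dev_map` is a hypothetical helper name. It globs `dev-*` rather than `dev-?` so that multi-digit indices are also covered:

```shell
# Print "<device index> <block device name>" for each member of one
# filesystem that has a block link. $1 is a single filesystem directory,
# e.g. /sys/fs/bcachefs/<filesystem UUID>.
bcachefs_dev_map() {
    fsdir="$1"
    for devdir in "$fsdir"/dev-*; do
        # Members without a block link are skipped here.
        [ -L "$devdir/block" ] || continue
        idx="${devdir##*/dev-}"
        printf '%s %s\n' "$idx" "$(basename "$(readlink "$devdir/block")")"
    done
}
```

The first column corresponds to the `Device:` index printed by show-super, and the second is the kernel block device name (sdc, sda and sdb in the listing above).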

alexminder avatar Dec 19 '24 14:12 alexminder

Ah cool, and I guess the missing devices wouldn't have entries in sysfs? That's easier than setting up udev rules.

andrewbaxter avatar Dec 19 '24 17:12 andrewbaxter

Ah! The dev-N directory exists, but its block link is missing if the device isn't present. That means to find dead devices for removal I just have to scan the sysfs dir.
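
A sketch of that scan, assuming the behavior described here (a dev-N directory without a block entry means the member is absent). `bcachefs_missing_devs` is a hypothetical helper name:

```shell
# Print the index of every member directory that has no block entry.
# $1 is a single filesystem directory under /sys/fs/bcachefs.
bcachefs_missing_devs() {
    fsdir="$1"
    for devdir in "$fsdir"/dev-*; do
        # Skip an unmatched glob or anything that is not a directory.
        [ -d "$devdir" ] || continue
        # A block entry of any kind, even a dangling symlink, counts as present.
        if [ ! -e "$devdir/block" ] && [ ! -L "$devdir/block" ]; then
            printf '%s\n' "${devdir##*/dev-}"
        fi
    done
}

# Example invocation on a mounted filesystem (left commented out):
# bcachefs_missing_devs /sys/fs/bcachefs/647f0af5-81b2-4497-b829-382730d87b2c
```

The printed indices line up with the `Device:` entries in the show-super output, which is where the matching labels and device UUIDs can be looked up.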

I think identifying devices (missing ones, but also existing ones) is critical to day-to-day bcachefs management, so it would be good to document how this can be done (either via sysfs or new udev rules). The current user documentation doesn't provide any guidance on the matter, and it describes sysfs as having

various options, performance counters and internal debugging aids.

so it doesn't seem to be intended for use as a primary interface. There's no mention of the block node or its behavior.

andrewbaxter avatar Feb 01 '25 06:02 andrewbaxter