Inconsistencies on `/dev/disk/by-id/*` when using similar long device names and multipath
Required information
- Distribution: Ubuntu
- Distribution version: 22.04
- snap: latest/edge
- LXC version: 5.21.1 LTS
- LXD version: 5.21.1 LTS
Issue description
When two block devices have names that share their first 16 characters (after escaping) and multipath is in use, the /dev/disk/by-id/* symlinks for those devices are identical, so only a single symlink is created. The second device is then accessed through /dev/dm-0 rather than /dev/sdc. Although /dev/sdc still exists, making a filesystem on it fails with "/dev/sdc is apparently in use by the system; will not make a filesystem here!". The problem only occurs when multipath-tools is installed; removing it and restarting the VM fixes it.
Note that the second device is still usable, but through /dev/dm-0. So this may not be critical but it should be worth some discussion to better understand what is happening.
Steps to reproduce
lxc launch ubuntu:n vm --vm # Ubuntu images come with multipath-tools installed
longName="long-device-name"
lxc storage volume create default vol1 --type=block
lxc storage volume create default vol2 --type=block
sleep 30
lxc exec vm -- systemctl is-system-running --wait
lxc config device add vm "${longName}1" disk pool=default source=vol1
lxc config device add vm "${longName}2" disk pool=default source=vol2
lxc exec vm -- mkfs.ext4 /dev/sdc # Fails
lxc exec vm -- apt autopurge multipath-tools
lxc restart vm
lxc exec vm -- mkfs.ext4 /dev/sdc # Succeeds
Please correct me if I am wrong @simondeziel. This seems to happen because, with a long block device name, the link under /dev/disk/by-id only includes the first 16 characters of the device name (after escaping).
For example, using long-device-name1 as the block device name results in a link under /dev/disk/by-id that looks like 0QEMU_QEMU_HARDDISK_lxd_long--device--na. The same happens for a second device named long-device-name2, which overwrites the first /dev/disk/by-id link and leaves the first device without a link in /dev/disk/by-id.
This could make multipath-tools assume both devices are the same, creating the inconsistencies described above.
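The collision described above can be reproduced without a VM. This is only a sketch: the helper name `truncated_id` is hypothetical, and the rules it models (each `-` escaped as `--`, an `lxd_` prefix, and a 20-character cut) are assumptions inferred from the link names seen in this issue, not taken from the LXD or udev sources.

```shell
#!/bin/sh
# Hypothetical helper: model how a device name ends up in the truncated
# by-id link. Assumed rules: "-" becomes "--", "lxd_" is prepended, and
# only the first 20 characters survive.
truncated_id() {
    escaped=$(printf '%s' "$1" | sed 's/-/--/g')
    printf 'lxd_%s\n' "$escaped" | cut -c1-20
}

id1=$(truncated_id "long-device-name1")
id2=$(truncated_id "long-device-name2")

echo "device 1: $id1"
echo "device 2: $id2"

if [ "$id1" = "$id2" ]; then
    echo "collision: both names map to the same truncated identifier"
fi
```

Under these assumptions, any two names whose escaped forms agree on the first 16 characters end up with the same identifier.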
@hamistao I don't think there is much we can do about this, I'm afraid.
@tomponline I agree, I don't think we can fix this without changing the /dev/disk/by-id paths. I wanted to open this in case someone else could think of a solution.
If this is a limitation of udev there isn't much we can do.
@hamistao as you described, I think that because the /dev/disk/by-id symlinks end up identical, the second disk sharing the name prefix takes over the by-id symlink. This in turn seems to lead multipath to think there are multiple paths to the same disk.
I took a look at this, and agree with the findings. The udev rules are shortening the WWID so you see the following:
root@n-vm:~# ll /dev/disk/by-id | grep device--na
lrwxrwxrwx 1 root root 9 Jul 16 23:21 scsi-0QEMU_QEMU_HARDDISK_lxd_long--device--na -> ../../sdc
lrwxrwxrwx 1 root root 9 Jul 16 23:20 scsi-SQEMU_QEMU_HARDDISK_lxd_long--device--name1 -> ../../sdb
lrwxrwxrwx 1 root root 9 Jul 16 23:20 scsi-SQEMU_QEMU_HARDDISK_lxd_long--device--name2 -> ../../sdc
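To spot which disk lost its shortened link, the listing can be filtered for targets that have a full `scsi-S...` link but no `scsi-0...` link. The following is a rough sketch that runs on the sample output above; the helper name `missing_short_link` is hypothetical, and on a real system the input would come from `ls -l /dev/disk/by-id` instead of the embedded sample.

```shell
#!/bin/sh
# Sample listing copied from the output above.
listing='lrwxrwxrwx 1 root root 9 Jul 16 23:21 scsi-0QEMU_QEMU_HARDDISK_lxd_long--device--na -> ../../sdc
lrwxrwxrwx 1 root root 9 Jul 16 23:20 scsi-SQEMU_QEMU_HARDDISK_lxd_long--device--name1 -> ../../sdb
lrwxrwxrwx 1 root root 9 Jul 16 23:20 scsi-SQEMU_QEMU_HARDDISK_lxd_long--device--name2 -> ../../sdc'

# Hypothetical helper: print link targets that have a scsi-S link
# but no scsi-0 link pointing at them.
missing_short_link() {
    awk '
        { link = $(NF-2); target = $NF; sub(/.*\//, "", target) }
        link ~ /^scsi-S/ { full[target] = 1 }
        link ~ /^scsi-0/ { trunc[target] = 1 }
        END { for (t in full) if (!(t in trunc)) print t }
    '
}

orphans=$(printf '%s\n' "$listing" | missing_short_link)
echo "disks without a scsi-0 link: $orphans"
```

With the sample listing, only sdb is reported, which matches the observation that the second device took over the shared link.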
FWIW, in this scenario, you know what to expect for the prefix. Therefore you can blacklist the WWID with multipath-tools, and avoid removing multipath-tools.
Add the following to /etc/multipath.conf
blacklist {
    wwid 0QEMU_QEMU_HARDDISK_lxd_long--device--na
}
And restart multipath-tools
systemctl restart multipath-tools
You will then see that your block devices are no longer multipathed. This persists across reboots.
root@n-vm:~# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda       8:0    0   10G  0 disk
├─sda1    8:1    0    9G  0 part /
├─sda14   8:14   0    4M  0 part
├─sda15   8:15   0  106M  0 part /boot/efi
└─sda16 259:0    0  913M  0 part /boot
sdb       8:16   0   10G  0 disk
sdc       8:32   0   10G  0 disk
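If the blacklist entry has to be applied to several instances, the manual edit could be scripted. This is a minimal sketch, not a tested procedure: the helper name `blacklist_wwid` is hypothetical, it assumes that appending a separate blacklist section is acceptable for your configuration, and it is demonstrated on a scratch file rather than on /etc/multipath.conf.

```shell
#!/bin/sh
# Hypothetical helper: append a blacklist stanza for a WWID unless the
# file already contains an entry for that WWID.
blacklist_wwid() {
    conf=$1
    wwid=$2
    if [ -f "$conf" ] && grep -qF "wwid $wwid" "$conf"; then
        return 0
    fi
    printf 'blacklist {\n    wwid %s\n}\n' "$wwid" >> "$conf"
}

# Demonstrated on a scratch file; the real target would be /etc/multipath.conf.
conf=$(mktemp)
blacklist_wwid "$conf" "0QEMU_QEMU_HARDDISK_lxd_long--device--na"
blacklist_wwid "$conf" "0QEMU_QEMU_HARDDISK_lxd_long--device--na" # no-op the second time
cat "$conf"
```

After pointing the helper at the real file, multipath-tools still needs to be restarted as described above for the change to take effect.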
@hamistao @simondeziel is there anything to do on this issue or can it be closed?
Let's close this bug, as LXD has no real way to work around this. The operator has to know to use different device name prefixes if multipathd is to be used inside the instance.