how to handle Btrfs multiple devices on the desktop

Status: Open. cmurf opened this issue 5 years ago • 15 comments

Background on this issue: https://gitlab.gnome.org/GNOME/gvfs/-/issues/519 https://bugs.kde.org/show_bug.cgi?id=427092

Nautilus and Dolphin show a disk icon for each Btrfs member device, and then much user and udisks confusion ensues. Desktop environment consumers may not need physical device information at all, and may be better off not being aware of it. When the user clicks on the various device icons, multiple mount points are created for the same filesystem, which is unintended and confusing.

udisksdump.txt

Instead, they need a way to handle subvolumes, perhaps as virtual device 'children'. (This may not be entirely different from LVM thin pool or Stratis pool as the parent, and its filesystems as children - if this metaphor holds - except in the level of detail.)

Filing this bug to facilitate awareness of the competing issues.

Related: #768, #88, libblockdev#244

cmurf commented on Sep 28 '20

> Instead, they need a way to handle subvolumes

We have a separate btrfs plugin with "advanced" btrfs functionality: http://storaged.org/doc/udisks2-api/latest/gdbus-org.freedesktop.UDisks2.Manager.BTRFS.html http://storaged.org/doc/udisks2-api/latest/gdbus-org.freedesktop.UDisks2.Filesystem.BTRFS.html

vojtechtrefny commented on Sep 29 '20

> We have a separate btrfs plugin with "advanced" btrfs functionality:

OK cool!

cmurf commented on Sep 29 '20

@cmurf, can you please attach udevadm info --export-db too? I'm wondering whether there are any udev properties specific to btrfs multidisk volumes.

tbzatek commented on Sep 29 '20

> (This may not be entirely different from LVM thin pool or Stratis pool as the parent, and its filesystems as children - if this metaphor holds - except in the level of detail.)

For the record, the root cause of these issues is that such btrfs multidisk volume members are detected as IdUsage: filesystem and are thus displayed in the GUI and offered for mounting. This is btrfs-specific and creates confusion not only for the upper local storage management layers, but possibly also for sysadmins working with CLI tools who are not fully aware of these specifics.

tbzatek commented on Sep 29 '20

The udev info for "multidisk" and "singledisk" volumes is the same. AFAICT, the only way we can tell that two btrfs filesystems are part of the same volume is that they share the same UUID.

```
$ udevadm info /dev/sde1
P: /devices/pci0000:00/0000:00:07.0/host9/target9:0:1/9:0:1:0/block/sde/sde1
N: sde1
L: 0
S: disk/by-path/pci-0000:00:07.0-scsi-0:0:1:0-part1
S: disk/by-uuid/d986fd44-ec55-4744-b0c0-4306dcc97cb0
S: disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-0-1-part1
S: disk/by-partuuid/0ef75796-01
E: DEVPATH=/devices/pci0000:00/0000:00:07.0/host9/target9:0:1/9:0:1:0/block/sde/sde1
E: DEVNAME=/dev/sde1
E: DEVTYPE=partition
E: PARTN=1
E: MAJOR=8
E: MINOR=65
E: SUBSYSTEM=block
E: USEC_INITIALIZED=9525266
E: ID_SCSI=1
E: ID_VENDOR=QEMU
E: ID_VENDOR_ENC=QEMU\x20\x20\x20\x20
E: ID_MODEL=QEMU_HARDDISK
E: ID_MODEL_ENC=QEMU\x20HARDDISK\x20\x20\x20
E: ID_REVISION=2.5+
E: ID_TYPE=disk
E: ID_SERIAL=0QEMU_QEMU_HARDDISK_drive-scsi1-0-1
E: ID_SERIAL_SHORT=drive-scsi1-0-1
E: ID_BUS=scsi
E: ID_PATH=pci-0000:00:07.0-scsi-0:0:1:0
E: ID_PATH_TAG=pci-0000_00_07_0-scsi-0_0_1_0
E: ID_PART_TABLE_UUID=0ef75796
E: ID_PART_TABLE_TYPE=dos
E: ID_FS_UUID=d986fd44-ec55-4744-b0c0-4306dcc97cb0
E: ID_FS_UUID_ENC=d986fd44-ec55-4744-b0c0-4306dcc97cb0
E: ID_FS_UUID_SUB=9d86ffc7-a2d4-4b6a-8763-545efb08b295
E: ID_FS_UUID_SUB_ENC=9d86ffc7-a2d4-4b6a-8763-545efb08b295
E: ID_FS_TYPE=btrfs
E: ID_FS_USAGE=filesystem
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_UUID=0ef75796-01
E: ID_PART_ENTRY_TYPE=0x83
E: ID_PART_ENTRY_NUMBER=1
E: ID_PART_ENTRY_OFFSET=2048
E: ID_PART_ENTRY_SIZE=2095104
E: ID_PART_ENTRY_DISK=8:64
E: DM_MULTIPATH_DEVICE_PATH=0
E: ID_BTRFS_READY=1
E: DEVLINKS=/dev/disk/by-path/pci-0000:00:07.0-scsi-0:0:1:0-part1 /dev/disk/by-uuid/d986fd44-ec55-4744-b0c0-4306dcc97cb0 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-0-1-part1 /dev/disk/by-partuuid/0ef75796-01
E: TAGS=:systemd:
```
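Since the shared ID_FS_UUID is the only grouping key available in udev data like the above, a consumer could detect multi-device btrfs volumes by grouping records on it. The following is a minimal Python sketch, assuming text in the `udevadm info` / `udevadm info --export-db` layout (records separated by blank lines, properties on `E:` lines). The function names are hypothetical; this is not a UDisks API.

```python
from collections import defaultdict


def parse_udev_records(text):
    """Parse `udevadm info` / `--export-db` style text into a list of dicts.

    Records are separated by blank lines. Only "E:" (property) lines are
    kept; "P:", "N:", "S:" and any other lines are ignored.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                records.append(current)
                current = {}
            continue
        tag, sep, value = line.partition(": ")
        if sep and tag == "E":
            key, _, val = value.partition("=")
            current[key] = val
    if current:
        records.append(current)
    return records


def btrfs_volumes(records):
    """Map each btrfs filesystem UUID to the sorted list of its member devnodes."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get("ID_FS_TYPE") == "btrfs" and rec.get("ID_FS_UUID"):
            groups[rec["ID_FS_UUID"]].append(rec.get("DEVNAME", "?"))
    return {fs_uuid: sorted(devs) for fs_uuid, devs in groups.items()}


def multi_device_volumes(volumes):
    """Keep only the volumes backed by more than one block device."""
    return {fs_uuid: devs for fs_uuid, devs in volumes.items() if len(devs) > 1}
```

Grouping this way collapses all member devices under one filesystem UUID, while the per-device ID_FS_UUID_SUB values remain available to tell the members apart.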

vojtechtrefny commented on Sep 29 '20

We can add some additional functions and/or properties to the btrfs plugin, but I don't see how we could add something helpful to the "core" UDisks API.

vojtechtrefny commented on Sep 29 '20

It would be really helpful to have more information in the udev database. btrfs-progs already ships a very simple udev rule, so adding a btrfs filesystem show call to it and setting some btrfs-specific properties could be an option? @cmurf
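To make the suggestion concrete, here is a rough Python sketch of a helper that runs `btrfs filesystem show <device>` and prints KEY=VALUE lines, which is the environment-key format udev's IMPORT{program} consumes. The output layout it parses is an assumption (shown in the comment), and the property names BTRFS_NUM_DEVICES, BTRFS_DEVICES_PRESENT and BTRFS_DEVID are hypothetical; nothing defines them today.

```python
import re
import subprocess
import sys

# Assumed layout of `btrfs filesystem show` output, e.g.:
#   Label: none  uuid: <fs uuid>
#       Total devices 2 FS bytes used 144.00KiB
#       devid    1 size 1023.00MiB used 229.50MiB path /dev/sde1
TOTAL_RE = re.compile(r"Total devices\s+(\d+)")
DEVID_RE = re.compile(r"devid\s+(\d+)\s.*\bpath\s+(\S+)")


def show_to_udev_props(show_output, devnode):
    """Turn `btrfs filesystem show` text into KEY=VALUE lines for udev.

    All property names here are hypothetical placeholders.
    """
    props = []
    total = TOTAL_RE.search(show_output)
    # Map each listed member path to its devid.
    members = {path: int(devid) for devid, path in DEVID_RE.findall(show_output)}
    if total:
        props.append(f"BTRFS_NUM_DEVICES={int(total.group(1))}")
    props.append(f"BTRFS_DEVICES_PRESENT={len(members)}")
    if devnode in members:
        props.append(f"BTRFS_DEVID={members[devnode]}")
    return props


if __name__ == "__main__":
    # Hypothetical usage from a rule with IMPORT{program}: helper.py /dev/sde1
    dev = sys.argv[1]
    out = subprocess.run(["btrfs", "filesystem", "show", dev],
                         capture_output=True, text=True, check=True).stdout
    print("\n".join(show_to_udev_props(out, dev)))
```

Comparing the superblock's device count with the number of devices actually listed would also let a consumer tell a complete volume from one with missing members.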

vojtechtrefny commented on Sep 29 '20

Opened https://github.com/kdave/btrfs-progs/issues/302 requesting that at least some of this information be published in the udev db. I believe this kind of information should first be provided at the right layer, since local storage management is a layered model. Only then can an upper layer like UDisks make use of it, to the benefit of all the layers built on top of it.

tbzatek commented on Sep 29 '20

udevadminfo.txt

cmurf commented on Sep 29 '20

I mentioned this in gvfs#519 but forgot to mention it here: it seems that udisksd is being asked to mount by /dev node rather than by filesystem UUID. At least from the man page, I don't see a way to reference a filesystem UUID with udisksctl. The mount command can mount any file system by label or UUID. I wonder if the most generic approach to mounting is to always use the label or UUID, regardless of the file system.

Most interactions with btrfs file systems are mounting and post-mount operations. The only thing that really needs to understand the details of all the devices is udisksd itself, on behalf of a handful of sophisticated programs like partitioning agents. Maybe it would be better if most user agents were kept oblivious of these details most of the time, and just interacted with the UUID/label and the mount point?

cmurf commented on Oct 02 '20

It's more complicated than that. The kernel and udev operate on major:minor block device nodes, and the /dev/disk/ symlinks are just different representations of the same object. Similarly, any reference to a filesystem via LABEL= or UUID= resolves to a device node. The new kernel mount API could possibly take a slightly different approach; however, this would need to be reflected in the libmount public API.

That said, having duplicate filesystem identifiers present on different block devices is just wrong. Even for multipath, a single device is created (btrfs over multipath, anyone?). The "universally unique identifier (UUID)" is immediately no longer unique, causing udev to randomly overwrite symlinks in /dev/disk/ that some libraries and tools rely on. When matching against the udev db, either the first or a random occurrence gets used, certainly not in a persistent order. That's why exposing more insight into the filesystem structure in the udev db is the crucial thing to solve first.
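To illustrate why duplicate identifiers make UUID= lookups order-dependent, here is a toy Python model of /dev/disk/by-uuid/. It assumes a simple last-writer-wins rule, which is a simplification and not udev's actual link-selection logic (udev also weighs link priorities); the function names are hypothetical.

```python
def build_by_uuid_links(uevents):
    """Toy model of /dev/disk/by-uuid/: each processed uevent (devnode, fs_uuid)
    (re)points that UUID's symlink, so the last device processed wins.
    Link priorities, which real udev considers, are ignored here."""
    links = {}
    for devnode, fs_uuid in uevents:
        links[fs_uuid] = devnode
    return links


def resolve_uuid_spec(links, spec):
    """Resolve a "UUID=..." mount source to a single device node, the way a
    lookup through the by-uuid symlink would."""
    prefix = "UUID="
    if not spec.startswith(prefix):
        raise ValueError(f"unsupported spec: {spec!r}")
    return links.get(spec[len(prefix):])
```

Feeding the same two member devices in a different event order yields a different answer for the same UUID= spec. On a real system the equivalent lookup is `os.path.realpath("/dev/disk/by-uuid/<uuid>")`, which likewise yields exactly one device node even when several devices carry that UUID.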

As a first step, UDisks will need to be made aware of duplicate filesystem identifiers and handle them gracefully, e.g. to prevent multiple mounts, mount point cleanup conflicts, etc. Perhaps by just taking the first occurrence from a sorted list, which is reasonably stable within the daemon's lifespan, as described in https://gitlab.gnome.org/GNOME/gvfs/-/issues/519#note_921832. That will not fix the multiple object representation for the moment.
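The "first occurrence from a sorted list" idea could look roughly like the following Python sketch. It is a hypothetical illustration, not UDisks code: one member per filesystem UUID is chosen as canonical, the others are marked as hidden, and the filesystem counts as mounted if any member is.

```python
from collections import defaultdict


def canonical_members(devices):
    """devices: iterable of (devnode, fs_uuid) pairs.

    Returns (chosen, hidden): chosen maps each UUID to the first devnode in
    plain string-sorted order (stable while the device set doesn't change);
    hidden holds the other members, which would not be offered separately.
    """
    by_uuid = defaultdict(list)
    for devnode, fs_uuid in devices:
        by_uuid[fs_uuid].append(devnode)
    chosen = {u: min(devs) for u, devs in by_uuid.items()}
    hidden = {d for u, devs in by_uuid.items() for d in devs if d != chosen[u]}
    return chosen, hidden


def volume_is_mounted(members, mounted_devnodes):
    """Treat the whole filesystem as mounted if any member node is mounted."""
    return any(d in mounted_devnodes for d in members)
```

Plain string ordering sorts "/dev/sdaa1" before "/dev/sdb1"; that is acceptable here because only stability matters, not a natural order. Device names can still change across reboots, which is why this is only stable within the daemon's lifespan.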

tbzatek commented on Oct 05 '20

> The new kernel mount API could possibly take slightly different approach, however this needs to be reflected in libmount public API.

I was thinking of the clients, e.g. gvfs, file managers, open/save dialogs, udisksctl. Even GNOME Disks doesn't need to interact with literal block devices most of the time, such as when mounting the file system.

> That said having duplicate filesystem identifiers present on different block devices is just wrong.

Why? It's the same for mdadm multiple devices:

```
/dev/vda3: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="18ebc747-9949-489c-f896-a47a9cdced7c" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="5ce570aa-cb25-4ee6-9f5c-3fc22d54b7af"
/dev/vdb1: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="a372c360-1157-e88c-a1ca-3c0be19f4ddf" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="cee7279a-7c63-4c6c-8c32-1194bd16e926"
```

RFC 4122 doesn't require that a UUID exist only once, only that it be unique at the time of creation. A collision only occurs if the same UUID is used for different referents; in both the mdadm and Btrfs cases, there is one referent. The same UUID with a different UUID_SUB seems to clearly identify each individual constituent part of a whole.
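The distinction between "one referent with several parts" and a genuine collision can be expressed as a check on UUID_SUB. A minimal Python sketch, assuming (device, UUID, UUID_SUB) triples such as the blkid values above; the function name is hypothetical. Treating a repeated UUID_SUB as a collision assumes it comes from something like a block-level copy of a member device.

```python
def classify_shared_uuid(devices):
    """devices: list of (devnode, fs_uuid, uuid_sub) that share one fs UUID.

    Distinct UUID_SUB values indicate members of a single multi-device
    volume (one referent). A repeated UUID_SUB means two block devices claim
    to be the same member, e.g. after a block-level copy: a real collision.
    """
    if len({fs_uuid for _, fs_uuid, _ in devices}) > 1:
        raise ValueError("devices do not share one filesystem UUID")
    first_seen = {}
    clashes = []
    for devnode, _, uuid_sub in devices:
        if uuid_sub in first_seen:
            clashes.append((first_seen[uuid_sub], devnode))
        else:
            first_seen[uuid_sub] = devnode
    return ("collision", clashes) if clashes else ("members", [])
```

For the two mdadm members shown above, the function reports "members"; adding a third device that repeats /dev/vda3's UUID_SUB would flag a collision.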

In the mdadm case, udev seems to export udisks-specific info:

```
E: UDISKS_MD_MEMBER_LEVEL=raid0
E: UDISKS_MD_MEMBER_DEVICES=2
```

Btrfs does have the number of devices in each device's superblock. That's easy for udev to get and expose to udisks, if that's what's needed. Member devices aren't RAID members per se; that isn't how Btrfs works. Instead, the 'raid level' is referred to as a 'profile', and the profile applies per block group, so different block groups can have different profiles. This information isn't part of the superblock, but is stored in a btree.
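Reading the device count from a member's superblock is indeed cheap. The sketch below is based on the btrfs on-disk format as I understand it (primary superblock at 64 KiB, fsid at offset 0x20, magic "_BHRfS_M" at 0x40, little-endian u64 generation at 0x48 and num_devices at 0x88); check these offsets against the btrfs on-disk format documentation before relying on them. It skips checksum verification and, as noted above, cannot tell anything about block-group profiles.

```python
import struct
import uuid

# Layout facts assumed by this sketch (see the lead-in):
SUPERBLOCK_OFFSET = 64 * 1024
SUPERBLOCK_SIZE = 4096
BTRFS_MAGIC = b"_BHRfS_M"


def parse_btrfs_superblock(sb):
    """Extract fsid, generation and num_devices from raw superblock bytes.

    The checksum is not verified; this is for illustration only.
    """
    if len(sb) < SUPERBLOCK_SIZE or sb[0x40:0x48] != BTRFS_MAGIC:
        raise ValueError("not a btrfs superblock")
    fsid = uuid.UUID(bytes=bytes(sb[0x20:0x30]))
    (generation,) = struct.unpack_from("<Q", sb, 0x48)
    (num_devices,) = struct.unpack_from("<Q", sb, 0x88)
    return {"fsid": str(fsid), "generation": generation, "num_devices": num_devices}


def read_superblock(path):
    """Read the primary superblock from a block device (needs read access)."""
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return dev.read(SUPERBLOCK_SIZE)
```

Every member carries the same fsid and num_devices, so any one of them is enough to know how many devices the filesystem expects.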

> As a first step on UDisks side it will need to be made aware of duplicate filesystem identifiers and handle them gracefully to e.g. prevent multiple mounts, mount point cleanup conflicts, etc.

Allowing multiple mounts of the file system is needed to support explicitly mounting subvolumes. Fedora has used such a layout for ~10 years, and it is the default starting with Fedora 33, where subvol=home is mounted at /home and subvol=root is mounted at /. It's effectively a bind mount, except that the subvolume path is resolved inside the file system, so it doesn't first have to be visible anywhere in the directory tree.
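For illustration, the Fedora 33 layout described above amounts to mounting the same filesystem twice with different subvol= options. This Python sketch only builds the `mount` argument lists (it runs nothing); the function name is hypothetical and the UUID is reused from the udev output earlier in the thread purely as an example.

```python
def subvolume_mount_argv(fs_uuid, layout):
    """Build one `mount` invocation per (subvolume, mount point) pair.

    Every invocation names the same filesystem by UUID and differs only in
    the subvol= option, so one filesystem is legitimately mounted several
    times. Pairs are returned in the given order; a parent such as "/" must
    come before mount points nested under it.
    """
    return [
        ["mount", "-t", "btrfs", "-o", f"subvol={subvol}", f"UUID={fs_uuid}", target]
        for subvol, target in layout
    ]


if __name__ == "__main__":
    for argv in subvolume_mount_argv("d986fd44-ec55-4744-b0c0-4306dcc97cb0",
                                     [("root", "/"), ("home", "/home")]):
        print(" ".join(argv))
```

A desktop layer therefore cannot simply refuse a second mount of the same filesystem UUID; it has to look at which subvolume each mount uses.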

The thing to avoid is probably mounting the same subvolume multiple times, but this is something of an artifact or side effect of exposing multiple /dev nodes in the GUI rather than one filesystem volume icon. Each icon is currently a /dev node, and we get a new mount every time the user clicks on one of the seemingly unmounted ones, even though the filesystem is already mounted. A related problem happens in GNOME Disks, which shows 1 of 3 Btrfs devices as mounted and the other two as not mounted, even though all three are part of the same, mounted filesystem.

cmurf commented on Oct 05 '20

> Why? It's the same for mdadm multiple devices:
>
> /dev/vda3: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="18ebc747-9949-489c-f896-a47a9cdced7c" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="5ce570aa-cb25-4ee6-9f5c-3fc22d54b7af"
> /dev/vdb1: UUID="05c30b48-4f9f-e3da-9489-5a6703287405" UUID_SUB="a372c360-1157-e88c-a1ca-3c0be19f4ddf" LABEL="localhost-live:root" TYPE="linux_raid_member" PARTUUID="cee7279a-7c63-4c6c-8c32-1194bd16e926"
>
> RFC 4122 doesn't require a UUID exist only once, but that at the time of creation it must be unique. A collision only occurs if the same UUID is used for different referents, in both mdadm and Btrfs cases, there's one referent. The same UUID with different UUID_SUB seems to clearly indicate each unique individual constituent part of a whole.

Yes; however, the mdraid components carry the ID_FS_USAGE=raid udev attribute (even for legacy mdraid superblock versions), in contrast to btrfs multidisk volume members, which carry ID_FS_USAGE=filesystem. It's the combination of the filesystem usability flag and the duplicate UUID that causes the problem.
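The combination described here can be written as a simple predicate over udev properties. In the following Python sketch (hypothetical function name; input as dicts of udev properties), a device is flagged only when it is offered as a mountable filesystem and shares its UUID with another device. mdraid members share a UUID too, but are not flagged because their usage is "raid".

```python
from collections import Counter


def confusing_devices(devices):
    """devices: list of dicts holding DEVNAME, ID_FS_USAGE and ID_FS_UUID.

    Returns the sorted devnodes that are offered for mounting (usage
    "filesystem") while sharing their filesystem UUID with another device.
    """
    counts = Counter(d["ID_FS_UUID"] for d in devices if d.get("ID_FS_UUID"))
    return sorted(
        d["DEVNAME"]
        for d in devices
        if d.get("ID_FS_USAGE") == "filesystem" and counts[d.get("ID_FS_UUID")] > 1
    )
```

A single-device filesystem or a set of mdraid members yields an empty result; only multi-device btrfs members, as udev currently describes them, end up in the list.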

> In the mdadm case, udev seems to export udisks specific info.
>
> E: UDISKS_MD_MEMBER_LEVEL=raid0
> E: UDISKS_MD_MEMBER_DEVICES=2

These come from our own udev rules that we ship. The right place for them would be the respective upstream projects, and that's what kdave/btrfs-progs#302 should be about for btrfs (I still need to follow up on that).

tbzatek commented on Jan 26 '21

Anyway, basic support for multiple devices, to avoid creating duplicate mounts, is in PR #838.

Let's deal with btrfs subvolumes in #768.

tbzatek commented on Jan 26 '21

> It's the combination of the filesystem usability flag and duplicate UUID that causes the problem.

~~Would it help to have ID_FS_USAGE=btrfs? Or does that just make things more complicated?~~ Never mind, answered in btrfs-progs#302.

cmurf commented on Jan 26 '21