e2fsprogs RFE: Please consider (optionally) taking BSD file lock on block devices being called on, to avoid races against udev superblock probing

For a long time, udev has implemented a simple protocol to ensure that invocations of mkfs/mkswap/fdisk tools don't race against udev's device superblock/partition table probing. See:

https://systemd.io/BLOCK_DEVICE_LOCKING

I'd like to see mke2fs implement this natively (it's trivial: just optionally do an flock(dev_fd, LOCK_EX) after open()ing the device). Ideally this behaviour should be controllable via an env var, and maybe via a cmdline argument, and default to off for compat reasons.
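A minimal sketch of what such an opt-in lock could look like, assuming a made-up environment variable name (`MKE2FS_DEVICE_LOCK`) and helper names that do not exist in e2fsprogs today. It only illustrates the "flock() after open()" idea and is not a proposed patch:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical variable name, chosen for illustration only. */
#define LOCK_ENV_VAR "MKE2FS_DEVICE_LOCK"

/* Locking is off by default; it is enabled only by "1" or "yes". */
static int lock_requested(void)
{
    const char *v = getenv(LOCK_ENV_VAR);

    return v && (strcmp(v, "1") == 0 || strcmp(v, "yes") == 0);
}

/*
 * Open the device and, if requested, take an exclusive BSD lock on it.
 * Returns the fd, or -1 with errno set. The lock is advisory and is
 * dropped when the fd is closed.
 */
static int open_device_maybe_locked(const char *path)
{
    int fd = open(path, O_RDWR | O_CLOEXEC);

    if (fd < 0)
        return -1;

    if (lock_requested() && flock(fd, LOCK_EX) < 0) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With the variable unset the helper behaves like a plain open(), which keeps the default behaviour unchanged.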

I filed a similar bug against util-linux (regarding fdisk and mkswap) here: https://github.com/karelzak/util-linux/issues/921

Why bother with this in mke2fs? Strictly speaking it's not necessary: people can work around this by locking the block device outside of the tool and invoking mke2fs with the lock already taken. However, I think it's a good thing to let mke2fs do this natively, to make it easier for people, to require less glue code around it, and to make tools such as "lslocks" more descriptive. Ideally all the various tools modifying/creating a superblock/partition table, such as mkfs/mkswap/fdisk/…, would implement the same logic here, which is why I am filing these bugs.
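For reference, the "glue code" workaround could look roughly like the sketch below: take the lock, run the tool as a child, and release the lock once the child has exited. The function name `run_with_lock` is invented for this example, and which path to lock and which command to run are left to the caller:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hold an exclusive BSD lock on lock_path while argv runs as a child.
 * Returns the child's exit status, or -1 if the lock could not be
 * taken or the child did not exit normally.
 */
static int run_with_lock(const char *lock_path, char *const argv[])
{
    int status = 0, result = -1;
    pid_t pid;
    int fd = open(lock_path, O_RDONLY | O_CLOEXEC);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0)
        goto out;

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {
        /* Child: the parent keeps holding the lock while we run. */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            goto out;
    }
    if (WIFEXITED(status))
        result = WEXITSTATUS(status);
out:
    close(fd); /* closing the fd releases the lock */
    return result;
}
```

A caller would pass the device node as lock_path and something like {"mke2fs", "/dev/…", NULL} as argv. The wrapper also gets the tool's exit status back, so a failed mkfs can still be reported.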

If you look at how tools such as gparted try to work around the fact that mkswap/mkfs/fdisk/… so far have no concept of locking whatsoever (randomly masking a multitude of system services, kept in a blacklist it maintains, that might react too quickly to half-written superblocks), you can get very sad. This is an attempt to move everybody to just use Linux's own concept of advisory locks on its own objects, so that everybody is a good citizen when dealing with block devices and tools stop stepping on each other's toes all the time.

And yes, implementing this will also fix real bugs. Right now, in the implementation of the "tmp" option of /etc/crypttab in systemd, we optionally invoke mke2fs after we set up the device. It is currently invoked as is, racing against udev's device probing, so that udev may miss the superblock being created. If mke2fs could automatically lock the block device, we could make sure that udev's probing is delayed or restarted after mke2fs has done its job, and things would just work. And yes, as mentioned, we could solve this via other methods, but properly synchronizing here would certainly be the cleanest solution.

If this is acceptable I might even prep a patch if need be... Please let me know.

Jan 03 '20 09:01 poettering

(In case there are concerns that this is a systemd-specific thing: yes, systemd-udev is the main project this is relevant for, but just doing flock() is entirely generic and a first class Linux API for locking objects that have fds, hence I'd claim it's the right thing to do anyway.)

Jan 03 '20 10:01 poettering

How do you prevent udev from trying to probe the device before mke2fs has a chance to run (and take the BSD file lock on the block device)? It doesn't seem to be a complete solution.

And if the answer is you inhibit udev until the mkfs program has a chance to start, why not wait until mkfs completes? I would think you would want to collect the exit status from the mkfs program, so you can report a failure to initialize the file system?

Jan 03 '20 15:01 tytso

I responded on https://github.com/karelzak/util-linux/issues/921, let's continue the discussion exclusively there maybe.

Jan 03 '20 16:01 poettering

This patch may fix the problem, please review it. https://patchwork.ozlabs.org/project/linux-ext4/patch/[email protected]/

Oct 15 '22 08:10 hifilove

@hifilove that patch doesn't look right to me. The BSD lock is supposed to be taken on the main block device, not the partition device. Usually you'll call mke2fs on a partition block device though, hence this is the wrong device to lock if you want udev to stay off it.

In the BSD lock logic udev implements there's no concept of locking a single partition device only. You can only lock the whole device.
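To illustrate what "lock the whole device" means when a tool is handed a partition, here is a rough sketch that maps a block device number to that of its whole disk via sysfs. It assumes the usual sysfs layout, where a partition's directory has a "partition" attribute and its parent directory is the whole disk. The function names are invented for this example, and stacked devices (device-mapper and the like) are not considered:

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse a sysfs "dev" attribute such as "8:0\n" into a dev_t. */
static int parse_devnum(const char *s, dev_t *ret)
{
    unsigned int maj, min;

    if (sscanf(s, "%u:%u", &maj, &min) != 2)
        return -1;
    *ret = makedev(maj, min);
    return 0;
}

/*
 * Store in *ret the device number of the whole disk that devnum
 * belongs to. If devnum is not a partition it is returned unchanged.
 * Returns 0 on success, -1 on failure.
 */
static int whole_disk_devnum(dev_t devnum, dev_t *ret)
{
    char path[PATH_MAX], buf[64];
    FILE *f;
    int r;

    /* Fail if sysfs does not know this block device at all. */
    snprintf(path, sizeof(path), "/sys/dev/block/%u:%u",
             major(devnum), minor(devnum));
    if (access(path, F_OK) < 0)
        return -1;

    /* Only partitions carry a "partition" attribute. */
    snprintf(path, sizeof(path), "/sys/dev/block/%u:%u/partition",
             major(devnum), minor(devnum));
    if (access(path, F_OK) < 0) {
        *ret = devnum;
        return 0;
    }

    /* The parent directory of a partition is its whole disk. */
    snprintf(path, sizeof(path), "/sys/dev/block/%u:%u/../dev",
             major(devnum), minor(devnum));
    f = fopen(path, "r");
    if (!f)
        return -1;
    r = fgets(buf, sizeof(buf), f) ? parse_devnum(buf, ret) : -1;
    fclose(f);
    return r;
}
```

A tool would stat() the path it was given, pass st_rdev to this helper, and then open and flock() the resulting whole-disk node (on udev-managed systems typically reachable as /dev/block/MAJOR:MINOR) rather than the partition node.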

Oct 16 '22 11:10 poettering

@poettering Hi Poettering. First, if the master device is locked during mkfs, will the operation of other partitions be abnormal? Secondly, I don't see any change in util-linux that acquires the lock of the master device: https://github.com/util-linux/util-linux/issues/921

I have an idea: could systemd check the lock on the main device before it checks the lock on the sub-device?

Oct 17 '22 03:10 hifilove

> First, if the master device is locked during mkfs, will the operation of other partitions be abnormal?

This would prevent two partitions from being formatted in parallel, which seems odd.

Oct 17 '22 07:10 hifilove

@poettering So, why not let udev lock all the partition block devices? It seems rather unsuitable for an operation on a partition to hold the BSD lock of the main block device.

Oct 17 '22 13:10 jiayi0118

> @poettering So, why not let udev lock all the partition block devices? It seems rather unsuitable for an operation on a partition to hold the BSD lock of the main block device.

I thought the same thing.

Oct 17 '22 14:10 hifilove

As mentioned elsewhere, the udev logic blocks out the whole device because, when probing a partition's fs, we tend to also look at the whole block device to acquire the partition uuid/label, not just the fs uuid/label. Hence it makes sense to lock the whole thing.

Note that the lock shouldn't be contended; at least I assume that repartitioning and mkfs are not something you'd do in a busy loop. Because of that, a fine-grained lock doesn't really get you much if you have to take the larger lock anyway. Moreover, it creates a mess of ABBA problems; I'd rather not bother with this.

The behaviour of udev, and the documentation for it, always said it's about the whole block device, not the partition block device. Hence if the mkfs wrapper in util-linux locks only the partition device, I'd consider that a bug, or at least not compatible with what udev does.

Oct 18 '22 09:10 poettering
