loop-util: probe sector size harder
If the backing device of regular files for RootImage= has a sector size of 4k, the default value of 512 (used in absence of a partition table) won't work. Let's probe the sector size harder by looking up the backing device.
May fix #25792 in some cases.
[ 7.087907] H systemd[1]: Starting minimal-app0-foo.service...
[ 7.087950] H (cat)[454]: Opened '/usr/share/minimal_0.raw' in O_RDWR access mode, with O_DIRECT enabled.
[ 7.087993] H (cat)[454]: Couldn't find any partition table to derive sector size of.
[ 7.088195] H (cat)[454]: Failed to create loop device for root image: Operation not permitted
[ 7.088272] H (cat)[454]: minimal-app0-foo.service: Failed to set up mount namespacing: Operation not permitted
[ 7.088322] H (cat)[454]: minimal-app0-foo.service: Failed at step NAMESPACE spawning cat: Operation not permitted
CI still failed the same way. Added some debugging, let's see what's the problem.
Jul 20 17:21:53 H (cat)[452]: Opened '/usr/share/minimal_0.raw' in O_RDWR access mode, with O_DIRECT enabled.
Jul 20 17:21:53 H (cat)[452]: Initial sector_size 4294967295
Jul 20 17:21:53 H (cat)[452]: Couldn't find any partition table to derive sector size of.
Jul 20 17:21:53 H (cat)[452]: Not blockdev, probe sector_size 512, ret 0
Jul 20 17:21:53 H (cat)[452]: Backing dev 8:1
Jul 20 17:21:53 H (cat)[452]: Failed to create loop device for root image: Operation not permitted
Jul 20 17:21:53 H (cat)[452]: minimal-app0-foo.service: Failed to set up mount namespacing: Operation not permitted
Jul 20 17:21:53 H (cat)[452]: minimal-app0-foo.service: Failed at step NAMESPACE spawning cat: Operation not permitted
Opening the block device is probably not allowed. In that case, let's continue without the sector size info. I don't think there is then any unprivileged method to get the sector size.
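For illustration, a minimal Python sketch of the "look up the backing device, give up quietly if it can't be opened" idea. This is not the code from the PR: the function names are hypothetical, the `/dev/block/MAJ:MIN` symlink is assumed to be provided by udev, and `BLKSSZGET` is the Linux ioctl that reports the logical sector size.

```python
import fcntl
import os
import struct

BLKSSZGET = 0x1268  # Linux ioctl: logical sector size of a block device


def backing_device(path):
    """Return (major, minor) of the device holding the file system of `path`."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


def probe_sector_size(path):
    """Return the backing device's sector size, or None if it can't be probed."""
    major, minor = backing_device(path)
    try:
        # Assumption: udev maintains /dev/block/MAJ:MIN symlinks.
        fd = os.open(f"/dev/block/{major}:{minor}", os.O_RDONLY)
    except OSError:
        # Typically EACCES/EPERM for unprivileged callers, or ENOENT when
        # the file system has no real block device behind it.
        return None
    try:
        raw = fcntl.ioctl(fd, BLKSSZGET, struct.pack("i", 0))
        return struct.unpack("i", raw)[0]
    except OSError:
        return None
    finally:
        os.close(fd)
```

A caller would treat `None` as "keep the default of 512" instead of failing the whole mount.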
Though there's statvfs(), which gives the block size even to an unprivileged process.
New version with statvfs(). At least on my system it gives the right size of 4k.
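A rough Python equivalent of that lookup, just to show what is being read (the helper name is made up for this example). Note that this is the file system's block size, which need not equal the sector size of the disk underneath.

```python
import os


def fs_block_size(path):
    """Return the block size of the file system containing `path`.

    Works for unprivileged callers; this is statvfs()'s f_bsize, not the
    sector size of the underlying block device.
    """
    return os.statvfs(path).f_bsize
```

For example, `fs_block_size("/usr/share/minimal_0.raw")` would report the block size of the file system that holds the image.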
This time TEST-29-PORTABLE is good, but TEST-13-NSPAWN fails:
Jul 20 19:15:14 H kernel: loop0: detected capacity change from 0 to 131072
Jul 20 19:15:14 H kernel: block loop0: the capability attribute has been deprecated.
Jul 20 19:15:14 H systemd-nspawn[889]: Failed to mount image: Invalid argument
Jul 20 19:15:14 H systemd-nspawn[888]: Failed to receive mount namespace fd from outer child: Input/output error
Jul 20 19:15:14 H systemd[1]: systemd-nspawn@container-raw.service: Got notification message from PID 888 (STOPPING=1, STATUS=Terminating...)
Jul 20 19:15:14 H systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/systemd_2dnspawn_40container_2draw_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=3793 reply_cookie=0 signature=sa{sv>
Jul 20 19:15:14 H systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/systemd_2dnspawn_40container_2draw_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=3794 reply_cookie=0 signature=sa{sv>
Jul 20 19:15:14 H kernel: EXT4-fs (loop0): bad block size 1024
1024 doesn't seem to be a valid block size for direct I/O, even if it's the block size for ext4. But none of the debug messages I inserted show up. Could it be unrelated? Or are just the debug level messages missing?
@topimiettinen What's the status here? I don't think the fstatvfs() block size will always match the sector size. E.g. we create ext4 with block size 4096 in repart even if the sector size is 512.
> If the backing device of regular files for RootImage= has a sector size of 4k, the default value of 512 (used in absence of a partition table) won't work. Let's probe the sector size harder by looking up the backing device.
Why wouldn't that work? File systems are byte-addressable. Hence you can have a loopback device off it of any sector size. If the sector size of the loop device is smaller than that of the backing device, you might not get direct I/O to work on the device, but that's an optimization.
> @topimiettinen What's the status here? I don't think the fstatvfs() block size will always match the sector size. E.g. we create ext4 with block size 4096 in repart even if the sector size is 512.
I'm not sure what the right approach is. Perhaps the size accepted for direct I/O should be probed by first trying 4k, then 2k and so on (or going straight to 512 bytes after 4k).
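The probing order suggested here could look roughly like the following Python sketch. It is only an illustration under assumptions: the function names are hypothetical, the probe is a single O_DIRECT read whose length and offset both equal the candidate size, and a `None` result means "fall back to buffered I/O".

```python
import mmap
import os


def direct_read_works(path, size):
    """Return True if an O_DIRECT read of `size` bytes at offset `size` succeeds."""
    # O_DIRECT is Linux-specific; without it this degrades to a plain read.
    flags = os.O_RDONLY | getattr(os, "O_DIRECT", 0)
    try:
        fd = os.open(path, flags)
    except OSError:
        return False  # e.g. the file system rejects O_DIRECT altogether
    try:
        buf = mmap.mmap(-1, size)  # anonymous mappings are page-aligned
        try:
            os.preadv(fd, [buf], size)
            return True
        except OSError:
            return False  # typically EINVAL for a misaligned size
        finally:
            buf.close()
    finally:
        os.close(fd)


def probe_direct_io_size(path, candidates=(4096, 2048, 1024, 512),
                         works=direct_read_works):
    """Return the first candidate size accepted for direct I/O, else None."""
    for size in candidates:
        if works(path, size):
            return size
    return None  # caller should retry without direct I/O
```

Passing `candidates=(4096, 512)` gives the "straight to 512 bytes after 4k" variant.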
> > If the backing device of regular files for RootImage= has a sector size of 4k, the default value of 512 (used in absence of a partition table) won't work. Let's probe the sector size harder by looking up the backing device.
> Why wouldn't that work? File systems are byte-addressable. Hence you can have a loopback device off it of any sector size. If the sector size of the loop device is smaller than that of the backing device, you might not get direct I/O to work on the device, but that's an optimization.
If it's just an optimization, mounting the loop device shouldn't ever fail. There should be fallback logic everywhere, retrying with different direct I/O unit sizes, or without direct I/O.
I am not sure I follow here. Are you suggesting you have a disk image on some Linux fs, that you cannot allocate a loopback device for with the native sector size of the image?
I've run into the related issue #25792 before. Say you have /home backed by a disk that uses 4K sector sizes. You want to use systemd-homed to create a new user with a LUKS encrypted volume. That volume creation runs into issues because the loopback device, the disk image that hosts the user's home directory and the underlying block device don't agree on the sector size.
The quick solution? Set the sector size when creating the user. The better solution? Probe harder or at least suggest a fix to the user.
As always, I appreciate the effort everyone puts into these tools and I rely on them every day.
EDIT: The above no longer seems to be an issue in systemd 256.
> I've run into the related issue #25792 before. Say you have /home backed by a disk that uses 4K sector sizes. You want to use systemd-homed to create a new user with a LUKS encrypted volume. That volume creation runs into issues because the loopback device, the disk image that hosts the user's home directory and the underlying block device don't agree on the sector size.
All regular files on Linux file systems are byte-addressable. Hence the sector size used by the underlying fs doesn't really matter for general support, it only matters for performance (i.e. enabling direct I/O mode). And that's why I don't get what this is supposed to be about.
Or to say this differently: I can create either a 512 or a 4K sector size loopback block device of a file on any Linux fs. This always works, regardless of the backing sector size of the fs.
What does matter though is that the sector size used for the upper fs matches the sector size of the loopback device. To make that work automatically, we nowadays look for the GPT disk label at the 512, 1K, 2K, 4K sector offsets of the disk image file before we set up a loopback device on it, and then use what we found as the sector size.
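As a simplified illustration of that probing (not systemd's actual implementation, which validates more of the header): a GPT header lives at LBA 1, i.e. at a byte offset equal to the sector size, and starts with the signature "EFI PART". The function name and the "exactly one match" policy below are assumptions made for this sketch.

```python
GPT_SIGNATURE = b"EFI PART"  # first 8 bytes of a GPT header, stored at LBA 1


def probe_gpt_sector_size(path, candidates=(512, 1024, 2048, 4096)):
    """Return the sector size whose LBA 1 holds a GPT signature, or None."""
    matches = []
    with open(path, "rb") as image:
        for size in candidates:
            image.seek(size)  # LBA 1 starts at byte offset == sector size
            if image.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE:
                matches.append(size)
    # Zero or several matches: report "unknown" rather than guess.
    return matches[0] if len(matches) == 1 else None
```

An image without a partition table yields `None`, which corresponds to the "Couldn't find any partition table to derive sector size of." message in the logs above.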
Hence, I am not following what the issue is supposed to be here.
@poettering Whatever the issue was that I saw previously, it no longer occurs: I've just tested on Ubuntu 24.10 (systemd 256.5-2ubuntu3).
@topimiettinen can you enlighten us what this is about, then?
Maybe just some spurious kernel bug that Ubuntu has fixed by now?
@poettering The issue is with the 4k hard sector size of the underlying disk device, which is unrelated to the file system block size. Anyway, this PR doesn't seem useful, so closing.