
loop-util: probe sector size harder

Open topimiettinen opened this issue 2 years ago • 14 comments

If the backing device of regular files for RootImage= has a sector size of 4k, the default value of 512 (used in absence of a partition table) won't work. Let's probe the sector size harder by looking up the backing device.

topimiettinen avatar Jul 19 '23 09:07 topimiettinen

An -rc1 tag has been created and a release is being prepared, so please note that PRs introducing new features and APIs will be held back until the new version has been released.

github-actions[bot] avatar Jul 19 '23 09:07 github-actions[bot]

May fix #25792 in some cases.

topimiettinen avatar Jul 19 '23 09:07 topimiettinen

[    7.087907] H systemd[1]: Starting minimal-app0-foo.service...
[    7.087950] H (cat)[454]: Opened '/usr/share/minimal_0.raw' in O_RDWR access mode, with O_DIRECT enabled.
[    7.087993] H (cat)[454]: Couldn't find any partition table to derive sector size of.
[    7.088195] H (cat)[454]: Failed to create loop device for root image: Operation not permitted
[    7.088272] H (cat)[454]: minimal-app0-foo.service: Failed to set up mount namespacing: Operation not permitted
[    7.088322] H (cat)[454]: minimal-app0-foo.service: Failed at step NAMESPACE spawning cat: Operation not permitted

mrc0mmand avatar Jul 19 '23 12:07 mrc0mmand

CI still failed the same way. Added some debugging, let's see what's the problem.

topimiettinen avatar Jul 20 '23 14:07 topimiettinen

Jul 20 17:21:53 H (cat)[452]: Opened '/usr/share/minimal_0.raw' in O_RDWR access mode, with O_DIRECT enabled.
Jul 20 17:21:53 H (cat)[452]: Initial sector_size 4294967295
Jul 20 17:21:53 H (cat)[452]: Couldn't find any partition table to derive sector size of.
Jul 20 17:21:53 H (cat)[452]: Not blockdev, probe sector_size 512, ret 0
Jul 20 17:21:53 H (cat)[452]: Backing dev 8:1
Jul 20 17:21:53 H (cat)[452]: Failed to create loop device for root image: Operation not permitted
Jul 20 17:21:53 H (cat)[452]: minimal-app0-foo.service: Failed to set up mount namespacing: Operation not permitted
Jul 20 17:21:53 H (cat)[452]: minimal-app0-foo.service: Failed at step NAMESPACE spawning cat: Operation not permitted

Opening the block device is probably not allowed. In that case, let's continue without the sector size info. I don't think there is any unprivileged method to get the sector size then.

topimiettinen avatar Jul 20 '23 15:07 topimiettinen

Though there's statvfs(), which gives the block size even to an unprivileged process.

topimiettinen avatar Jul 20 '23 15:07 topimiettinen

New version with statvfs(). At least on my system it gives the right size of 4k.
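
For illustration only, here is a minimal Python sketch of the unprivileged statvfs() lookup mentioned above. It is not the code from this PR; the helper name `fs_block_size` is hypothetical. Note that the value returned is the file system block size, which need not equal the sector size of the backing device.

```python
import os


def fs_block_size(path):
    """Return the block size of the file system holding `path`.

    Hypothetical helper for illustration. os.statvfs() needs no special
    privileges, but f_bsize describes the file system, not the sector
    size of the underlying block device.
    """
    return os.statvfs(path).f_bsize


if __name__ == "__main__":
    print(fs_block_size("/"))
```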

topimiettinen avatar Jul 20 '23 15:07 topimiettinen

This time TEST-29-PORTABLE is good, but TEST-13-NSPAWN fails:

Jul 20 19:15:14 H kernel: loop0: detected capacity change from 0 to 131072
Jul 20 19:15:14 H kernel: block loop0: the capability attribute has been deprecated.
Jul 20 19:15:14 H systemd-nspawn[889]: Failed to mount image: Invalid argument
Jul 20 19:15:14 H systemd-nspawn[888]: Failed to receive mount namespace fd from outer child: Input/output error
Jul 20 19:15:14 H systemd[1]: systemd-nspawn@container-raw.service: Got notification message from PID 888 (STOPPING=1, STATUS=Terminating...)
Jul 20 19:15:14 H systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/systemd_2dnspawn_40container_2draw_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=3793 reply_cookie=0 signature=sa{sv>
Jul 20 19:15:14 H systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1/unit/systemd_2dnspawn_40container_2draw_2eservice interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=3794 reply_cookie=0 signature=sa{sv>
Jul 20 19:15:14 H kernel: EXT4-fs (loop0): bad block size 1024

1024 doesn't seem to be a valid block size for direct I/O, even if it's the block size for ext4. But none of the debug messages I inserted show up. Could it be unrelated? Or are the debug-level messages just missing?

topimiettinen avatar Jul 20 '23 18:07 topimiettinen

@topimiettinen What's the status here? I don't think the fstatvfs() block size will always match the sector size. E.g. we create ext4 with block size 4096 in repart even if the sector size is 512.

DaanDeMeyer avatar Aug 15 '23 07:08 DaanDeMeyer

If the backing device of regular files for RootImage= has a sector size of 4k, the default value of 512 (used in absence of a partition table) won't work. Let's probe the sector size harder by looking up the backing device.

Why wouldn't that work? File systems are byte-addressable. Hence you can have a loopback device off it of any sector size. If the sector size of the loop device is smaller than that of the backing device you might not get direct I/O to work on the device, but that's an optimization.

poettering avatar Aug 15 '23 11:08 poettering

@topimiettinen What's the status here? I don't think the fstatvfs() block size will always match the sector size. E.g. we create ext4 with block size 4096 in repart even if the sector size is 512.

I'm not sure what's the right approach. Perhaps the size accepted for direct I/O should be probed by first trying 4k, then 2k and so on (or straight to 512b after 4k).
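
A rough Python sketch of that probing idea, under stated assumptions: the helper name `probe_direct_io_unit` is hypothetical, the candidate list is taken from the sizes named in this comment, and a successful aligned read is treated as "accepted". This is not what systemd does, and it returns None when the file system does not support O_DIRECT at all.

```python
import mmap
import os


def probe_direct_io_unit(path, candidates=(4096, 2048, 1024, 512)):
    """Return the first candidate size accepted for an O_DIRECT read.

    Hypothetical helper: candidates are tried largest first, as
    suggested above. Returns None if O_DIRECT cannot be used at all.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except OSError:
        return None  # file system refuses O_DIRECT
    try:
        # An anonymous mapping gives page-aligned memory, which
        # satisfies the buffer alignment requirement of direct I/O.
        buf = mmap.mmap(-1, max(candidates))
        try:
            for size in candidates:
                view = memoryview(buf)[:size]
                try:
                    os.preadv(fd, [view], 0)
                    return size
                except OSError:
                    continue  # this length was rejected, try a smaller one
                finally:
                    view.release()
        finally:
            buf.close()
    finally:
        os.close(fd)
    return None
```

Going from large to small means the result is the largest unit that works, not the minimum the device requires; reversing the candidate order would probe for the minimum instead.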

topimiettinen avatar Aug 15 '23 15:08 topimiettinen

If the backing device of regular files for RootImage= has a sector size of 4k, the default value of 512 (used in absence of a partition table) won't work. Let's probe the sector size harder by looking up the backing device.

Why wouldn't that work? File systems are byte-addressable. Hence you can have a loopback device off it of any sector size. If the sector size of the loop device is smaller than that of the backing device you might not get direct I/O to work on the device, but that's an optimization.

If it's just an optimization, mounting the loop device shouldn't ever fail. There should be fallback logic everywhere, retrying with different direct I/O unit sizes, or without direct I/O.

topimiettinen avatar Aug 15 '23 15:08 topimiettinen

I am not sure I follow here. Are you suggesting you have a disk image on some Linux fs, that you cannot allocate a loopback device for with the native sector size of the image?

poettering avatar Oct 18 '24 20:10 poettering

I've run into the related issue #25792 before. Say you have /home backed by a disk that uses 4K sectors. You want to use systemd-homed to create a new user with a LUKS encrypted volume. That volume creation runs into issues because the loopback device, the disk image that hosts the user's home directory, and the underlying block device don't agree on the sector size.

The quick solution? Set the sector size when creating the user. The better solution? Probe harder or at least suggest a fix to the user.

As always, I appreciate the effort everyone puts into these tools and I rely on them every day.

EDIT: The above no longer seems to be an issue in systemd 256.

mcassaniti avatar Oct 19 '24 05:10 mcassaniti

I've run into the related issue #25792 before. Say you have /home backed by a disk that uses 4K sectors. You want to use systemd-homed to create a new user with a LUKS encrypted volume. That volume creation runs into issues because the loopback device, the disk image that hosts the user's home directory, and the underlying block device don't agree on the sector size.

All regular files on Linux file systems are byte-addressable. Hence the sector size used by the underlying fs doesn't really matter for general support, it only matters for performance (i.e. enabling direct I/O mode). And that's why I don't get what this is supposed to be about.

Or to say this differently: I can create either a 512 or a 4K sector size loopback block device of a file on any Linux fs. This always works, regardless of the backing sector size of the fs.

What does matter though is that the sector size used for the upper fs matches the sector size of the loopback device. To make that work automatically, we nowadays look for the GPT disk label at the 512, 1K, 2K, 4K sector offsets of the disk image file before we set up a loopback device on it, and then use what we found as the sector size.
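
As a simplified sketch of that GPT lookup (not systemd's actual code, which this thread does not show): the GPT header sits at LBA 1 and begins with the 8-byte signature "EFI PART", so its byte offset in the image equals the sector size. The helper name `probe_gpt_sector_size` is hypothetical, and a real implementation would validate more of the header than just the signature.

```python
GPT_SIGNATURE = b"EFI PART"
CANDIDATE_SECTOR_SIZES = (512, 1024, 2048, 4096)


def probe_gpt_sector_size(image_path):
    """Return the sector size implied by the GPT header position, or None.

    Hypothetical helper: only the signature is checked here.
    """
    found = []
    with open(image_path, "rb") as f:
        for size in CANDIDATE_SECTOR_SIZES:
            # LBA 1 starts exactly one sector into the image.
            f.seek(size)
            if f.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE:
                found.append(size)
    # Zero or several matches: report "unknown" rather than guess.
    return found[0] if len(found) == 1 else None
```

An image without a partition table (like the bare file system images in the failing tests earlier in the thread) yields None here, which is the case where a default sector size has to be assumed.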

Hence, I am not following what the issue is supposed to be here.

poettering avatar Oct 28 '24 10:10 poettering

@poettering Whatever issue I saw previously no longer occurs; I've just tested on Ubuntu 24.10 (systemd 256.5-2ubuntu3).

mcassaniti avatar Oct 28 '24 21:10 mcassaniti

@topimiettinen Can you enlighten us as to what this is about, then?

Maybe just some spurious kernel bug that Ubuntu has fixed by now?

poettering avatar Oct 30 '24 17:10 poettering

@poettering The issue is with the 4k hard sector size of the underlying disk device, which is unrelated to the file system block size. Anyway, this PR doesn't seem useful, so closing.

topimiettinen avatar Nov 01 '24 19:11 topimiettinen