mkosi icon indicating copy to clipboard operation
mkosi copied to clipboard

mkosi is over-eager to use "cp --preserve=xattr", fails in cross-filesystem cases

Open gdonval opened this issue 6 months ago • 26 comments

mkosi commit the issue has been seen with

main

Used host distribution

archlinux

Used target distribution

ubuntu

Linux kernel version used

6.14.6-arch1-1

CPU architectures issue was seen on

x86_64

Unexpected behaviour you saw

WorkspaceDirectory set to XFS-supported directory (here /tmp/mkosi.workspace). All other mkosi dirs on BTRFS-supported directory.

...
‣ Copying repository metadata
cp: setting attributes for '/work/tmp/mkosi-gdonval-workspace/mkosi/mkosi-metadata-lwbvcpek/cache/apt/srcpkgcache.bin': Operation not supported
...
cp: setting attributes for '/work/tmp/mkosi-gdonval-workspace/mkosi/mkosi-metadata-lwbvcpek/cache/apt': Operation not supported
‣ "cp --recursive --no-dereference --preserve=mode,links,timestamps,ownership,xattr --reflink=auto --copy-contents '/work/home/gdonval/.cache/mkosi/ubuntu~noble~x86-64/cache/apt' /work/tmp/mkosi-gdonval-workspace/mkosi/mkosi-metadata-lwbvcpek/cache/apt --no-target-directory" returned non-zero exit code 1.

The reason it happens it because original files have attrs that cannot be copied, e.g:

$ lsattr -a /home/gdonval/Documents/Work/RVMI/Services/UbuntuRVMI/mkosi.cache/ubuntu~noble~x86-64~main.metadata.cache/cache/apt/srcpkgcache.bin
--------c------------- /home/gdonval/Documents/Work/RVMI/Services/UbuntuRVMI/mkosi.cache/ubuntu~noble~x86-64~main.metadata.cache/cache/apt/srcpkgcache.bin
$ lsattr -a /tmp/mkosi-gdonval-workspace/mkosi/mkosi-metadata-lwbvcpek/cache/apt/srcpkgcache.bin
---------------------- /tmp/mkosi-gdonval-workspace/mkosi/mkosi-metadata-lwbvcpek/cache/apt/srcpkgcache.bin

That c is invalid on XFS.

Used mkosi config

[Distribution]
Distribution=ubuntu
Release=noble
Repositories=universe multiverse

[Build]
WorkspaceDirectory=/tmp/mkosi.workspace

[Output]
Format=disk

mkosi output


gdonval avatar May 22 '25 14:05 gdonval

I've had this alternative to #3692 flying around for a while. Would this work for you?

behrmann avatar May 22 '25 15:05 behrmann

BTW @gdonval, @behrmann, this copy issue doesn't seem to be XFS-specific, cp ... --preserve=xattr fails on a cross-device copy between two BTRFS filesystems (/ and a LUKS-device-backed BTRFS homed) when running latest main as root on my system.

I suspect I'd be able to work around the issue with MKOSI__CP_PRESERVE_ATTR from @behrmann's branch, but it seems like this is just broken, no?

cat <<'EOF' > mkosi.conf
[Distribution]
Distribution=arch
[Output]
Format=directory
ImageId=test
[Content]
Packages=systemd
EOF

cat <<'EOF' > mkosi.postinst
#!/usr/bin/bash
lsattr -d $BUILDROOT/var/log/journal
EOF

chmod a+x mkosi.postinst

sudo mkosi  # Note: does not fail if not running as root
# ...
# ‣  Running postinstall script /home/nick/tmp/mkosi.postinst…
# ---------------C------ /buildroot/var/log/journal
# ...
# ‣ Could not rename /var/tmp/mkosi-workspace-m5t2e24a/staging/test to /home/nick/tmp/test as they are located on different devices, falling back to copying
ERROR: Could not create subvolume: Invalid cross-device link
cp: setting attributes for '/work/home/nick/tmp/test/var/log/journal': Operation not supported
# ‣ "cp --recursive --no-dereference --preserve=mode,links,timestamps,ownership,xattr --reflink=auto --copy-contents /work/var/tmp/mkosi-workspace-m5t2e24a/staging/test /work/home/nick/tmp/test --no-target-directory" returned non-zero exit code 1.

labichn avatar May 22 '25 18:05 labichn

Do you know which combinations don't work? xfs <-> btrfs, btrfs <-> btrfs

behrmann avatar May 22 '25 20:05 behrmann

@gdonval @labichn This is with cp from coreutils right? I cloned the codebase and can't figure out for the life of me where they handle file attributes at all.

DaanDeMeyer avatar May 27 '25 11:05 DaanDeMeyer

If it is from coreutils, can I get the exact coreutils version?

DaanDeMeyer avatar May 27 '25 11:05 DaanDeMeyer

$ cp --version
cp (GNU coreutils) 9.7
Copyright (C) 2025 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjörn Granlund, David MacKenzie, and Jim Meyering.

I'll try to create a reproducer.

gdonval avatar May 27 '25 12:05 gdonval

Ah I wonder if this coming from inside gnulib

DaanDeMeyer avatar May 27 '25 12:05 DaanDeMeyer

Doesn't seem to coming from there either. I'm a bit stuck until I know where this error message is coming from inside coreutils. I need the source code to figure out what triggers this and how we could potentially skip it. If they tie file attributes copying to xattr copying that'd be a rather questionable decision if you ask me.

DaanDeMeyer avatar May 27 '25 12:05 DaanDeMeyer

If they tie file attributes copying to xattr copying that'd be a rather questionable decision if you ask me

It looks like that's exactly what happens.

workspace="/tmp/cp_attr"

mkdir -p ${workspace}/{xfs,btrfs}
truncate -s 512M ${workspace}/xfs.img
truncate -s 512M ${workspace}/btrfs.img

mkfs.xfs ${workspace}/xfs.img
mkfs.btrfs ${workspace}/btrfs.img

run0 -D ${workspace} sh -c "
    mount -o loop xfs.img xfs
    mount -o loop btrfs.img btrfs
    mkdir btrfs/foo
    touch btrfs/foo/{bar,baz}
    chattr +c btrfs/foo/baz

    lsattr btrfs/foo/
    
    echo 'copying btrfs/foo/bar -> xfs/bar'
    cp --preserve=xattr btrfs/foo/bar xfs/
    echo 'copying btrfs/foo/baz -> xfs/baz'
    cp --preserve=xattr btrfs/foo/baz xfs/ || echo 'Failed'

    umount btrfs
    umount xfs
"

rm ${workspace}/{xfs,btrfs}.img
rmdir ${workspace}/{xfs,btrfs}

Output:

---------------------- btrfs/foo/bar
--------c------------- btrfs/foo/baz
copying btrfs/foo/bar -> xfs/bar
copying btrfs/foo/baz -> xfs/baz
cp: setting attributes for 'xfs/baz': Operation not supported
Failed

Naturally, if I remove c on btrfs/foo/baz, the copy goes through so indeed cp seems to conflate those two kinds of xattrs.

gdonval avatar May 27 '25 13:05 gdonval

If it is from coreutils, can I get the exact coreutils version?

It is: cp (GNU coreutils) 9.7, from Arch's core/coreutils 9.7-1.

I tried to cut down my laptop's image to a minimal reproducer, but can't get this same error to occur in QEMU.

labichn avatar May 27 '25 13:05 labichn

but it's not a terminating error and mkosi exits successfully:

Yeah that's fine. Rename is tried and if it fails it falls back to copy.

I haven't observed any errors with any other filesystems other than cross-device btrfs<->btrfs

Does it actually stop mkosi or does it only display that btrfs error?

That's where it happens: https://github.com/systemd/mkosi/blob/5e739ef1ed02a4f3b6ae64e50a8ee186cbcb21c2/mkosi/tree.py#L228-L239

So indeed it's not supposed to stop mkosi and should have no negative consequence whatsoever. It's telling you that it cannot do something efficient (rename) because source and destination are on different partitions. It's just that the corresponding btrfs CLI error seeps through and shows up in the terminal.

This is also unrelated to xattr.

gdonval avatar May 27 '25 13:05 gdonval

but it's not a terminating error and mkosi exits successfully:

Yeah that's fine. Rename is tried and if it fails it falls back to copy.

Oh sure, not the first line, I meant the ERROR: Could not create subvolume: Invalid cross-device link is non-terminating, only displayed.

Glad to clip out the failure to reproduce to a gist, it's more noise than useful.

labichn avatar May 27 '25 14:05 labichn

I really can't reproduce your specific error though. I modified my script to make 2 btrfs partitions and used nocow (chattr +C) instead of forcing compression to best reproduce your usecase:

workspace="/tmp/cp_attr"

mkdir -p ${workspace}/{btrfs_dest,btrfs_src}
truncate -s 512M ${workspace}/btrfs_dest.img
truncate -s 512M ${workspace}/btrfs_src.img

mkfs.btrfs ${workspace}/btrfs_dest.img
mkfs.btrfs ${workspace}/btrfs_src.img

run0 -D ${workspace} sh -c "
    mount -o loop btrfs_dest.img btrfs_dest
    mount -o loop btrfs_src.img btrfs_src
    mkdir btrfs_src/foo
    chattr +C btrfs_src/foo
    touch btrfs_src/foo/{bar,baz}

    lsattr btrfs_src/foo/
    
    echo 'copying btrfs_src/foo -> btrfs_dest/foo'
    cp --recursive --no-dereference --preserve=mode,links,timestamps,ownership,xattr --reflink=auto --copy-contents btrfs_src/foo/ btrfs_dest/ --no-target-directory

    umount btrfs_src
    umount btrfs_dest
"

rm ${workspace}/{btrfs_dest,btrfs_src}.img
rmdir ${workspace}/{btrfs_dest,btrfs_src}

and all goes well:

---------------C------ btrfs_src/foo/bar
---------------C------ btrfs_src/foo/baz
copying btrfs_src/foo -> btrfs_dest/foo

gdonval avatar May 27 '25 14:05 gdonval

Same here, I did the same edits to your script and don't have an issue. But a basic container still can't build due to cp: setting attributes for '/work/home/nick/tmp/test/var/log/journal': Operation not supported, which does seem to be the same xattrs issue with cross-device btrfs/btrfs.

labichn avatar May 27 '25 14:05 labichn

Use mkosi --debug-shell and use the shell there to make sure the source and destination are what you think they are (mount from within). I can't reproduce, even by forcing the use of a different btrfs partition.

gdonval avatar May 27 '25 15:05 gdonval

> sudo mkosi -f --debug-shell
...
‣ Could not rename /var/tmp/mkosi-workspace-uo5m73q5/staging/test to /home/nick/tmp/test as they are located on different devices, falling back to copying
ERROR: Could not create subvolume: Invalid cross-device link
cp: setting attributes for '/work/home/nick/tmp/test/var/log/journal': Operation not supported
‣ "cp --recursive --no-dereference --preserve=mode,links,timestamps,ownership,xattr --reflink=auto --copy-contents /work/var/tmp/mkosi-workspace-uo5m73q5/staging/test /work/home/nick/tmp/test --no-target-directory" returned non-zero exit code 1.
bash-5.2# mount
...
/dev/mapper/home-nick on /work/home/nick/tmp type btrfs (rw,nosuid,nodev,noatime,idmapped,compress=zstd:3,noacl,space_cache=v2,user_subvol_rm_allowed,subvolid=3350,subvol=/[email protected]/tmp)
/dev/mapper/root on /work/var/tmp/mkosi-workspace-uo5m73q5/staging/test type btrfs (ro,nodev,noatime,ssd,space_cache=v2,subvolid=691,subvol=/live/var/tmp/mkosi-workspace-uo5m73q5/staging/test)

Last two mounts seem to be the relevant ones, glad to add any other output you need.

labichn avatar May 27 '25 15:05 labichn

I really can't reproduce it (tried to reproduce all your mounting options except idmapping), be it with my script or just reusing your mkosi config: it just works on my machine.

I'm on mkosi-git though and I definitely see something specific and weird that could trip up an old version:

Image

I have ACLs enabled on both partitions so I'm not entirely sure what is going on there.


My mounts look nothing like yours either: workspace matters are handled by overlayfs.


Lastly, by default mkosi.conf and the output directory are on part1 (btrfs) and my workspace directory is on part2 (btrfs too).

I tried moving my workspace from part2 to part1 and mkosi.conf from part1 to part2, it all succeeds.


All this made me wonder: are you on main?


PS: try to clean up your build directory: there should be no need for you to use sudo for this so this whole problem could be caused by stalled files.

gdonval avatar May 27 '25 17:05 gdonval

I appreciate the effort!

All this made me wonder: are you on main?

Yep, latest commit is 39ed5ef6c2f3783c94fa83f06b58a145e6e865b5.

PS: try to clean up your build directory: there should be no need for you to use sudo for this so this whole problem could be caused by stalled files.

The build directory is empty. Agreed there's no need for root with this configuration, but using BaseTrees still requires root, which is used in the configuration where I first encountered this error. (The error only occurs while running as root.)

If you can't reproduce this problem is probably specific to my system. I'll track it down and file a separate issue/PR. Sorry to derail this issue.

labichn avatar May 27 '25 17:05 labichn

Idmapping is complex enough to trigger something weird like converting a warning to an error.

When you are on the debug shell, have you tried to copy-paste the cp ... command and remove ,xattr? Just to double check this is really the trigger here too?

gdonval avatar May 28 '25 09:05 gdonval

@DaanDeMeyer any reason, besides relative slowness, not to use shutil.copytree with shutil.copy2 as copy_function (default)?

I had a quick look and they seem to only copy FS-independent file xattr, not FS-dependent file attr.

gdonval avatar May 28 '25 10:05 gdonval

@gdonval reflink

DaanDeMeyer avatar May 28 '25 10:05 DaanDeMeyer

That's also the reason we don't use rsync. Even if I implemented a copy function that did reflink, the slowness is very noticeable. We want the copying to be a C program, not in python.

DaanDeMeyer avatar May 28 '25 10:05 DaanDeMeyer

That makes sense...

This brings me to... do we actually need to move files around? I'm starting to wonder if instances like this could just be removed and if other instances like what is find there could be replaced with overlays.

gdonval avatar May 28 '25 10:05 gdonval

Yes we do need to move files around, I don't want to add even more overlays to the build process unless absolutely required, and avoiding moving files around is not a good reason.

Honestly, this is a bug in cp, it should allow copying xattrs independently of file attributes and it should allow copying file attributes gracefully. Please file a bug report with coreutils to get this fixed there. Alternatively, come up with another fast copying tool that does everything cp does for us today without these issues and we can consider switching to that. All the required code to write a decent copy tool already lives in systemd, we just need to expose it somewhere.

DaanDeMeyer avatar May 28 '25 11:05 DaanDeMeyer

avoiding moving files around is not a good reason.

FS boundaries are bound to create friction. Today it's a bug in cp, tomorrow it might be that a distro starts actively using chattr for security or for things to work at all and crossing boundaries will lose that part of the information. I'm not happy with this but I can't come up with alternative great design either. All I can think of is "clever" (i.e. complicated) and involves being root.

Please file a bug report with coreutils

Will do.

come up with another fast copying tool

Just tried uutils just in case... It's not ready and even circumventing that, it errors out in the same way because it tries to copy attr as well as xattr.

we just need to expose it somewhere.

I can hear the internet rumbling already... ;)

gdonval avatar May 28 '25 14:05 gdonval

@DaanDeMeyer BTRFS seems to implement normal file attributes through extended attributes. Bug with coreutils is 78623. Immediate mitigation is to filter out BTRFS-specific xattr through /etc/xattr.conf:

# /etc/xattr.conf
...
btrfs.*                       skip           # BTRFS metadata and ACLs

I've asked for a CLI flag to add filters at CLI level. What would be awesome, now that I think about it, would be a way to whitelist security, system, trusted, and user only.

gdonval avatar May 30 '25 09:05 gdonval