fanotify fsid mark failing for btrfs subvolumes: Invalid cross-device link

Open ohsix opened this issue 3 years ago • 13 comments

probably already know this from the failing test but,

fanotify_test_fid returns -EXDEV on btrfs

https://github.com/torvalds/linux/blob/7d6beb71da3cc033649d641e1e608713b8220290/fs/notify/fanotify/fanotify_user.c#L1075

the older version without FAN_REPORT_FID was working until I upgraded the os

[root@pro ohsix]# fatrace 
fatrace: Failed to add watch for /: Invalid cross-device link
DEBUG: mount / fd 4
DEBUG: ignore: fsname: sysfs dir: /sys type: sysfs
DEBUG: ignore: fsname: proc dir: /proc type: proc
DEBUG: ignore: fsname: devtmpfs dir: /dev type: devtmpfs
DEBUG: ignore: fsname: securityfs dir: /sys/kernel/security type: securityfs
DEBUG: ignore: fsname: tmpfs dir: /dev/shm type: tmpfs
DEBUG: ignore: fsname: devpts dir: /dev/pts type: devpts
DEBUG: ignore: fsname: tmpfs dir: /run type: tmpfs
DEBUG: ignore: fsname: cgroup2 dir: /sys/fs/cgroup type: cgroup2
DEBUG: ignore: fsname: pstore dir: /sys/fs/pstore type: pstore
DEBUG: ignore: fsname: none dir: /sys/fs/bpf type: bpf
DEBUG: ignore: fsname: none dir: /sys/kernel/tracing type: tracefs
DEBUG: ignore: fsname: selinuxfs dir: /sys/fs/selinux type: selinuxfs
DEBUG: ignore: fsname: systemd-1 dir: /proc/sys/fs/binfmt_misc type: autofs
DEBUG: ignore: fsname: debugfs dir: /sys/kernel/debug type: debugfs
DEBUG: ignore: fsname: mqueue dir: /dev/mqueue type: mqueue
DEBUG: ignore: fsname: hugetlbfs dir: /dev/hugepages type: hugetlbfs
DEBUG: ignore: fsname: fusectl dir: /sys/fs/fuse/connections type: fusectl
DEBUG: ignore: fsname: binfmt_misc dir: /proc/sys/fs/binfmt_misc type: binfmt_misc
DEBUG: ignore: fsname: configfs dir: /sys/kernel/config type: configfs
DEBUG: ignore: fsname: tmpfs dir: /tmp type: tmpfs
DEBUG: add watch for btrfs mount /home
fatrace: Failed to add watch for /home: Invalid cross-device link
DEBUG: mount /home fd 6

ohsix avatar Mar 08 '21 02:03 ohsix

> probably already know this from the failing test but,

No, which one is that? The CI here runs on GitHub workflows, which is Ubuntu 20.04 on ext4 (... I guess), and it is happy.

So that means that fanotify_init (FAN_CLASS_NOTIF | FAN_REPORT_FID, O_LARGEFILE) succeeds (it should give EINVAL if it's not supported), but fanotify_mark() fails then?

I added a possible fix to https://github.com/martinpitt/fatrace/tree/btrfs , want to try that?

I won't land it yet, I want to try and reproduce this in a test (on a loop device).

martinpitt avatar Mar 12 '21 14:03 martinpitt

I just added a test for btrfs, on a loop device. This works fine, and that's not too surprising -- after all, there is nothing cross-device there. I figure you have something slightly more fancy? Subvolumes? /home being a symlink or bind-mount or something?

martinpitt avatar Mar 12 '21 15:03 martinpitt

doh ya.

/ and /home are two subvolumes on the same device

/dev/sda2 on / type btrfs (rw,relatime,seclabel,ssd,space_cache,subvolid=100027,subvol=/root)
/dev/sda2 on /home type btrfs (rw,relatime,seclabel,ssd,space_cache,subvolid=100026,subvol=/home)
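A minimal sketch of how a script could spot this kind of setup, assuming the `/proc/self/mounts` field order (device, mount point, type, options) and that btrfs reports `subvol=/` for the top-level subvolume. The function names are made up for illustration and are not part of fatrace:

```sh
# Print the value of the subvol= mount option, if present.
# "subvolid=..." is not matched because the pattern requires "subvol=".
subvol_from_opts() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^subvol=//p'
}

# List btrfs mounts that are subvolumes rather than the top-level volume.
# Reads a file in /proc/self/mounts format (default: /proc/self/mounts).
# Caveat: mount points containing spaces are octal-escaped in that file
# and are printed as-is.
list_btrfs_subvol_mounts() {
    while read -r dev dir type opts _; do
        [ "$type" = btrfs ] || continue
        subvol=$(subvol_from_opts "$opts")
        case "$subvol" in
            ''|/) ;;    # no subvol option, or the top-level subvolume
            *) printf '%s %s %s\n' "$dev" "$dir" "$subvol" ;;
        esac
    done < "${1:-/proc/self/mounts}"
}
```

Called with no argument on a system mounted like the one above, `list_btrfs_subvol_mounts` would be expected to list both / and /home, which are exactly the mounts where the watch fails.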

the failing test was CI, one of the containers. called "btrfs tests" now (don't remember if it was at the time)

as for the btrfs branch: it does the same thing fatrace did before the patch, runs but there's no events from / or /home

the thing that made it confusing is that fatrace runs and doesn't exit; it just doesn't print any events for the expected filesystems. when run with no options it still sees and sets up monitors on the squashfs files snap has mounted, but events on those filesystems are relatively rare, so it wasn't easy to see that was the case

don't know much about fanotify but if there are filesystems or mounting scenarios where a combination of flags won't work it would be useful to have a way for fatrace to say so, like printing a table of all the filesystem watches and the actual flags being used. and maybe a strict/loose option to make fatrace exit when the flags you ask for can't be done on the filesystem(s)

that leaves the door open for running fatrace with no options and showing some events on btrfs and more where it is able (which is 99% of my use-case, just to take a peek at what's going on)
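Until fatrace offers something like that itself, a rough approximation is to filter its own error messages. The sketch below assumes the `fatrace: Failed to add watch for <dir>: <reason>` message format seen in the logs in this thread; `failed_watches` is a made-up helper name, not a fatrace feature:

```sh
# Read fatrace output on stdin and print one "<dir> | <reason>" line for
# every watch that could not be set up. All other lines are dropped.
failed_watches() {
    sed -n 's/^fatrace: Failed to add watch for \(.*\): \(.*\)$/\1 | \2/p'
}
```

For example, `sudo fatrace 2>&1 | failed_watches` would show only the mounts that are not being watched. The `2>&1` assumes the messages are written to stderr; fatrace itself keeps running until interrupted.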

thank you for looking into this

ohsix avatar Mar 13 '21 00:03 ohsix

sorry for the delay,

here's everything it says in debug mode before pausing for lack of filesystem activity:

[ohsix@pro fatrace]$ sudo ./fatrace
DEBUG: FAN_MARK_FILESYSTEM not supported; falling back to FAN_MARK_MOUNT
fatrace: Failed to add watch for /: Invalid argument
DEBUG: mount / fd 4
DEBUG: ignore: fsname: sysfs dir: /sys type: sysfs
DEBUG: ignore: fsname: proc dir: /proc type: proc
DEBUG: ignore: fsname: devtmpfs dir: /dev type: devtmpfs
DEBUG: ignore: fsname: securityfs dir: /sys/kernel/security type: securityfs
DEBUG: ignore: fsname: tmpfs dir: /dev/shm type: tmpfs
DEBUG: ignore: fsname: devpts dir: /dev/pts type: devpts
DEBUG: ignore: fsname: tmpfs dir: /run type: tmpfs
DEBUG: ignore: fsname: cgroup2 dir: /sys/fs/cgroup type: cgroup2
DEBUG: ignore: fsname: pstore dir: /sys/fs/pstore type: pstore
DEBUG: ignore: fsname: none dir: /sys/fs/bpf type: bpf
DEBUG: ignore: fsname: none dir: /sys/kernel/tracing type: tracefs
DEBUG: ignore: fsname: selinuxfs dir: /sys/fs/selinux type: selinuxfs
DEBUG: ignore: fsname: systemd-1 dir: /proc/sys/fs/binfmt_misc type: autofs
DEBUG: ignore: fsname: mqueue dir: /dev/mqueue type: mqueue
DEBUG: ignore: fsname: hugetlbfs dir: /dev/hugepages type: hugetlbfs
DEBUG: ignore: fsname: debugfs dir: /sys/kernel/debug type: debugfs
DEBUG: ignore: fsname: fusectl dir: /sys/fs/fuse/connections type: fusectl
DEBUG: ignore: fsname: configfs dir: /sys/kernel/config type: configfs
DEBUG: ignore: fsname: binfmt_misc dir: /proc/sys/fs/binfmt_misc type: binfmt_misc
DEBUG: ignore: fsname: tmpfs dir: /tmp type: tmpfs
DEBUG: add watch for btrfs mount /home
fatrace: Failed to add watch for /home: Invalid argument
DEBUG: mount /home fd 6
DEBUG: add watch for ext4 mount /boot
fatrace: Failed to add watch for /boot: Invalid argument
DEBUG: mount /boot fd 7
DEBUG: ignore: fsname: tmpfs dir: /run/user/1000 type: tmpfs
DEBUG: ignore: fsname: gvfsd-fuse dir: /run/user/1000/gvfs type: fuse.gvfsd-fuse
DEBUG: ignore: fsname: tracefs dir: /sys/kernel/debug/tracing type: tracefs
DEBUG: ignore: fsname: portal dir: /run/user/1000/doc type: fuse.portal

ohsix avatar Apr 04 '21 04:04 ohsix

any news on this? if it helps the expected behavior of -c is also busted :)

[ohsix@pro /]⛈️ fatrace -c
fatrace: Failed to add watch for .: Invalid cross-device link

this is a default fedora install with the default volume manager / btrfs options, gnome-boxes can download the iso and install it for you pretty quick. or if there's anything i can collect here please let me know

ohsix avatar Aug 15 '21 12:08 ohsix

Sorry for the delay! I see this as well now on a Fedora 35 cloud image, which uses btrfs with subvolumes.

This is also easy to reproduce with the integration test:

--- a/tests/fatrace-btrfs
+++ b/tests/fatrace-btrfs
@@ -16,7 +16,9 @@ mkdir -p "$MOUNT"
 mount -o loop "$IMAGE" "$MOUNT"
 trap "umount -l '$MOUNT'" EXIT INT QUIT PIPE
 
-cd "$MOUNT"
+btrfs subvolume create "$MOUNT/subv1"
+
+cd "$MOUNT/subv1"
 
 echo "hello" > world.txt

However, I'm afraid I don't know what to do here. Both the new and old fanotify_mark() APIs return EXDEV/EINVAL, so I am at a loss how else to watch events on btrfs subvolumes. Looks like the fanotify API simply does not support this?

Note that the "pausing for lack of fs activity" only happens if you run it without options -- as long as there is at least one watched fs, it will wait for that. With e.g. --current-mount, it will exit immediately with the error.

martinpitt avatar Oct 16 '21 10:10 martinpitt

Ah, it's not actually that bad. I can run fatrace --current-mount on a mount of the btrfs root, just not in any mounted subvolume. That watch will still get the subvolume events; I added a test in commit 061b41469d2c904997a23e545c50cba19241967d.

Now, I realize that does not help much if you only have subvolumes mounted, and not the root device -- but it seems this is the best that Linux can do.

martinpitt avatar Oct 16 '21 10:10 martinpitt

i think I can adapt to that, thank you for looking into it

i looked at the man pages again and they do remark about EXDEV/ENODEV & btrfs, here's where it was added https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man2/fanotify_mark.2?id=0a4db6dc742d9150d73048e889de9e6accc53d46

can see that it might be a semantic change but i don't know any of this well enough to say that it could work like it did before

ohsix avatar Oct 17 '21 05:10 ohsix

It seems to be a popular choice for distros to use subvolumes by default on btrfs now; even Debian's default of putting everything in an @ subvolume triggers this bug.

A quick and dirty workaround is to mount the root of the root partition (i.e. mount without subvol= parameter) elsewhere before running fatrace. As confirmed above this correctly picks up writes to subvolumes
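The workaround could look roughly like the sketch below. It uses `subvolid=5` (the fixed ID of the btrfs top-level subvolume) instead of simply omitting `subvol=`, which is a deliberate substitution: a plain mount gives the default subvolume, and that is only the top level if the default was never changed. The `run` and `watch_btrfs_root` helpers and the `/mnt/btrfs-root` mount point are hypothetical; set `DRY_RUN=1` to print the commands instead of executing them:

```sh
# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mount the top-level subvolume of a btrfs device and trace it (needs root).
# $1: block device, e.g. /dev/sda2
# $2: mount point (the default here is a made-up path)
watch_btrfs_root() {
    dev="$1"
    mnt="${2:-/mnt/btrfs-root}"
    run mkdir -p "$mnt" || return 1
    run mount -o subvolid=5 "$dev" "$mnt" || return 1
    # Subshell so the caller's working directory is left alone.
    ( run cd "$mnt" && run fatrace --current-mount )
    run umount "$mnt"
}
```

As root, `watch_btrfs_root /dev/sda2` mounts the top level, runs fatrace there, and unmounts afterwards. If fatrace is stopped with Ctrl-C, the shell may abort before reaching `umount`, in which case the mount has to be removed by hand.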

lukefor avatar Jan 01 '22 20:01 lukefor