arch-install-scripts
Allow lazy umount
Hi there!
I am a big fan of arch-chroot, and often use it to prepare VM disk images. Through absolutely no fault of arch-chroot, I often run into this problem:
conrad@tryptophan ~/hack/makeimg $ sudo arch-chroot test ls
[sudo] password for conrad:
bin boot dev etc home lib lib64 lost+found mnt opt proc root run sbin srv sys tmp usr var
umount: /home/conrad/hack/makeimg/test/proc: target is busy.
The command was run, but I am left with the proc bind-mount still in place. After a long and painful investigation, I am 99% sure that some component of gvfsd or its ecosystem is to blame (I mostly work in GNOME). I haven't managed to pinpoint the exact component, but the error reliably never occurs when I shut down GNOME and work in a plain console session (whereas it occurs >95% of the time when working in GNOME).
I haven't found any reports of people experiencing this exact issue, but e.g. one person seems to have a very similar problem caused by ksysguard: https://superuser.com/a/925862 - as such, I am assuming this is not just an absolutely crazy edge case.
One option to "handle" this is lazy unmounting (umount -l). It reliably works fine for me, and I don't see any super-obvious downsides, but you might know more than I do about that.
So, long story short, would you be willing to accept a patch that allows an option to use lazy unmounting, or even make it the default? If not, I totally understand, it is a bit of an externality, but I would of course be very happy if I could simply use arch-chroot even when working in GNOME :)
Thanks for all the fish!
@bitfehler Thanks for the detailed report.
It seems this is similar to or the same issue we have been experiencing with archiso: https://gitlab.archlinux.org/archlinux/archiso/-/issues/31
So what seems to be happening here is that at least for @bitfehler's case, gvfsd is leaving behind forked and daemonized background processes because gnome will be gnome. A similar mess, with less problematic outcomes, can be seen whenever you quit gnome shell on a boring old host system and return to the tty. They aren't cleaned up, so they hang around in the process table, and use up a small amount of resources as well... And use of mountpoints too, yay.
In principle, this should be solvable by reaping the gvfsd processes before unmounting.
The interesting thing is, arch-chroot has natively had that exact problem before, because a common use of arch-chroot is to run, well, pacman. And pacman will run gpg via the archlinux-keyring install script and /usr/bin/pacman-key. GnuPG does the same daemonizing trick.
This got solved years ago via 2be79c6259cfbf9ebcd258a68fea3ec79f532e32 which works quite well for gnupg at least. I am not sure why it wouldn't work for gvfsd too...
From the linked archiso bug:
It seems this is triggered by the use of util-linux' unshare when running pacman in pacstrap. Either unshare or pacman does not wait for its child processes, which makes the chroot_teardown() function in pacstrap fail to unmount, as mountpoints are still busy.
This statement really confused me, because pacman does not wait for its child processes (why should it? Its operation is unaffected by gnupg's decision to use long-running daemons) and that is exactly why we added unshare, since unshare is "literally a program to wait for its child process and reap any others that didn't exit on their own".
So I would be very interested to know which process is still hanging around. I specifically copied arch-chroot's use of unshare over to pacstrap, because while working on fixes for the keyring I had the issue of pacstrap also running the archlinux-keyring install script, thereby leading to GnuPG.
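For anyone who wants to help narrow this down, here is a rough sketch of one way to look for a leftover process. It assumes the culprit is a process whose root or working directory is still inside the chroot (the GnuPG-style case), which may well not be what is happening with gvfsd. The function name procs_inside is made up for illustration, and it needs to run as root to see other users' processes.

```shell
# Hypothetical helper: print "PID COMM" for every process whose root
# directory or current working directory lies at or below DIR.
# Assumes a Linux /proc; unreadable entries are silently skipped.
procs_inside() (
    target=$(realpath -- "$1") || exit 1
    for p in /proc/[0-9]*; do
        for link in root cwd; do
            path=$(readlink "$p/$link" 2>/dev/null) || continue
            case $path in
                "$target"|"$target"/*)
                    printf '%s %s\n' "${p#/proc/}" "$(cat "$p/comm" 2>/dev/null)"
                    break ;;  # report each process only once
            esac
        done
    done
)
```

Running something like procs_inside test right after the failed umount would list any such process. An empty result would suggest the mount is being held some other way.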
From the umount manpage description of lazy unmounting:
The recommended use-case for umount -l is to prevent hangs on shutdown due to an unreachable network share where a normal umount will hang due to a downed server or a network partition. Remounts of the share will not be possible.
From memory, I am 95% sure I tried lazy unmounting in my tests for pacstrap before discovering that Dave used unshare in arch-chroot. Lazy unmounting failed miserably: it led to mounts like $root/dev never being unmounted since they were technically still busy on the host, which, as pointed out in the archiso bug, cannot be recovered from except by rebooting the system.
So I think lazy unmounting is exactly the last thing we want.
I'd rather try to find out why unshare isn't performing its sole job. It previously failed to be our savior here, in https://bugs.archlinux.org/task/67157 but that turned out to be an upstream util-linux bug which is now fixed.
Hey again. Thanks for picking this up, I wasn't sure if GitHub was the right place for this.
Thanks for pointing out the use of unshare, I had not really given much thought about that. Looking at that now, I was wondering: shouldn't a new mount namespace also be created (it currently isn't) to avoid such issues, so that gvfsd does not pick up the new filesystems mounted into the chroot?
So much for the theory. In practice, I tried adding --mount to the invocation of unshare in arch-chroot. That alone did not help, but on a hunch I also tried --mount-proc (which implies --mount) - and that actually seems to work!
I am guessing it must be something about the proc filesystem being explicitly mounted as private in the new mount namespace, but I cannot fully explain it. During the original investigation I had tried things like waiting and retrying on failed umounts, and I am pretty sure that at the time I occasionally had mountpoints other than /proc fail to unmount, but currently I cannot reproduce this, and using --mount-proc makes it work 100% of the time.
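To make the change concrete, here is a minimal sketch of the kind of invocation I mean. It assumes the existing call is roughly unshare --fork --pid chroot ..., which I have not quoted verbatim from arch-chroot. The wrapper name chroot_isolated and the DRY_RUN variable are made up for illustration. Per the unshare manpage, --mount-proc mounts a fresh proc filesystem just before running the program and implies a new mount namespace.

```shell
# Hypothetical wrapper: run a command in a chroot under a new PID namespace
# and, via --mount-proc, a new mount namespace with a private /proc.
# With DRY_RUN=1 the command line is printed instead of executed.
chroot_isolated() {
    dir=$1
    shift
    # --mount-proc takes its optional argument only as --mount-proc=PATH,
    # so the following word ("chroot") is the program to run.
    set -- unshare --fork --pid --mount-proc chroot "$dir" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 chroot_isolated test ls only prints the command line. Running it for real needs root and does not set up the API filesystems that arch-chroot mounts, so it is no replacement for the real script.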
As you can probably tell from this I am partially just poking at things here, but maybe you find it useful...
because gnome will be gnome.
And, btw, I wholeheartedly agree with that statement, so please don't hesitate to close this as "works for me" should you feel that urge, I would certainly understand.
I'm also running into this problem, and my solution has been to just loop on a grep $DIR /etc/mtab and run umount -qlR $DIR + sleep until it works. I tried running lsof and fuser on the mount points but couldn't actually get a process that was using them.
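Roughly, the loop looks like the sketch below. This version deliberately uses a plain recursive umount rather than -l, given the concerns about lazy unmounting raised above, and it gives up after a fixed number of tries. The function names and the MOUNTS_FILE override are made up for illustration, and paths containing spaces (escaped as \040 in the mount table) are not handled.

```shell
# True if any mount point at or below DIR is listed in the mount table.
# MOUNTS_FILE is a hypothetical override, handy for testing.
has_mounts_under() {
    awk -v t="$1" '
        $2 == t || index($2, t "/") == 1 { found = 1; exit }
        END { exit !found }
    ' "${MOUNTS_FILE:-/proc/self/mounts}"
}

# Retry a recursive (non-lazy) umount of DIR until nothing below it is
# mounted any more, or until TRIES attempts (default 10) are used up.
umount_retry() {
    dir=$(realpath -- "$1") || return 1
    tries=${2:-10}
    while has_mounts_under "$dir"; do
        [ "$tries" -gt 0 ] || return 1
        tries=$((tries - 1))
        umount -R "$dir" 2>/dev/null || sleep 1
    done
    return 0
}
```

Something like umount_retry test 30 after arch-chroot returns would then either clean up or report failure instead of looping forever.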