open-gpu-kernel-modules

Failed to umount oldroot due to nvidia kernel modules

Open ghtesting2020 opened this issue 2 years ago • 5 comments

NVIDIA Open GPU Kernel Modules Version

515.48.07

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Arch Linux

Kernel Release

5.18.1

Hardware: GPU

1650, 2060, 3080, etc

Describe the bug

During every shutdown, due to the way the NVIDIA kernel modules are loaded, an error about "failing to unmount oldroot" is always printed:

Starting version 251.2-1-arch
/dev/nvme0n1p2: clean, ....files, ....blocks

Broadcast message from yuanhao@yhArch (Sun 2022-06-05 00:20:58 BST):

The system is going for poweroff NOW!

[ 281.772822] sd-umoun[3293]: Failed to unmount /oldroot: Device or resource busy
[ 281.774041] sd-umoun[3294]: Failed to unmount /oldroot/sys: Device or resource busy
[ 281.775539] shutdown[1]: Failed to finalize file systems, ignoring.
[ 282.513007] reboot: Power down

This appears to be an issue with the way the NVIDIA modules are loaded rather than with systemd, so I am creating a ticket here.

Reddit user u/KeepsFindingWitches: "Are you using an Nvidia GPU? I've seen this happen before, it has something to do with how the Nvidia drivers mount a pseudo-filesystem under a directory in /sys and don't unload before this happens. It's not actually harming anything.

Basically, during shutdown, systemd switches over to a ramfs image, and then attempts to unmount your actual root partition, now at /oldroot. Because there's still another filesystem mounted under it, the system can't unmount it -- but it doesn't matter because it shuts down a second later anyway."
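
To see which mounts are still stacked under a directory such as /oldroot, one option is to read /proc/self/mountinfo. The following is a minimal sketch, not part of the original report: the function names and the SAMPLE text are hypothetical, and the sample only imitates the mountinfo layout rather than reproducing output from the affected machine.

```python
import re

# Hypothetical sample in the /proc/self/mountinfo layout (not real output).
SAMPLE = """\
22 1 259:2 / /oldroot rw,relatime - ext4 /dev/nvme0n1p2 rw
23 22 0:21 / /oldroot/sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
24 1 0:5 / /dev rw,nosuid - devtmpfs devtmpfs rw
"""

def parse_mountinfo(text):
    """Return (mount_point, fs_type) pairs from mountinfo-style text."""
    entries = []
    for line in text.splitlines():
        if " - " not in line:
            continue  # skip blank or malformed lines
        left, right = line.split(" - ", 1)
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 5 or not right_fields:
            continue
        # The kernel writes spaces and similar characters as octal escapes.
        mount_point = re.sub(
            r"\\([0-7]{3})",
            lambda m: chr(int(m.group(1), 8)),
            left_fields[4],
        )
        entries.append((mount_point, right_fields[0]))
    return entries

def mounts_under(text, prefix):
    """Mounts at or below prefix, deepest first (children before parents)."""
    prefix = prefix.rstrip("/") or "/"
    below = prefix.rstrip("/") + "/"
    hits = [(mp, fs) for mp, fs in parse_mountinfo(text)
            if mp == prefix or mp.startswith(below)]
    return sorted(hits, key=lambda entry: entry[0].count("/"), reverse=True)

if __name__ == "__main__":
    for mount_point, fs_type in mounts_under(SAMPLE, "/oldroot"):
        print(mount_point, fs_type)
```

On a live system the text would come from open("/proc/self/mountinfo").read() instead of SAMPLE. A parent such as /oldroot can only be unmounted after every entry listed before it is gone.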

Arch Linux forums user loqs: "The nvidia module has an open file under /sys, possibly /sys/bus/pci/devices/*/config. If the module was loaded before the root fs, then provided another instance of /sys was opened on top of the root fs, that might avoid the issue.

Cleaner solution would be for nvidia to provide something that runs at shutdown either removing the modules or notifying the module to close the open file. As an alternative systemd could modify its shutdown logic to move all pseudo filesystems from on top of real filesystems to on top of the pivoted ram disk. So oldroot is not blocked from being unmounted."
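
As an illustration of the "removing the modules" idea, the sketch below works out an order in which the nvidia modules could be removed, based on the "used by" column of /proc/modules. It is an assumption-laden example rather than anything NVIDIA or systemd ships: the function names and the SAMPLE text are hypothetical, and the sketch ignores reference counts held by running processes, which would also make a real rmmod fail.

```python
# Hypothetical sample in the /proc/modules layout (not real output).
SAMPLE = """\
nvidia_drm 73728 4 - Live 0x0000000000000000 (POE)
nvidia_uvm 2523136 0 - Live 0x0000000000000000 (POE)
nvidia_modeset 1150976 2 nvidia_drm, Live 0x0000000000000000 (POE)
nvidia 39145472 110 nvidia_uvm,nvidia_modeset, Live 0x0000000000000000 (POE)
drm_kms_helper 200704 1 nvidia_drm, Live 0x0000000000000000
"""

def parse_proc_modules(text):
    """Map each module name to the set of modules that use it."""
    users = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, used_by = fields[0], fields[3]
        if used_by == "-":
            users[name] = set()
        else:
            # The field is a comma-terminated list, so drop empty pieces.
            users[name] = {u for u in used_by.split(",") if u}
    return users

def unload_order(text, prefix="nvidia"):
    """Order in which modules starting with prefix could be removed."""
    users = parse_proc_modules(text)
    loaded = set(users)
    pending = {name for name in users if name.startswith(prefix)}
    order = []
    while pending:
        # A module is removable once none of its users is still loaded.
        ready = sorted(n for n in pending if not (users[n] & loaded))
        if not ready:
            raise RuntimeError("remaining modules are still in use")
        order.extend(ready)
        pending -= set(ready)
        loaded -= set(ready)
    return order

if __name__ == "__main__":
    print(unload_order(SAMPLE))
```

On a live system the text would come from open("/proc/modules").read(). The order only says what depends on what; whether unloading at shutdown is safe is a separate question that the sketch does not answer.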

medhefgo "In my case the binary blob nvidia driver keeps /oldroot/sys alive for some reason (I found this out by unloading the modules and then trying to unmount again). The second commit tries to work around such cases by utilizing lazy unmounting. An alternative would be to try unloading kernel modules, but that sounds too intrusive to me."
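
For readers unfamiliar with lazy unmounting: on Linux it corresponds to the umount2() system call with the MNT_DETACH flag, which detaches the mount point immediately and finishes the cleanup once the mount is no longer busy. The sketch below only illustrates that call from Python through ctypes. It is not the code from the systemd change quoted above, and the lazy_umount helper name is hypothetical. It requires root (CAP_SYS_ADMIN) to succeed.

```python
import ctypes
import ctypes.util
import os

MNT_DETACH = 2  # flag value from <sys/mount.h> on Linux

def lazy_umount(target):
    """Detach the mount at target now; the kernel cleans up when it is idle."""
    libc_name = ctypes.util.find_library("c") or "libc.so.6"
    libc = ctypes.CDLL(libc_name, use_errno=True)
    if libc.umount2(os.fsencode(target), MNT_DETACH) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)

if __name__ == "__main__":
    # Assumption: /oldroot/sys is the busy mount, as in the log above.
    try:
        lazy_umount("/oldroot/sys")
    except OSError as exc:
        print("lazy unmount failed:", exc)
```

The command-line equivalent is umount -l. A lazy unmount hides the "Device or resource busy" error but does not release whatever the driver is holding open.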

Other references:

  1. https://old.reddit.com/r/archlinux/comments/v5c3yw/failed_to_umount_oldroot/
  2. https://bugs.archlinux.org/task/63697#comment203882
  3. https://github.com/systemd/systemd/pull/23348

To Reproduce

  1. Install nvidia-dkms on Arch Linux.
  2. (Possibly optional) Have multiple NVMe drives installed. My machine does; I am not sure whether this is required.
  3. Shut down the system.

Bug Incidence

Always

nvidia-bug-report.log.gz

n/a

More Info

If this isn't the right place to create a ticket for this, then please create one internally or elsewhere. This message is me giving my permission.

ghtesting2020 Jun 06 '22 12:06