dkms
dkms copied to clipboard
dkms does not appear to be chroot safe
It appears that dkms
does not work correctly in a chroot environment. You can see an example of this by simply doing:
smw:/image_roots # chroot my_image_root/
smw:/ # dkms status
/usr/sbin/dkms: line 1980: /dev/fd/62: No such file or directory
/usr/sbin/dkms: line 1912: /dev/fd/62: No such file or directory
smw:/ #
The reason this is important is many HPC cluster management systems build images off-line in a chroot prior to deployment. And we have been encountering packages that use dkms in the rpm post install scriptlets and these packages fail to install in these contexts.
A lot of these systems use zypper
under the covers, like:
smw:/ # zypper --root /image_roots/my_image_root install rocm-dkms
It seems the issue is that some of the functions in dkms rely on on the shell mechanism described here:
https://unix.stackexchange.com/questions/156084/why-does-process-substitution-result-in-a-file-called-dev-fd-63-which-is-a-pipe#:~:text=So%20%2Fdev%2Ffd%2F63,output%20of%20your%20ls%20call.
If those functions could be refactored not to use that mechanism, it seems like dkms should work in the zypper --root
context, and if that worked, it should work for this whole class of cluster management systems.
Here is an example diff for a workaround we used:
flubber-smw:/var/opt/cray/imps/image_roots # diff gsetterhol_NVIDIA-699/usr/sbin/dkms gsetterhol_NVIDIA-699/usr/sbin/dkms.bak
23,24d22
< set -x
<
1915,1917c1913
< local mvka m v k a kern status tmp_file
< tmp_file=$(mktemp)
< module_status_weak "$@" > $tmp_file
---
> local mvka m v k a kern status
1921,1922c1917
< done < $tmp_file
< rm $tmp_file
---
> done < <(module_status_weak "$@")
1986,1988c1981
< local status mvka m v k a tmp_file
< tmp_file=$(mktemp)
< module_status "$@" > $tmp_file
---
> local status mvka m v k a
2001,2002c1994
< done < $tmp_file
< rm $tmp_file
---
> done < <(module_status "$@")
flubber-smw:/var/opt/cray/imps/image_roots #
This gets status working in a chroot w/o /dev/, /sys/, and so forth. A better solution might be to use local variables (e.g. no temp file), but hopefully this can help to get started.
@gts-cray can you make that a MR + ideally some tests?
Glancing at the latest code - we have a mktemp_or_die
helper, although more importantly it's being used in a number of places - safe_source()
to read the config file, around rpm_safe_upgrade
, checking status, making a tarball, extracting/importing from tarball, dummy check we can write to /tmp
.
So as-is the diff pasted is very incomplete. I'm more concerned that that w/o any tests, we'll end up breaking it all the time.
It looks like recently in commit 5111f9cfe56cd357d81d7471fd090459bf905f99 additional checks were put in that definitely don't solve this but give a warning instead. @evelikov, the most obvious automated test for this would need to start a container or a chroot environment, requiring some degree of privilege when running the tests (or at least a software stack supporting user namespaces).
An alternative if the bash process substitution ( <( ... )
) is the only problem would just be a test that validates that process substitution is not present anywhere in the codebase.
@dmjacobsen thanks for pointing that commit - I already had a look at it while it was proposed. Our test suite is based on Github Actions, which uses docker + containers under the hood.
Seems like you have some ideas already, so if you can put those in terms of PR that would be great.