
Implement compatibility with DKMS (Nvidia, etc.)

Open dustymabe opened this issue 6 years ago • 39 comments

rpm-ostree version info:

  centos-atomic-host:centos-atomic-host/7/x86_64/standard
                Version: 7.1708 (2017-09-15 15:32:30)
                 Commit: 33b4f0442242a06096ffeffadcd9655905a41fbd11f36cd6f33ee0d974fdb2a8
           GPGSignature: 1 signature
                         Signature made Fri 15 Sep 2017 05:17:39 PM UTC using RSA key ID F17E745691BA8335
                         Good signature from "CentOS Atomic SIG <[email protected]>"

When installing nvidia kmod it fails:

# rpm-ostree install epel-release && reboot
#
# cat <<EOF > /etc/yum.repos.d/nvidia.repo
[nvidia]
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/
enabled=1
gpgcheck=0
EOF
#
# rpm-ostree install nvidia-kmod
....
....
Resolving dependencies... done
Overlaying... done

Creating symlink /var/lib/dkms/nvidia/384.81/source ->
                 /usr/src/nvidia-384.81

DKMS: add completed.

Creating symlink /var/lib/dkms/nvidia/384.81/source ->
                 /usr/src/nvidia-384.81

DKMS: add completed.

Creating symlink /var/lib/dkms/nvidia/384.81/source ->
                 /usr/src/nvidia-384.81

DKMS: add completed.
error: Running %post for nvidia-kmod: Executing bwrap(/usr/nvidia-kmod.post): Child process exited with code 8

From the journal:

Nov 07 22:04:30 vanilla-c7atomic rpm-ostree[13001]: /usr/share/info/dir: Read-only file system
Nov 07 22:04:30 vanilla-c7atomic rpm-ostree[13001]: /usr/share/info/dir: Read-only file system
Nov 07 22:04:30 vanilla-c7atomic rpm-ostree[13001]: mkdir: cannot create directory ‘/var/lib/dkms’: Read-only file system
Nov 07 22:04:30 vanilla-c7atomic rpm-ostree[13001]: ln: failed to create symbolic link ‘/var/lib/dkms/nvidia/384.81/source’: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: mkdir: cannot create directory ‘/var/lib/dkms’: Read-only file system
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: ln: failed to create symbolic link ‘/var/lib/dkms/nvidia/384.81/source’: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: ls: cannot access /var/lib/dkms/nvidia/384.81/source: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: Error! The directory /var/lib/dkms/nvidia/384.81/source/
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: does not appear to have module source located within it.  Build halted.
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: mkdir: cannot create directory ‘/var/lib/dkms’: Read-only file system
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: ln: failed to create symbolic link ‘/var/lib/dkms/nvidia/384.81/source’: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: ls: cannot access /var/lib/dkms/nvidia/384.81/source: No such file or directory
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: Error! The directory /var/lib/dkms/nvidia/384.81/source/
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: does not appear to have module source located within it.  Build halted.
Nov 07 22:04:31 vanilla-c7atomic rpm-ostree[13001]: Txn /org/projectatomic/rpmostree1/centos_atomic_host failed: Running %post for nvidia-kmod: Executing bwrap(/usr/nvidia-kmod.post): Child process exited with code 8

The scriptlets are:

-bash-4.2# rpm -qp nvidia-kmod-384.81-2.el7.x86_64.rpm --scripts
warning: nvidia-kmod-384.81-2.el7.x86_64.rpm: Header V3 RSA/SHA512 Signature, key ID 7fa2af80: NOKEY
postinstall scriptlet (using /bin/sh):
dkms add --rpm_safe_upgrade -m nvidia -v 384.81
dkms build -m nvidia -v 384.81
dkms install --force -m nvidia -v 384.81
preuninstall scriptlet (using /bin/sh):
dkms remove --rpm_safe_upgrade -m nvidia -v 384.81 --all || :
postuninstall scriptlet (using /bin/sh):
if [ "$1" -eq "0" ] ; then
    dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
fi

dustymabe avatar Nov 07 '17 22:11 dustymabe

Yeah, there's a huge world of stuff here. Supporting dkms is going to require a lot of work.

cgwalters avatar Nov 07 '17 22:11 cgwalters

@cgwalters Would this also cover supporting akmods as a mechanism too? Or would that be easier than DKMS?

Conan-Kudo avatar Dec 04 '17 20:12 Conan-Kudo

Would this also cover supporting akmods as a mechanism too? Or would that be easier than DKMS?

I have no idea honestly without diving in a lot. I suspect they're going to be mostly equivalent but it's really just a wild guess.

cgwalters avatar Dec 04 '17 21:12 cgwalters

@cgwalters The Akmods mechanism makes kmod RPMs for the kernel packages installed and installs them, rather than building kmods for the running kernel and just slotting them in.

This was recently integrated into Fedora proper.

Conan-Kudo avatar Dec 29 '17 02:12 Conan-Kudo

https://lists.projectatomic.io/projectatomic-archives/atomic-devel/2017-November/msg00009.html

cgwalters avatar Feb 08 '18 02:02 cgwalters

https://pagure.io/atomic-wg/issue/493

cgwalters avatar May 16 '18 17:05 cgwalters

http://www.projectatomic.io/blog/2018/06/building-kernel-modules-with-podman/

jlebon avatar Jun 06 '18 13:06 jlebon

I said this elsewhere but to repeat here; I think we could pretty easily implement a generic hook in rpm-ostree upgrade that calls out to a user-specified external process, which would accept as input the new ostree, and could perform arbitrary modification (overlayfs), which would then be included in the final commit.

Where things bifurcate a lot here is - do you install the equivalent of dnf builddep kernel on the host? Let's call this option #1. If you do...that's a whole lot of layered stuff and is (IMO) against the larger goals. Or, option #2 - does the hook do what the atomic-wireguard stuff does and basically install the kernel+builddeps in a container? There's some nontrivial infrastructure to hook up rpm-ostree + build container in such a way - do we ship it as a package/container?

Option #3 is to have rpm-ostree itself create a container, reusing the same packages. This would be totally possible, it's a system-level variant of the rpm-ostree ex container support that we have today. But it'd be some nontrivial work in rpm-ostree, and increase our overlap with general-purpose container tools.

What would mix both #2 and #3 is a podman/containers-storage "backend" that knows how to call out to rpm-ostree to do the heavy lifting for the filesystem builds.

cgwalters avatar Aug 24 '18 15:08 cgwalters

Perhaps it's not quite the right issue to discuss this in, but I'd just like to raise the concern that ideally whatever solution is proposed for this issue should not make it too difficult for an end-user to sign the resulting akmods-built modules for Secure Boot using their own keypair.

As this issue identifies, losing easy access to ZFS, VirtualBox and the NVidia Drivers is a major concern for new users to Atomic/Silverblue, but I think it's also a concern if a user is only able to access the above at the cost of disabling Secure Boot.

alexhaydock avatar Aug 24 '18 22:08 alexhaydock

@alexhaydock Not just new users! I've been a Linux workstation / laptop user since Red Hat 6.2 and if a distro won't support my AMD GCN 1.1 card or WiFi hotspot or HP Omen laptop with an NVidia 1050Ti, I'll run a different distro. Secure Boot is over-rated; if I have to disable it to use my machine, that's exactly what I'll do.

znmeb avatar Sep 07 '18 23:09 znmeb

So, I took a short initial look at this from the perspective of supporting nvidia in Silverblue. There are two major suppliers of the nvidia driver in rpm form, rpmfusion and negativo17. rpmfusion seems to only support akmod, whereas negativo17 does both akmod and dkms.

I didn't know any details about dkms or akmod before this, other than the fact that they auto-build drivers, so I took a quick look at them:

akmod

You install akmod-nvidia, it depends on akmods (in fedora) and contains just:

/usr/src/akmods/nvidia-kmod-396.54-1.fc28.src.rpm
/usr/src/akmods/nvidia-kmod.latest

Then akmods itself has a few hooks (boot service, rpm transaction hook, optionally shutdown service) that get called so that the srpm can be rebuilt whenever there is a new kernel, generating a kernel-specific version of the bundled src.rpm. For example, the above srpm + kernel 4.18.10-200 generates the built rpm kmod-nvidia-4.18.10-200.fc28.x86_64-396.54-1.fc28.x86_64. This is cached in /var/cache/akmods/nvidia/ and installed. That rpm then contains the driver module:

/usr/lib/modules/4.18.10-200.fc28.x86_64/extra/nvidia/nvidia.ko.xz

This seems very nice, simple and rpm-focused, and the akmods program is a 500 line shellscript.
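The caching behaviour just described can be illustrated self-containedly. This is only a sketch of the idea (paths recreated under a temp dir, the rpm stubbed out with `touch`), not the real 500-line akmods script:

```shell
# Sketch: akmods caches built rpms per module and only rebuilds the bundled
# srpm when no cached build exists for the target kernel.
cache=$(mktemp -d)/akmods/nvidia
mkdir -p "$cache"
kver=4.18.10-200.fc28.x86_64
# Stand-in for a previously built kmod rpm in the cache:
touch "$cache/kmod-nvidia-$kver-396.54-1.fc28.x86_64.rpm"
if ls "$cache"/kmod-nvidia-"$kver"-*.rpm >/dev/null 2>&1; then
  msg="cached build found for $kver, skipping rebuild"
else
  msg="rebuilding /usr/src/akmods/nvidia-kmod.latest for $kver"
fi
echo "$msg"
```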

dkms

dkms is a more generic framework and works on multiple distros. As such it has its own database of stuff in /var/lib/dkms, matched with sources in /usr/src which is updated with the "dkms" helper. The dkms-nvidia package contains the sources for the module extracted in /usr/src/nvidia-396.54, as well as a dkms.conf file telling dkms how to build the sources.
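For reference, the dkms.conf that tells dkms how to build is a small shell-syntax file along these lines (an illustrative sketch with assumed values, not the actual file shipped by the nvidia package):

```shell
# Illustrative dkms.conf for /usr/src/nvidia-396.54 (values assumed)
PACKAGE_NAME="nvidia"
PACKAGE_VERSION="396.54"
MAKE[0]="make KERNEL_UNAME=${kernelver} modules"
BUILT_MODULE_NAME[0]="nvidia"
DEST_MODULE_LOCATION[0]="/kernel/drivers/video"
AUTOINSTALL="yes"
```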

The %post of the dkms-nvidia rpm then does:

dkms add -m %{dkms_name} -v %{version} -q || :
dkms build -m %{dkms_name} -v %{version} -q || :
dkms install -m %{dkms_name} -v %{version} -q --force || :

The first one sets up a symlink from /var/lib/dkms/nvidia/396.54/source to /usr/src/nvidia-396.54, the second builds the module for the current kernel and puts it in /var/lib/dkms/nvidia/396.54/4.18.10-200.fc28.x86_64 and the last one then copies the result from that into /lib/modules/4.18.10-200.fc28.x86_64, where it sits unowned in the rpm db.
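The layout those three steps assume can be sketched under a temp root (paths from the description above). It also makes clear why the journal errors at the top of this issue are fatal: the very first thing the `add` step needs is a writable /var/lib/dkms, which a read-only deployment root cannot provide.

```shell
# Recreate the dkms 'add' step's directory layout under a temp root
# (illustration only; real dkms operates on / directly).
root=$(mktemp -d)
mkdir -p "$root/usr/src/nvidia-396.54"        # sources shipped by the rpm
mkdir -p "$root/var/lib/dkms/nvidia/396.54"   # this mkdir is what fails on a read-only /var
ln -s "$root/usr/src/nvidia-396.54" "$root/var/lib/dkms/nvidia/396.54/source"
readlink "$root/var/lib/dkms/nvidia/396.54/source"
```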

Additionally, dkms has hooks similar to akmods' (boot service, rpm transaction hook) that run the build and install parts for the new kernel.

what works for rpm-ostree

dkms is not really a great fit for rpm-ostree, with its reliance on state in /var and its non-rpm-tracked module files. akmods seems like a pretty clean fit to me and suits the overall rpm-based scheme of rpm-ostree, but building on the live system or in the rpm transaction hook clearly doesn't work.

However, the way akmods works is that you create a kmod-nvidia srpm with full sources, which when built normally just generates an akmod-nvidia rpm (containing a copy of the srpm, which is later rebuilt targeting a specific kernel). This means that the yum repo for the akmod has a .src.rpm for the driver which is easy for rpm-ostree to get at via dnf.

So, the way I propose this would work is that you can layer srpms as well as rpms:

rpm-ostree install-from-source kmod-nvidia

This would mean the same as install kmod-nvidia, except it would dnf download --source the srpm, build that in a container, and then layer the resultant rpm as it would usually layer it.

There are some special things we need to do when building the srpm. For instance, we need to set the kernels rpm macro to the kernel version in the ostree image so the akmod srpm builds for a targeted kernel, and we need to ensure that the kernel and kernel-devel in the build container match the kernel in the ostree image. Still, this strikes me as pretty simple stuff.
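As a purely hypothetical sketch of what such an install-from-source step might run inside the build container: no such rpm-ostree verb exists today, and the macro name and srpm filename below are assumptions for illustration, so the command is only constructed and printed, not executed.

```shell
# Hypothetical: build the akmod srpm targeting the ostree image's kernel,
# not the builder's running kernel (names assumed for illustration).
kver=4.18.10-200.fc28.x86_64            # kernel baked into the ostree image
srpm=kmod-nvidia-396.54-1.fc28.src.rpm  # fetched via 'dnf download --source'
cmd="rpmbuild --rebuild --define \"kernels $kver\" $srpm"
echo "$cmd"                             # constructed, not executed
```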

alexlarsson avatar Nov 13 '18 09:11 alexlarsson

I guess the question is, do we take a dependency on podman & co for the build container, or do we use rpm-ostree itself to construct the image for building the srpm, deploy it to a temporary location and spawn it via bwrap? @cgwalters ?

alexlarsson avatar Nov 13 '18 09:11 alexlarsson

So, the way I propose this would work is that you can layer srpms as well as rpms:

Thanks for the analysis; you went a bit deeper into the details of both akmod/dkms than I had before. But some of this was already noted in https://github.com/projectatomic/rpm-ostree/issues/1091#issuecomment-415792755 right?

I like the idea, a whole lot of implications. Actually in general...I would really love to also support a more "build from source" model on top of rpm-ostree (e.g. "apply this patch to systemd", or "build systemd from this git commit"). There's a lot of prior art here; obviously libostree was split out of gnome-continuous which has such a model. Such a system could be built on top of something that built srpms, although I lean a bit towards skipping the srpm path and orienting more towards at least dist-git, as well as direct from upstream git.

But even this opens up the question of whether we would really want the build tools on the host or in a container.

or do we use rpm-ostree itself to construct the image for building the srpm,

This would probably block on https://github.com/projectatomic/rpm-ostree/issues/1180

The "build using container" was already prototyped out here https://github.com/projectatomic/rpm-ostree/issues/1091#issuecomment-395068927

Big picture...I lean a bit towards the container path. But I am not likely to hack on this myself in the near future (even though my laptop has an nvidia card, I don't play games and nouveau is OK for me).

cgwalters avatar Nov 13 '18 14:11 cgwalters

So, the way I propose this would work is that you can layer srpms as well as rpms:

Thanks for the analysis; you went a bit deeper into the details of both akmod/dkms than I had before. But some of this was already noted in #1091 (comment) right?

Yeah, I just have a primary interest in the specific nvidia case, so i wanted to dump my research here.

I like the idea, a whole lot of implications. Actually in general...I would really love to also support a more "build from source" model on top of rpm-ostree (e.g. "apply this patch to systemd", or "build systemd from this git commit"). There's a lot of prior art here; obviously libostree was split out of gnome-continuous which has such a model. Such a system could be built on top of something that built srpms, although I lean a bit towards skipping the srpm path and orienting more towards at least dist-git, as well as direct from upstream git.

I agree that this would be nice. However, there are two complications to this.

First of all, rpm-ostree needs a way to specify how to build the modifications and to store this in the ostree metadata next to where the package layer is stored. Punting this to srpms means all we need to store is the srpm name. Of course, one could punt specifying this to some other high-level method, like "run the container image named foo"; then all you need to store in the metadata is the image name.

Secondly, there needs to be a way to extract the modifications of the new build into the final ostree image. With rpm the build and the install are automatically separated, whereas in a container situation they might not be. For example, you will be building in a container that has a /usr with compilers, etc, but then you want to install into a different /usr.

I can imagine solving this. For example, you could have the newly composed ostree image checked out somewhere, use rofiles-fuse to get a safe version of it, mount that as a volume in the build container, and then set DESTDIR to the volume when installing the build. Should work, but it is a bunch of extra work that rpmbuild gives you for free.

But even this though opens up a question as whether we would really want the build tools on the host or in a container.

or do we use rpm-ostree itself to construct the image for building the srpm,

This would probably block on #1180

The "build using container" was already prototyped out here #1091 (comment)

Big picture...I lean a bit towards the container path. But I am not likely to hack on this myself in the near future (even though my laptop has an nvidia card, I don't play games and nouveau is OK for me).

One complexity of using the container path here is that you have to somehow ensure that the build container matches the ABI of the final ostree image. For example, if we're building kernel modules we need to have the right kernel-devel header. However, if you're building arbitrary userspace code you need to match the full userspace ABI. I.e. if you build against a library it needs to be the same version of the library and built in the same way, you need same c++ compiler ABIs, etc. If we automatically compose the build environment from the same packages image as the ostree image this is a lot easier to guarantee.

alexlarsson avatar Nov 13 '18 15:11 alexlarsson

I lean a bit towards skipping the srpm path and orienting more towards at least dist-git, as well as direct from upstream git.

This is problematic for things like akmods, which rely on being able to build from a source package. In addition, you can't guarantee git. You can, however, guarantee a srpm.

One complexity of using the container path here is that you have to somehow ensure that the build container matches the ABI of the final ostree image. For example, if we're building kernel modules we need to have the right kernel-devel header. However, if you're building arbitrary userspace code you need to match the full userspace ABI. I.e. if you build against a library it needs to be the same version of the library and built in the same way, you need same c++ compiler ABIs, etc. If we automatically compose the build environment from the same packages image as the ostree image this is a lot easier to guarantee.

Could we do something similar to the btrfs seed+sprout thing to support a transparent layer that is invoked as a container to do these things? The other, more practical issue is that we can't guarantee that the matching kernel packages are going to be present in the repo at the time this happens. So what do we do then?

Conan-Kudo avatar Nov 14 '18 03:11 Conan-Kudo

@cgwalters "Big picture...I lean a bit towards the container path. But I am not likely to hack on this myself in the near future (even though my laptop has an nvidia card, I don't play games and nouveau is OK for me)."

I have an HP Omen with Intel graphics and an NVidia 1050Ti. nouveau black-screens on every current Linux distro I've tried; when I do an install I have to blacklist nouveau and bring the machine up with just the Intel graphics. Then I add the NVidia drivers after the install is finished.

How the drivers get built doesn't matter to me - if it takes a container that only occupies resources during an install and has to do a moderate-sized compile, that's no big deal. Not having the NVidia drivers is a show-stopper. So I like the source RPM install idea a lot. ;-)

znmeb avatar Nov 14 '18 03:11 znmeb

An alternative solution would be to provide kmod packages. It doesn't solve the main issue, merely shifting the responsibility to package maintainers, but it is the only thing that doesn't involve drastic changes or heavy development and basically just works™. I have switched from DKMS to kmod packages for ZFS on Linux.

mskarbek avatar Nov 14 '18 15:11 mskarbek

@mskarbek kmod packages are not special in any way and probably work already. However, the problem with them is that they need to be updated in lock-step with new kernel updates, and they stop working the second you run a non-standard kernel. In practice this means people need something like dkms to be guaranteed an up-to-date nvidia driver.

alexlarsson avatar Nov 16 '18 11:11 alexlarsson

This should be working in F30 Silverblue

matthiasclasen avatar Mar 05 '19 19:03 matthiasclasen

@matthiasclasen Great news! How do I get F30 Silverblue to test? I have a pretty short window of availability this coming week but can squeeze this in.

znmeb avatar Mar 05 '19 19:03 znmeb

I’m not sure it’s working automatically yet; the nvidia-kmod rpm needs to be rebuilt with the new kmodtools rpm. I’ll check it out tomorrow and write a blog post about it.


alexlarsson avatar Mar 05 '19 19:03 alexlarsson

Would be awesome! This is what forced me back to the traditional RPM setup after trying out Silverblue.

jamescassell avatar Mar 06 '19 01:03 jamescassell

I wrote up how to test this:

https://blogs.gnome.org/alexl/2019/03/06/nvidia-drivers-in-fedora-silverblue/


alexlarsson avatar Mar 06 '19 13:03 alexlarsson

This should be working in F30 Silverblue

Hi @matthiasclasen, could you kindly explain why?

whs-dot-hk avatar Mar 29 '19 06:03 whs-dot-hk

Not sure I understand the question. It should be working because the necessary changes were merged.

matthiasclasen avatar Mar 29 '19 16:03 matthiasclasen

FWIW, DKMS support is required for VirtualBox and ZFS. In addition, a ZFS root filesystem install requires the ability to run grub2-mkconfig to generate a new grub.conf, and to run dracut to generate a new initramfs. Running grub2-install to install grub is also needed on systems with legacy (non-EFI) BIOS.

rm-happens avatar Apr 01 '19 08:04 rm-happens

This should be working in F30 Silverblue

I'm on F32 Silverblue and I seem to be unable to modprobe a dkms module. By "working" do you mean that in F30 Silverblue there's a kmod-nvidia package? But what about DKMS? Is it "working" too or is there anything else to be done?

I installed it from https://copr.fedorainfracloud.org/coprs/sentry/v4l2loopback/, then following its instructions, I rebooted and executed:

➤ sudo modprobe v4l2loopback
modprobe: FATAL: Module v4l2loopback not found in directory /lib/modules/5.6.16-300.fc32.x86_64

If I'm not wrong, I should be able to modprobe it. :shrug:

yajo avatar Jun 09 '20 18:06 yajo

This should be working in F30 Silverblue

I'm on F32 Silverblue and I seem to be unable to modprobe a dkms module. By "working" do you mean that in F30 Silverblue there's a kmod-nvidia package? But what about DKMS? Is it "working" too or is there anything else to be done?

I installed it from https://copr.fedorainfracloud.org/coprs/sentry/v4l2loopback/, then following its instructions, I rebooted and executed:

➤ sudo modprobe v4l2loopback
modprobe: FATAL: Module v4l2loopback not found in directory /lib/modules/5.6.16-300.fc32.x86_64

If I'm not wrong, I should be able to modprobe it. shrug

Unfortunately you can't do this. rpm-ostree based distros, e.g. Silverblue, do not support dkms kernel modules. Any third-party module should be compiled as a kmod against the running kernel and packaged into an rpm so you can install it with rpm-ostree. rpmfusion.org provides some kmod modules, but v4l2loopback is not among them. I'm actually considering making the package and submitting it to rpmfusion if I find the time to do so. More info: https://rpmfusion.org/Packaging/KernelModules/Kmods2

In the meantime you can install kernel-devel and the "Development Tools" group in a toolbox container and compile the module there. Then you can load the .ko file into the host kernel with insmod. The downside is that you need to do this every time you get a kernel update.

There is also the possibility of v4l2loopback to be included in the mainline kernel in future: https://github.com/umlaeute/v4l2loopback/issues/268

efouladi avatar Aug 30 '20 20:08 efouladi

Thanks for the explanation. If you manage to package that module, it'd be awesome; in COVID-19 times, being able to use a phone or camera as a webcam is needed more than ever.

yajo avatar Aug 31 '20 19:08 yajo

In the meantime you can install kernel-devel and and "Development tools" group in a toolbox container and compile the module there. Then you can load the .ko file in the host kernel with insmod. The downside is that you need to do this every time you get a kernel update.

@fouladi Thanks for sharing, and I'd also love to see v4l2loopback if not in the mainline kernel then on rpmfusion!

However, following your pointers, I can't seem to load the toolbox-compiled .ko file into the host kernel using insmod. I just get the response "insmod: ERROR: could not insert module v4l2loopback.ko: Operation not permitted".

Any other pointers?

jadjei avatar Sep 12 '20 23:09 jadjei