Unable to use lockdown mode with NVIDIA module on bottlerocket

Open db376 opened this issue 1 year ago • 3 comments

When attempting to load the NVIDIA kernel module on a Bottlerocket AMI using kernel lockdown = integrity, errors like the following are produced:

[    5.879228] driverdog[2088]: 23:10:41 [ERROR] '/usr/bin/modprobe' failed - stderr: modprobe: ERROR: could not insert 'nvidia': Operation not permitted
[    5.881035] driverdog[2088]: modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
[    5.882187] driverdog[2088]: modprobe: ERROR: could not insert 'nvidia_modeset': Operation not permitted
[FAILED] Failed to start Load additional kernel modules.
See 'systemctl status load-kernel-modules.service' for details.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.

This was tested on the following versions in us-east-1: 1.30 (ami-0c2f741e432159b2c), 1.29 (ami-06033e6f46c64c7db), and 1.20 (ami-046b028e6b00a3938).

Self-signing also does not work as a workaround; instead, we get a validation rejection.
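
For anyone reproducing this, it may help to confirm which lockdown mode is active when the modprobe errors above appear. The kernel exposes the mode in /sys/kernel/security/lockdown, with the active mode wrapped in square brackets. The following is only a minimal sketch: the helper names are made up for illustration, and it assumes Python is available in whatever environment can read that file, which is not a given on a Bottlerocket host.

```python
import re
from pathlib import Path
from typing import Optional

# securityfs file exposed by the kernel lockdown LSM.
LOCKDOWN_PATH = Path("/sys/kernel/security/lockdown")


def parse_lockdown(text: str) -> Optional[str]:
    """Return the active mode, which the kernel marks with square brackets.

    Example file content: "none [integrity] confidentiality"
    """
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def current_lockdown(path: Path = LOCKDOWN_PATH) -> Optional[str]:
    """Read the active lockdown mode, or None if the file cannot be read."""
    try:
        return parse_lockdown(path.read_text())
    except OSError:
        # File missing (no lockdown LSM, securityfs not mounted) or unreadable.
        return None
```

Calling current_lockdown() on an affected node should return "integrity" if the setting has taken effect; a result of "none" or None would point at a different cause for the failed module load.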

db376 avatar Sep 26 '24 19:09 db376

There are two factors at work here:

  1. the NVIDIA kmods aren't linked until runtime (for software licensing reasons) and can't be signed by the same ephemeral key used to sign the kernel's own modules (for policy reasons)
  2. for no very good reason, there's no mechanism in the build system today to deal with signing kmods - at all, really, we just rely on the kernel to do its own signing with a throwaway key

I've been considering ways to address (2) recently since we have two newly added external kmods - the Neuron driver and the NVIDIA open source driver - that really ought to be signed and trusted.
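
As a rough way to see which kmods carry a signature at all, one can check for the marker string that kernel module signing appends to the end of a signed module file. This is a sketch under stated assumptions, not part of the Bottlerocket build system: the function names are hypothetical, the module must be uncompressed, and the check only shows that a signature is present, not that the kernel trusts the key that produced it.

```python
from pathlib import Path

# Marker the kernel's module signing appends after the signature data.
MODULE_SIG_MAGIC = b"~Module signature appended~\n"


def has_appended_signature(data: bytes) -> bool:
    """Return True if the raw module bytes end with the signature marker."""
    return data.endswith(MODULE_SIG_MAGIC)


def module_is_signed(path: Path) -> bool:
    """Check an uncompressed .ko file on disk for an appended signature."""
    return has_appended_signature(path.read_bytes())
```

A module that passes this check can still be refused under lockdown if it was signed with a key that is not in the kernel's trusted keyring, which would be consistent with the self-signing attempt described earlier in the thread.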

bcressey avatar Oct 01 '24 15:10 bcressey

Hey @bcressey is there a solution in sight?

0xMAYANK avatar Sep 08 '25 09:09 0xMAYANK

I still want better support for signing kmods, but as a workaround we can try tightening up the dependencies for the units that call driverdog --link-modules to force them to run before we apply settings, which is when lockdown enforcement begins.

bcressey avatar Sep 09 '25 21:09 bcressey