MOK enrollment flow is disrupted on Secure Boot systems when a PXE boot entry is first in the boot order
Overview
Relevant context: https://bugs.launchpad.net/maas/+bug/2111335
I have identified a bug caused by the interaction of several parts of the boot process. In theory, it could be addressed in any of several places.
This bug report references MAAS, which is Canonical's machine provisioning tool. The main thing you need to know about MAAS is that test devices connected to it are configured with PXE boot as the first EFI boot entry, so the standard boot flow for such devices looks like this:
- Machine boots from PXE device
- MAAS serves an .efi image: either one containing the ephemeral image data (during initial provisioning), or a 'shim' EFI entry that instructs the test device to boot from the persistent disk (if the machine has already been provisioned)
After I deployed a machine via MAAS, enabled Secure Boot, and installed a DKMS module, I was presented with the usual warnings about the MOK enrollment I'd need to complete at the next boot. However, this machine's first boot entry is the PXE device it uses for MAAS, which serves a 'shim' entry that the SB-enabled machine doesn't accept. As a result, the only way I can reach the post-reboot MOK enrollment screen is to explicitly enter the UEFI setup and select the persistent boot drive. Until MOK enrollment is completed, every attempt to load the PXE-provided EFI entry first fails with a vague "Something has gone seriously wrong" error. (This is resolved once you explicitly select the correct boot device on the next boot and finish the enrollment.)
Relevance to DKMS
One way this could be handled is for DKMS to have efibootmgr make the persistent drive the first boot entry for the next boot. Once that boot succeeds (which I believe could only happen after MOK enrollment, once the machine is in that state), some trigger could then restore the boot order to its original state.
Usually I wouldn't be a fan of a solution that requires introducing additional state for something like this, but I would argue that we're already in a very 'stateful' position as soon as we need to boot into the persistent installation first to complete MOK enrollment.
While I have only personally reproduced this while using MAAS, it also seems like an issue that could occur on any PXE boot setup with this flow, so a generic solution that happens at module install time seems preferable.
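A minimal sketch of the "disk first, then restore" idea, assuming the persistent-disk entry ID is already known and that the saved order would be persisted somewhere that survives the reboot (neither part is shown). The function names are hypothetical. The code only builds efibootmgr argument lists (`--bootorder` is a real efibootmgr option) and executes nothing.

```python
import re
from typing import List, Tuple


def parse_boot_order(efibootmgr_output: str) -> List[str]:
    """Extract the BootOrder line from plain `efibootmgr` output."""
    for line in efibootmgr_output.splitlines():
        match = re.match(r"BootOrder:\s*([0-9A-Fa-f,]+)", line.strip())
        if match:
            return match.group(1).split(",")
    raise ValueError("no BootOrder line found")


def plan_boot_order_swap(
    current_order: List[str], disk_entry: str
) -> Tuple[List[str], List[str]]:
    """Return (swap_cmd, restore_cmd) as argument lists; nothing is executed.

    disk_entry is the 4-digit ID of the persistent-disk entry; how it is
    identified is left open in this sketch (assumption).
    """
    if disk_entry not in current_order:
        raise ValueError(f"Boot{disk_entry} is not in BootOrder")
    # Put the disk entry first and keep the relative order of the others.
    new_order = [disk_entry] + [e for e in current_order if e != disk_entry]
    swap_cmd = ["efibootmgr", "--bootorder", ",".join(new_order)]
    # A post-boot trigger (hypothetical) would run this after a successful boot.
    restore_cmd = ["efibootmgr", "--bootorder", ",".join(current_order)]
    return swap_cmd, restore_cmd


if __name__ == "__main__":
    sample = "BootCurrent: 0003\nTimeout: 1 seconds\nBootOrder: 0000,0003,0001\n"
    order = parse_boot_order(sample)
    swap, restore = plan_boot_order_swap(order, "0003")
    print(" ".join(swap))     # efibootmgr --bootorder 0003,0000,0001
    print(" ".join(restore))  # efibootmgr --bootorder 0000,0003,0001
```

The restore step is exactly the extra state this approach has to carry across the reboot.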
Steps to reproduce on a MAAS-deployed machine:
- Deploy a DUT with Secure Boot disabled
- After deployment, enable SB
- Reboot and install a DKMS driver, such as nvidia-driver-550-server
- Reboot
Expected behavior (what should have happened?):
The final MOK enrollment step is presented after reboot, and subsequent reboots/deployments work as expected once the key is enrolled.
Actual behavior (what actually happened?):
TFTP boot fails, and I'm presented with:
>>Checking Media Presence......
>>Media Present......
Downloading NBP file...
NBP file downloaded successfully.
Fetching Netboot Image revocations.efi
Unable to fetch TFTP image: TFTP Error
Fetching Netboot Image mmx64.efi
Unable to fetch TFTP image: TFTP Error
Failed to start MokManager: TFTP Error
Something has gone seriously wrong: import_mok_state() failed: TFTP Error
All subsequent deployment attempts fail, and the machine remains in a bad state until I explicitly boot from the UEFI entry for the persistent device.
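For anyone scripting detection of this state from a captured serial console, a small sketch that scans the output quoted above, assuming the console text is available as a string. The function name and result keys are hypothetical; the matched strings are copied from the log in this report. It records which netboot images failed to fetch and whether shim aborted in `import_mok_state()`.

```python
import re
from typing import Dict, List, Optional

FETCH_RE = re.compile(r"Fetching Netboot Image (\S+)")
FATAL_RE = re.compile(r"Something has gone seriously wrong: (\w+)\(\) failed: (.+)")


def scan_console_log(log: str) -> Dict[str, object]:
    """Summarise netboot fetch failures and the fatal shim error, if any."""
    failed: List[str] = []
    fatal_call: Optional[str] = None
    fatal_error: Optional[str] = None
    pending: Optional[str] = None  # image whose fetch result should be on the next line
    for raw in log.splitlines():
        line = raw.strip()
        fetch = FETCH_RE.search(line)
        if fetch:
            pending = fetch.group(1)
            continue
        if pending is not None and line.startswith("Unable to fetch TFTP image"):
            failed.append(pending)
        pending = None
        fatal = FATAL_RE.search(line)
        if fatal:
            fatal_call, fatal_error = fatal.group(1), fatal.group(2)
    return {
        "failed_fetches": failed,
        "fatal_call": fatal_call,
        "fatal_error": fatal_error,
        "mok_state_failed": fatal_call == "import_mok_state",
    }


if __name__ == "__main__":
    log = """Fetching Netboot Image revocations.efi
Unable to fetch TFTP image: TFTP Error
Fetching Netboot Image mmx64.efi
Unable to fetch TFTP image: TFTP Error
Failed to start MokManager: TFTP Error
Something has gone seriously wrong: import_mok_state() failed: TFTP Error"""
    # {'failed_fetches': ['revocations.efi', 'mmx64.efi'], 'fatal_call': 'import_mok_state',
    #  'fatal_error': 'TFTP Error', 'mok_state_failed': True}
    print(scan_console_log(log))
```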
Additional context
This might be better addressed somewhere in systemd if managing post-reboot state isn't considered appropriate for DKMS. (I'm not sure whether DKMS already does this anywhere, or whether the DKMS project would consider it an acceptable solution.) Let me know your thoughts on where this fix should live if not in DKMS.
Hello, I don't think DKMS should manage EFI boot variables and similar things, especially since it has to support multiple distributions at the same time. At the moment, anything regarding EFI variables, boot order, etc. is completely outside the scope of DKMS, and I don't think it should be managed here at all.
I manage (and have managed) similar setups where systems always boot from the network, but once a system is successfully installed, the provisioning side makes sure the target system no longer receives an executable to boot. For example:
- Prepare everything for the system (unattended information for the installation, boot loader, DHCP configuration, etc.)
- Boot and install the system through PXE
- Drop the DHCP configuration, unattended files, etc. after the successful installation of the system
- The system boots from the network and either gets a chainload entry pointing to the disk (in the case of GRUB), or receives no executable at all and then boots from disk.
In particular, where I work now, the default GRUB menu starts with the following, before the other entries:
set default=0
set timeout=60
set menu_color_highlight="light-green/black"
menuentry "Boot from disk" {
    set root=(hd0)
    chainloader +1
    boot
}
This is part of the default GRUB menu for all network-booting machines after they are staged: once a system is staged, its system-specific GRUB menu is deleted and a default GRUB menu that works across all systems is served instead.
This way, executables in the EFI partition, such as MokManager, also work as expected.
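The per-host versus shared menu lifecycle described above could be modelled roughly as follows. This is a toy sketch: the class and method names are hypothetical, keying menus by MAC address is an assumption, and the default menu is a placeholder string standing in for the "Boot from disk" menu quoted above.

```python
from typing import Dict

# Placeholder standing in for the shared "Boot from disk" GRUB menu quoted above.
DEFAULT_MENU = "<shared 'Boot from disk' menu>"


class NetbootMenuStore:
    """Toy model of per-host vs. shared netboot menus (hypothetical names)."""

    def __init__(self) -> None:
        # Per-host menus, keyed by MAC address (an assumption for this sketch).
        self._per_host: Dict[str, str] = {}

    def prepare(self, mac: str, install_menu: str) -> None:
        """Stage the host-specific installer menu before PXE installation."""
        self._per_host[mac] = install_menu

    def mark_installed(self, mac: str) -> None:
        """Drop the host-specific menu once installation has succeeded."""
        self._per_host.pop(mac, None)

    def menu_for(self, mac: str) -> str:
        """Serve the host-specific menu if one exists, else the shared default."""
        return self._per_host.get(mac, DEFAULT_MENU)


if __name__ == "__main__":
    store = NetbootMenuStore()
    store.prepare("52:54:00:12:34:56", "<installer menu>")
    print(store.menu_for("52:54:00:12:34:56"))  # <installer menu>
    store.mark_installed("52:54:00:12:34:56")
    print(store.menu_for("52:54:00:12:34:56"))  # <shared 'Boot from disk' menu>
```

Because an installed host only ever receives the shared menu, which hands control to the local disk, nothing on the host itself has to change when a boot-time tool on the disk needs to run.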
It's also true that when fwupd is called to apply firmware updates (which can also be done during an unattended installation), the system sets the --bootnext entry with efibootmgr and then performs just one boot with the appropriate EFI executable to flash the firmware. So, theoretically, you could do the same with DKMS, setting MokManager as the next boot item.
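A minimal sketch of the BootNext variant, assuming the installed system's shim entry carries the label "ubuntu" (an assumption; labels vary by distribution). As far as I understand, MokManager usually has no boot entry of its own; shim launches it from the EFI partition when an enrollment request is pending, so pointing BootNext at the disk's shim entry should have the same effect. Firmware clears BootNext after using it once, so no restore step is needed. The helper names are hypothetical; `--bootnext` is a real efibootmgr option, and nothing is executed here.

```python
import re
from typing import Dict, List

ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_boot_entries(efibootmgr_output: str) -> Dict[str, str]:
    """Map boot entry ID -> label from `efibootmgr` output."""
    entries: Dict[str, str] = {}
    for line in efibootmgr_output.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            # With -v (or newer efibootmgr), the device path follows a tab.
            entries[match.group(1)] = match.group(3).split("\t")[0].strip()
    return entries


def plan_bootnext(efibootmgr_output: str, label: str) -> List[str]:
    """Build an efibootmgr --bootnext command for the first entry with `label`."""
    for entry_id, entry_label in parse_boot_entries(efibootmgr_output).items():
        if entry_label == label:
            return ["efibootmgr", "--bootnext", entry_id]
    raise LookupError(f"no boot entry labelled {label!r}")


if __name__ == "__main__":
    sample = (
        "BootCurrent: 0001\n"
        "BootOrder: 0000,0001\n"
        "Boot0000* EFI PXE 0 for IPv4\n"
        "Boot0001* ubuntu\tHD(1,GPT)/File(\\EFI\\ubuntu\\shimx64.efi)\n"
    )
    print(plan_bootnext(sample, "ubuntu"))  # ['efibootmgr', '--bootnext', '0001']
```

Compared with rewriting BootOrder, this leaves the PXE-first order untouched and needs no state to be tracked outside the firmware.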
If you want to experiment with this, a proof of concept MR is more than welcome. Thanks.
Thanks for the suggestion! This seems like a better approach than introducing more state that would have to be tracked outside of efibootmgr. I'll look into that and will let you know when I have an update.
After further investigation, I now think it would be more appropriate to address this within mokutil: mokutil --import appears to be the final step that puts impacted machines into the described state, and a DKMS solution would not cover every use case affected by this bug, whereas (I think) a fix in mokutil --import would.
I started a discussion there to see if they agree.
Reopened, since mokutil doesn't want to deal with boot entries, so I'll continue looking into solutions that could be applied at the DKMS level (or elsewhere).
After more research, I've narrowed down the scope of the issue further and am now more confident that this should not be addressed at the DKMS or mokutil level. I've reopened discussions with the MAAS team and will continue tracking this in my original MAAS bug report for any interested or impacted readers.