EDK2 202505 Firmware Crash when Booting Qemu Image

Open blitz opened this issue 5 months ago • 14 comments

Description

The Flatcar QEMU images fail to boot in Qemu or Cloud Hypervisor with EDK2 TianoCore 202505 (and probably 202502).

  Booting `Flatcar default'
                           
!!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000003  I:0 R:0 U:0 W:1 P:1 PK:0 SS:0 SGX:0
RIP  - 0000000063039E30, CS  - 0000000000000038, RFLAGS - 0000000000210046
RAX  - 0000000063046400, RCX - 0000000063046400, RDX - 0000000000000000
RBX  - 000000006D6BD798, RSP - 000000006ED4D548, RBP - 000000006E7EE018
RSI  - 0000000000000000, RDI - 00000000630800E8
R8   - 0000000000000000, R9  - 000000006AE301AC, R10 - 000000006D1FC184
R11  - 000000000000002D, R12 - 000000006D5CE000, R13 - 000000006D203AC0
R14  - 0000000000000001, R15 - 000000006D2101F0
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 0000000063046400, CR3 - 000000006EA01000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000006E7DE000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000006E4DC018 0000000000000FFF,   TR - 0000000000000018
FXSAVE_STATE - 000000006ED4D1A0
!!!! Find image based on IP(0x63039E30) (No PDB)  (ImageBase=0000000000BC8ED4, EntryPoint=0000000000BCFAC1) !!!!
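For readers unfamiliar with the dump format: the `ExceptionData` value is the x86 page-fault error code, and the flags printed next to it are its individual bits. A minimal sketch of the decoding follows (`decode_pf` is a hypothetical helper name, not part of any tool). For the value `3` it gives P=1 and W=1, meaning a write to a page that is present. CR2 holds the faulting address and matches RAX in this dump, so this looks consistent with a write to memory that is mapped read-only, though the dump alone does not prove it.

```shell
#!/bin/sh
# Decode the low bits of an x86 page-fault error code, as printed on the
# "ExceptionData" line of the dump. decode_pf is a hypothetical helper name.
#   bit 0 = P (page present), bit 1 = W (write access), bit 2 = U (user mode),
#   bit 3 = R (reserved bit set), bit 4 = I (instruction fetch)
decode_pf() {
  code=$(( $1 ))
  echo "P:$(( code & 1 )) W:$(( (code >> 1) & 1 )) U:$(( (code >> 2) & 1 )) R:$(( (code >> 3) & 1 )) I:$(( (code >> 4) & 1 ))"
}

decode_pf 0x3   # prints: P:1 W:1 U:0 R:0 I:0
```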

Grabbing OVMF.fd from NixOS-unstable is sufficient to reproduce this issue.

I'm reasonably sure that this requires some GRUB work, because the consensus on the EDK2 side seems to be that this is a GRUB bug.

Impact

Flatcar Linux doesn't boot in any Qemu/Cloud Hypervisor VM that uses the latest UEFI firmware.

Environment and steps to reproduce

  1. Set-up: Grab the latest flatcar qemu image on a Linux system. Install Qemu and latest TianoCore UEFI image (202505).
  2. Task: Boot Flatcar
     • qemu -machine q35,accel=kvm -cpu host -bios path-to-OVMF/FV/OVMF.fd -m 2048 -serial stdio -snapshot -hda ~/Downloads/flatcar_production_qemu_image.img
  3. Action(s): No action besides booting required.
  4. Error: See log above.

Expected behavior

The image boots with the latest UEFI firmware.

Additional information

  • Related discussion in the EDK2 repo about Grub getting things wrong when the Memory Attributes protocol is present: https://github.com/tianocore/edk2/issues/10918#issuecomment-2921823483
  • https://github.com/tianocore/edk2/issues/10883

blitz avatar Jul 21 '25 14:07 blitz

I still need to test it, but I wonder if we need to follow Fedora. They've only changed that in their Secure Boot builds though, and I'm guessing you're not using Secure Boot.

chewi avatar Jul 21 '25 15:07 chewi

Oh wait, tianocore/edk2#10883 already led to the introduction of PcdUninstallMemAttrProtocol, where the workaround is to set that to TRUE, not FALSE. We're already doing that... but you're not using our builds. Testing time!

chewi avatar Jul 21 '25 15:07 chewi

This is a good overview ticket: https://issues.redhat.com/browse/RHEL-75263

I'll try to disable gUefiOvmfPkgTokenSpaceGuid.PcdUninstallMemAttrProtocol and see whether this works.

blitz avatar Jul 21 '25 15:07 blitz

Hmm, that's not all. Although we have been setting PcdUninstallMemAttrProtocol = TRUE, it seems that this requires a patch from Fedora that we haven't been applying. Whoops.

chewi avatar Jul 21 '25 15:07 chewi

Reproduced with my local bump of Gentoo's edk2 package to 202505.

chewi avatar Jul 21 '25 15:07 chewi

Applying Fedora's patch together with PcdUninstallMemAttrProtocol = TRUE fixes it. I need to see whether it still needs to be TRUE with Secure Boot or not. If Fedora took this approach rather than fixing GRUB, that's what I'm going to do too. I try my best with this stuff, but I'm not an expert.

chewi avatar Jul 21 '25 15:07 chewi

Also reproduced with 202502, which I find surprising because that's been in Gentoo for a while and no one has reported this issue. I suppose that might be because almost everyone has edk2 installed only because of QEMU, which is pinned back to 202202 by default.

chewi avatar Jul 21 '25 15:07 chewi

So the consensus seems to be that Red Hat's GRUB patch set does weird things. But given that this is basically now the new GRUB upstream, there is little to be done about it but wait until they get their stuff together.

Thanks for looking into this @chewi ! Much appreciated.

blitz avatar Jul 21 '25 16:07 blitz

Ah, good point, Flatcar uses that patch set, and I hadn't considered that factor. That helps explain why no Gentoo users have complained, although I imagine some of them must run Fedora/RHEL VMs.

chewi avatar Jul 21 '25 16:07 chewi

It does still need to be TRUE with Secure Boot on, so I'm wondering why Fedora has changed it to FALSE, although you can also request it at runtime by passing -fw_cfg opt/org.tianocore/UninstallMemAttrProtocol,string=y to QEMU.
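As a rough sketch, the runtime option can be appended to the reproduction command from the description like this. `QEMU_BIN`, `OVMF_FD` and `IMAGE` are placeholder variables introduced here for illustration, and whether the option takes effect depends on the firmware build actually honouring this fw_cfg file.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust these to your system.
QEMU_BIN="${QEMU_BIN:-qemu-system-x86_64}"
OVMF_FD="${OVMF_FD:-path-to-OVMF/FV/OVMF.fd}"
IMAGE="${IMAGE:-$HOME/Downloads/flatcar_production_qemu_image.img}"

# Same flags as the reproduction command in the description, plus the
# fw_cfg option quoted above.
set -- \
  -machine q35,accel=kvm -cpu host \
  -bios "$OVMF_FD" \
  -m 2048 -serial stdio -snapshot \
  -hda "$IMAGE" \
  -fw_cfg opt/org.tianocore/UninstallMemAttrProtocol,string=y

# Print the command by default; set RUN=1 to actually start the VM.
if [ "${RUN:-0}" = "1" ]; then
  exec "$QEMU_BIN" "$@"
else
  echo "$QEMU_BIN $*"
fi
```

Printing by default makes it easy to check the final command line before launching anything.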

chewi avatar Jul 21 '25 17:07 chewi

I have now prepared 202505 for Gentoo with the fix so that Flatcar can eventually pick it up, but I'm holding back from pushing it right now because of a licensing issue.

chewi avatar Jul 22 '25 15:07 chewi

@chewi This Fedora patch has not been upstreamed. Do you know what Fedora's fix is? Will it be included in the Fedora image, or should we carry the patch ourselves? I am hitting the same issue with Cloud Hypervisor and a Fedora image.

russell-islam avatar Sep 02 '25 19:09 russell-islam

I mostly have to guess Fedora's intentions from what goes into https://src.fedoraproject.org/rpms/edk2, but I've been confused by this change. I must therefore defer to @kraxel for an explanation.

chewi avatar Sep 03 '25 09:09 chewi

On the licensing issue, the initial feedback has been promising, but I haven't heard anything for a little while. Lawyers move at their own pace!

chewi avatar Sep 03 '25 09:09 chewi