
mkosi v15+: mkosi.extra/boot/ files missing in /boot, breaks incremental update_existing_rootfs()

Open marc-hb opened this issue 1 year ago • 10 comments

update_existing_rootfs() currently relies on /boot/System.map-N.M being located on the main partition. When it's not, the "incremental" build fails like this:

not found: ./qbuild/mnt/boot/System.map-6.12.0. Try rebuilding with '-r img'

The -r img workaround is correct but obviously much slower.

Note there are multiple places where the ESP partition can be mounted: notably /efi or /boot. Fedora+mkosi seems to always use /efi by default?

https://wiki.archlinux.org/title/EFI_system_partition#Typical_mount_points

cc:

  • #75

marc-hb avatar Dec 18 '24 05:12 marc-hb

I can reproduce as early as mkosi v15. This was likely caused by the v15 switch to systemd-repart, see giant commit

https://github.com/systemd/mkosi/commit/8bbbd836078a2 "Migrate disk image building to systemd-repart"

Because we don't know up-front anymore where the ESP partition will be mounted, all boot loader files are installed to /boot. So to populate an ESP partition, you'd use "CopyFiles=/boot:/" in the partition definition file of the ESP partition.
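For illustration, an ESP partition definition along the lines quoted above could be generated like this. It is only a minimal sketch, not run_qemu code: the helper name, the `00-esp.conf` file name and the 512M size are assumptions, while `Type=`, `Format=`, `CopyFiles=` and `SizeMinBytes=` are regular repart.d settings.

```shell
#!/bin/sh
# Hypothetical helper (not part of run_qemu.sh): write a systemd-repart
# definition for the ESP into the directory given as $1.
write_esp_repart_conf()
{
	repart_dir=$1
	mkdir -p "$repart_dir" || return 1
	# File name and size are assumptions chosen for this example.
	cat > "$repart_dir/00-esp.conf" <<'EOF'
[Partition]
Type=esp
Format=vfat
# Populate the ESP from whatever the image build staged in /boot
CopyFiles=/boot:/
SizeMinBytes=512M
EOF
}
```

Something like `write_esp_repart_conf mkosi.repart` would then make the ESP contents explicit instead of depending on the defaults of a given mkosi version.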

marc-hb avatar Dec 18 '24 06:12 marc-hb

I can reproduce as early as mkosi v15.

Correction: with mkosi v22, /boot/System.map-6.12.0 and friends land in the ESP partition.

With mkosi v15, they land NOWHERE!

marc-hb avatar Dec 18 '24 06:12 marc-hb

~I think we just need a systemd-repart configuration~. It felt great to avoid an explicit partition table and just rely entirely on mkosi defaults but that's just too "volatile" and unpredictable for something like update_existing_rootfs(). Even if update_existing_rootfs() could get smarter and dynamically adjust its System.map logic now to various partition schemes, it would break again somewhere else or for some other, random mkosi version. So let's just bite the systemd-repart configuration bullet. I took a look and it does not look like rocket science. Also, it's still possible to leave a lot of things as default in such a configuration.

EDIT cc:

  • https://github.com/systemd/mkosi/issues/3948

marc-hb avatar Dec 18 '24 19:12 marc-hb

I think we just need a systemd-repart configuration.

... or maybe not. Maybe that's not required after all... Change of mind.

One burning question is: what is the -F System.map argument trying to achieve? It came with the addition of the depmod invocation in commit 2ed0ed3af4fa2f3ec. man depmod says:

       -F, --filesyms System.map
           Supplied with the System.map produced when the kernel was built, this allows the -e
           option to report unresolved symbols. This option is mutually incompatible with -E.

But -e is not currently used! So -F does nothing at all?
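To check what -F would report if -e were actually passed, something like the following could be tried. This is a sketch under assumptions, not the current setup_depmod(): the helper name is hypothetical, and it uses depmod's -n (dry run) so that nothing is written to the module tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of run_qemu.sh): ask depmod to report
# unresolved symbols for kernel version $3 under prefix $1, using the
# System.map given as $2.
check_unresolved_symbols()
{
	prefix=$1
	system_map=$2
	kver=$3
	if [ ! -f "$system_map" ]; then
		echo "not found: $system_map" >&2
		return 1
	fi
	# -n writes the generated files to stdout instead of the module
	# directory; they are discarded here. Diagnostics that depmod
	# prints on stderr stay visible.
	depmod -n -e -F "$system_map" -b "$prefix" "$kver" > /dev/null
}
```

If this prints nothing for a tree that is known to have missing symbols, that would support the suspicion that the current -F argument is not doing anything useful.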

Also: when invoked by update_existing_rootfs(), setup_depmod() seems to look at the OLD System.map file? This reinforces the suspicion that it does nothing :-D

Could this -F be yet another update_existing_rootfs() feature that we are trying to port to mkosi v15+ even though it never actually worked with v14- in the first place? Like #76. If so, let's just (temporarily) delete it to unblock the migration to v15+.

Generally speaking, porting to mkosi v15+ is really hard without a clear picture of 1) what the code was supposed to do with mkosi v14- in the first place and 2) what it was actually achieving with v14-.

Other complications: The kernel and the initrd live in potentially 3 different places. Even with a fresh build from scratch, all these have a different initrd file :-(

Status with mkosi v14- and Fedora 40 (v15+ has significant differences)

  • mkosi.extra/usr/lib/modules/6.12.0-dirty/vmlinuz # used when booting with --direct-kernel = the default option
  • mkosi.extra/boot/vmlinuz-6.12.0-dirty # yet another duplicate, yeah! Staging for /boot/
  • ESP partition # usually mounted at /efi, used when booting with --no-direct-kernel
    • 5248fff44e974fce9cc88b89875eb063/6.12.0/linux # usual bzImage. This copy is NOT updated by the update_existing_rootfs() shortcut. Gone or moved with v15+
    • EFI/Linux/linux.efi # copy of the above. systemd-boot default. NOT actually a UKI! Not even an .EFI binary! This generates a bootctl warning. Created and updated by update_rootfs_boot_kernel(): still there with v15+ (with a slightly different name) and still the systemd-boot default. Fixes and renames submitted in #98
    • EFI/Linux/mkosi-fedora-6.12.efi # all-in-one UKI with initrd included. Unreliable with mkosi v14? Can be just ignored.
  • /boot on the root partition: the usual vmlinuz+initrd with v14- thanks to install_build_initrd() / make_install_kernel(); EMPTY with mkosi v15+!! Never used at boot time, only at later modprobe time? vmlinuz does get updated by update_existing_rootfs()

The situation with modules is similar but even more varied because in addition to being embedded in initrd files, modules are also in /lib/modules/. Business as usual.
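One way to see which of the copies listed above are actually identical, stale or absent is to compare checksums. A minimal sketch; the helper name is hypothetical and the paths to pass depend on the mkosi version and on where the image is mounted:

```shell
#!/bin/sh
# Hypothetical helper (not part of run_qemu.sh): print one
# "checksum  path" line per existing file and one "missing  path"
# line per absent file, so that stale duplicates stand out.
compare_kernel_copies()
{
	for f in "$@"; do
		if [ -f "$f" ]; then
			sha256sum "$f"
		else
			echo "missing  $f"
		fi
	done
}
```

For example: `compare_kernel_copies mkosi.extra/usr/lib/modules/6.12.0-dirty/vmlinuz mkosi.extra/boot/vmlinuz-6.12.0-dirty`. Identical checksums mean the copies are in sync; a differing one points at a location that the incremental path did not update.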

marc-hb avatar Dec 18 '24 22:12 marc-hb

Simply dropping the -F System.map argument is enough to build and boot with mkosi v15 (EDIT: and with many other mkosi versions) https://github.com/pmem/run_qemu/actions/runs/12402741008/job/34624880942?pr=90

@stellarhopper, @weiny2 could you test dropping -F System.map more extensively? I mean with some actual kernel and module changes...

--- a/run_qemu.sh
+++ b/run_qemu.sh
@@ -1037,11 +1037,11 @@ setup_depmod()
        fi
        if [ ! -f "$system_map" ]; then
                echo "not found: $system_map. Try rebuilding with '-r img'"
-               return 1
+               # return 1
        fi
        : Warning: symlinks created by this depmod dont survive the move
        : to the virtual machine
-       sudo depmod -b "$prefix" -F "$system_map" -C "$depmod_dir" "$kver"
+       sudo depmod -b "$prefix"                  -C "$depmod_dir" "$kver"
 }
 

marc-hb avatar Dec 19 '24 00:12 marc-hb

I did a lot more testing and dropping "-F System.map" is not good enough. It's just shooting the messenger. It's an "also guilty" messenger, but still just a messenger. Dropping "-F System.map" fixes the build but hides a bigger missing /boot problem.

Here's the situation with mkosi v15+ if we drop "-F System.map"

  1. run_qemu.sh from scratch; invokes mkosi: /boot/ is totally empty
  2. run_qemu.sh not from scratch: mkosi not used, update_init_rootfs() runs instead: /boot/ has the latest vmlinuz

The above tested with both v15 and v23.

I think it's better to keep failing with this "system.map" error message, because it can lead people to this issue until the real /boot/ problem is actually fixed, rather than silently give them an empty (and then mostly empty) /boot/ while pretending everything is fine.
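If the System.map check ever goes away, the same "fail loudly" behaviour could be kept with an explicit check on /boot itself. A minimal sketch, assuming a hypothetical helper name and error message; nothing like this exists in run_qemu.sh today:

```shell
#!/bin/sh
# Hypothetical check (not part of run_qemu.sh): return non-zero when
# the directory given as $1 is missing or has no entries at all.
require_populated_boot()
{
	boot_dir=$1
	if [ -z "$(ls -A "$boot_dir" 2>/dev/null)" ]; then
		echo "empty or missing: $boot_dir (see the mkosi v15+ /boot issue)" >&2
		return 1
	fi
}
```

Calling it as `require_populated_boot ./qbuild/mnt/boot` right after the image is mounted would flag the empty /boot of a from-scratch v15+ build instead of letting it pass unnoticed.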

marc-hb avatar Dec 20 '24 22:12 marc-hb

Hm, didn't mean to close this - I guess it auto-closed because of the mention in #98

stellarhopper avatar Jan 03 '25 22:01 stellarhopper

I guess it auto-closed because of the mention in https://github.com/pmem/run_qemu/pull/98

Most likely yes, please upvote https://github.com/orgs/community/discussions/17308 (and duplicates...)

marc-hb avatar Jan 03 '25 23:01 marc-hb

Dropping "-F System.map" fixes the build but hides a bigger missing /boot problem.

So the key question is: does anyone or anything use /boot?

/boot inside the image is used by neither --direct-kernel nor --no-direct-kernel right now. The former uses the kernel and initrd outside the image. The latter uses the /efi partition.

Maybe /boot/ was used back in the GRUB days but not anymore? @stellarhopper, @weiny2, any memories?

If /boot is not used or not used anymore, then we can drop /boot entirely, point -F System.map somewhere else and the problem should be solved!
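"Somewhere else" could for instance be the kernel build tree, where the kernel build leaves its System.map. A minimal sketch of such a lookup; the helper name and the search order are assumptions, and the caller decides which directories to pass:

```shell
#!/bin/sh
# Hypothetical helper (not part of run_qemu.sh): print the first
# System.map found for kernel version $1 in the directories given as
# the remaining arguments. Both "System.map-<kver>" and plain
# "System.map" are tried in each directory, in that order.
find_system_map()
{
	kver=$1
	shift
	for dir in "$@"; do
		for candidate in "$dir/System.map-$kver" "$dir/System.map"; do
			if [ -f "$candidate" ]; then
				echo "$candidate"
				return 0
			fi
		done
	done
	echo "not found: System.map for $kver in: $*" >&2
	return 1
}
```

setup_depmod() could then do something like `system_map=$(find_system_map "$kver" . "$prefix/boot")` and no longer depend on the partition layout of the image.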

marc-hb avatar Jan 04 '25 00:01 marc-hb

@marc-hb yeah I'm pretty sure this is true - /boot is just a holdover from grub days, and likely can be removed now.

stellarhopper avatar Jan 04 '25 00:01 stellarhopper

update_existing_rootfs() currently relies on /boot/System.map-N.M being located on the main partition. When it's not, the "incremental" build fails like this:... I can reproduce as early as mkosi v15.

I don't understand how this stopped being an issue. Was it Fedora 40 specific? I'm not using Fedora much these days.

EDIT: right now -r img has an empty /boot which is not an issue, while -r img has a non-empty /boot/ which works too. I can't remember when this was failing and why.

That does not mean /boot/ is useful now. Maybe it still isn't. But the build does not fail...

marc-hb avatar Nov 22 '25 00:11 marc-hb