
GRUB: Extent not found after running bees

Open hotburger opened this issue 2 years ago • 23 comments

GRUB gives me this error after running bees for a few hours. It happens consistently every time I run bees for long enough. I fix it temporarily by reinstalling the kernel package from a chroot. I'm assuming bees is deduping the kernel, which GRUB doesn't like? Strangely, this doesn't happen to my Arch install on the same partition.

I assume that Gentoo is storing another copy of the kernel somewhere while Arch doesn't. The only other difference from my Arch install is a separate subvol for /boot.

GRUB error message:

Loading Linux 6.1.11-gentoo-dist ...
error: extent not found.
Loading initial ramdisk ...
error: you need to load the kernel first.

Press any key to continue...

hotburger Feb 16 '23 19:02

Some experiments to try to collect more information:

  1. Run btrfs-search-metadata file /path/to/vmlinuz (from python-btrfs package) before and after the failure (i.e. once after reinstalling, and once again when boot fails).
  2. Does it also fail when making a reflink of the kernel, e.g. cp --reflink=always /path/to/vmlinuz /root/foo and then reboot?

I don't know how grub would distinguish one reflink to a file from another, much less be fatally broken by it, so I expect experiment 2 will not trigger a grub failure, and we'll see some anomalous feature (non-zero extent offsets? unsupported compression type? hole in kernel file?) from experiment 1.

Hopefully we get some information that can be turned into an actionable grub bug report.
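
For experiment 1, the two dumps (one taken after reinstalling, one taken when boot fails) can be compared line by line. The following is a minimal sketch, not part of bees or python-btrfs: the function name `diff_extent_dumps` is hypothetical, and it treats the dump text as opaque lines without assuming anything about the output format of btrfs-search-metadata.

```python
import difflib


def diff_extent_dumps(fixed: str, broken: str) -> list[str]:
    """Return only the lines that differ between two metadata dumps.

    `fixed` is the dump taken right after reinstalling the kernel,
    `broken` is the dump taken once GRUB fails to boot.
    Lines are compared as plain text; no dump format is assumed.
    """
    changes = []
    for line in difflib.ndiff(fixed.splitlines(), broken.splitlines()):
        # ndiff prefixes: "- " only in fixed, "+ " only in broken,
        # "  " unchanged, "? " intraline hints (skipped here).
        if line.startswith(("- ", "+ ")):
            changes.append(line)
    return changes
```

The dumps could be saved to two files (the names are up to you), read with `pathlib.Path.read_text()`, and passed to the function; whatever remains in the result is where the extent layout changed between the working and the failing state.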

Zygo Feb 18 '23 06:02

This issue stopped happening for a while, so I couldn't replicate it to gather info. It is happening again though. Creating a reflink did not cause the boot to fail. Attached logs: vmlinuz-6.2.7-broken.log, vmlinuz-6.2.7-fixed.log

hotburger Mar 24 '23 17:03

Looks like this is fixed in grub but not released yet:

https://git.savannah.gnu.org/cgit/grub.git/commit/?id=7f4e017a1416bcbdca16de4f923679ec9f003171

Zygo Mar 26 '23 22:03

I had similar boot issues in versions of GRUB that supposedly have this fixed (it would panic in various random ways). I switched to a 3-partition layout:

  • / btrfs
  • /boot ext4
  • /boot/efi vfat

This works around the problems.

Seems like grub's btrfs implementation is not very good yet.

Jorropo Aug 14 '23 02:08

I have the same problem on Manjaro, using kernel 6.6.8-2-MANJARO and GRUB 2.12. Before entering the GRUB menu I get this error:

error: start_image() returned 0x800000000000000001.

Failed to boot both default and fallback entries.

Press any key to continue...

I can get into the GRUB menu after that, but trying to boot results in "error: you need to load the kernel first." and the system freezes.

I am now successfully using @Jorropo's workaround.

Trayshar Dec 30 '23 00:12

I can confirm this on two separate machines running Arch. Here it is usually amd-ucode.img that gets broken, and GRUB reports "error: premature end of file." The systems boot if I remove it from the boot entry in GRUB. Chrooting into the installation and reinstalling the ucode also fixes it temporarily.

PfannenHans Feb 23 '24 08:02

You can set chattr +C on the boot directory before reinstalling the boot loader and see if that helps. bees won't touch file extents created with this flag on. In other words, setting the flag on already existing files changes nothing; only new files will inherit the flag from the directory. But this also removes checksum protection from your boot files, so it should only be used as a temporary workaround.
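
Since only files created after the flag is set are protected, it can help to check which boot files still lack it. The following is a minimal sketch, assuming the usual lsattr output of an attribute string followed by the path, with `C` marking the No_COW attribute. The function name `files_without_nocow` is hypothetical and the script is not part of bees.

```python
def files_without_nocow(lsattr_output: str) -> list[str]:
    """List paths from lsattr output whose attributes lack 'C' (No_COW)."""
    missing = []
    for line in lsattr_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            # Blank lines and "directory:" headers from `lsattr -R`.
            continue
        flags, path = parts
        # A real attribute field contains only letters and dashes;
        # anything else (e.g. an error message) is ignored.
        if not all(ch.isalpha() or ch == "-" for ch in flags):
            continue
        if "C" not in flags:
            missing.append(path)
    return missing
```

Files reported here were created before the flag was set on the directory, so they would need to be written again (for example by reinstalling the package that owns them) to pick it up.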

kakra Feb 24 '24 03:02