
System deadlock during Ubuntu upgrade 24.10→25.04 with ZFS root due to update-grub calling `ls /.zfs/snapshot/…` via `10_linux_zfs` causing kernel lock at mount.zfs

Open bentolor opened this issue 7 months ago • 38 comments

System information

Type                 | Version/Name
Distribution Name    | Ubuntu
Distribution Version | 24.10
Kernel Version       | 6.11.0-25-generic
Architecture         | x86_64
OpenZFS Version      | zfs-2.2.6-1ubuntu1.1, zfs-kmod-2.2.6-1ubuntu1

Describe the problem you're observing

  • I've now failed 7 times to upgrade my Ubuntu 24.10 to 25.04 (Beta, RC, first release, today). In all cases the upgrade ran into a complete system freeze (deadlock). zfs rollback to the rescue.
  • Today, only using text console with screen & running dmesg -Hxw, htop -d 5 and do-release-upgrade in parallel I finally was able to pinpoint the problem down to ZoL
  • This is, what I reckon happens:
    1. do-release-upgrade downloads & installs all updated .deb packages
    2. Finally, the upgrade tries to run update-grub
    3. The grub script executes the hook 10_linux_zfs to identify the available kernel versions for the grub boot menu
    4. As part of this discovery an ls /.zfs/snapshot/[snapshot]/etc is executed which causes a system freeze
    5. Eventually, after a long pause the kernel reports task ls:… blocked for more than 122 seconds.
    6. Only option left for me is to shutdown server and revert to ZFS snapshot I did before the upgrade
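
To see which tasks the kernel flagged as hung, the log can be filtered for the hung-task message quoted in step 5. A minimal sketch, assuming the usual "task NAME:PID blocked for more than N seconds" wording; the helper name hung_tasks and the PID in the example are made up for illustration:

```shell
#!/bin/sh
# Print "name:pid" for every task reported by the kernel's hung-task detector.
# Reads kernel log text on stdin, e.g. from `dmesg` or `journalctl -k`.
hung_tasks() {
    sed -n 's/.*task \(.*\) blocked for more than [0-9][0-9]* seconds.*/\1/p'
}

# Example line with a made-up PID (the report above elides the real one):
printf '%s\n' 'INFO: task ls:4242 blocked for more than 122 seconds.' | hung_tasks
# prints: ls:4242
```

On a live system something like `dmesg | hung_tasks | sort | uniq -c` would show whether ls, mount.zfs, or both keep showing up.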

As the ls statement in question works flawlessly before the upgrade process, I assume the hang is caused by ZoL having been replaced by apt earlier in the upgrade?

Describe how to reproduce the problem

Try to upgrade Ubuntu 24.10 to 25.04 on a system with ZFS root and existing ZFS snapshots.

Include any warning/errors/backtraces from the system logs

Frozen htop output at system freeze

Image

do-release-upgrade output with closing kernel warning

Image

Kernel output reporting blocked mount.zfs due to ls

Image

Things I tried

  • Deleting the snapshot in question. It seems any snapshot causes the issue
  • Shutting down any services
  • Rerunning several times; even waiting weeks between retries
  • Executing the ls /.zfs/snapshot/[snapshot]/etc successfully before upgrade
  • Executing update-grub successfully before upgrade
  • Rebooting into the partially upgraded system and running strace update-grub, without success: it hangs there as well.

Nota bene: I also reported this issue on Ubuntu Launchpad against the ubuntu-release-upgrader package https://bugs.launchpad.net/ubuntu/+source/ubuntu-release-upgrader/+bug/2110891

bentolor avatar May 15 '25 10:05 bentolor

Did the system install or did it lock up completely after reboot?

For me i got the same error: Couldn't find any valid initrd for dataset rpool/ROOT/ when i did the upgrade. The install freeze and did not change for about 20min so i did a hard reset.

My system is updated to 25.04 but gnome is not installed and an error with the white screen and i have to use ssh to get into the machine but when i try to update or install gnome i get the same error again. Looks a bit like the update issue that delayed the release.

Is the new kernel bundled with ubuntu 25.04 not compatible with zfs? Oh i wish so much now that i had not installed zfs on the bootdrive and not zfs....I have had more issues with kernels and zfs with other machines. Zfs and linux is not great, it's like nvidia and linux with the same issues with linux. I need to copy over all information somehow to another disk and scrap zfs on the boot drive as i am an idiot not to do the mandatory snapshot of the system just before upgrading.

Exterior466 avatar May 15 '25 14:05 Exterior466

For me i got the same error: Couldn't find any valid initrd for dataset rpool/ROOT/ when i did the upgrade. The install freeze and did not change for about 20min so i did a hard reset.

Yes, exactly. I had this effect 7 times and always reverted to my pre-upgrade snapshot until I was able today to pinpoint the problem. My first upgrade runs were started from the graphical environment and had the random effect of some desktop functions still working, but no terminal input, no new SSH session, etc.

Did the system install or did it lock up completely after reboot?

@Exterior466 This lock-up happens mid-flight during the install, and obviously a few more steps would need to follow later. After my first lockup I manually tried to restart / complete the installation but eventually was confronted with more lockups (and had no idea how to fix things & do the missing tasks).

Right now I have no idea how to transition to a working 25.04 install starting from my running 24.10 environment.

Is the new kernel bundled with ubuntu 25.04 not compatible with zfs?

ZFS is officially supported by Ubuntu. I rather think that the on-disk upgrade to 25.04 ZoL libraries with the 24.10 kernel still running might be the culprit. And as neither the 25.04 initrd nor the 25.04 grub entries are present, the situation remains even after I reboot into the borked, half-upgraded system.

Zfs and linux is not great, its like nvidia and linux with the same issues with linux.

The license incompatibility is indeed a very unfortunate issue.

Oh i wish so much now that i had not installed zfs on the bootdrive and not zfs...

After my disastrous experiences with BTRFS as root (and ZFS for /srv) I do not regret the switch. I want a file system which can cope with bitrot and handle redundancy. My major ZFS pain point is mostly only performance. This is my first real functional pain point with ZFS / ZoL after a decade (?)

…as i am an idiot not to do the mandatory snapshot of the system just before upgrading.

I typically do.
I didn't do this time in my 1st run!
zfs-auto-snapshot again (one more time) got me covered!

bentolor avatar May 15 '25 14:05 bentolor

sudo dpkg --configure -a stalls in ssh terminal. I thought the blob for the xone was the issue but removing it made no difference. Kernel 6.11 is used and gives me Tainted: P OE hung_task_timeout when i reboot. Do you remember how you got past this?

I try to get a log of what package that makes the reconfiguring to stall but i cannot get the log as i have to interrupt it as it freezes.

sudo dpkg --audit shows all the packages is not configured and that is a long list.. I guess that is the very last step in the update that was interrupted as the new kernel was not configured with zfs but mostly it seems the system is updated except the kernel.

I like zfs on my other drives but to use it on root drive is tougher in situations like this. I waited until the upgrade issues were fixed but i guess zfs was forgotten. Too few of us running zfs on the bootdrive to report this bug before the upgrade were re-released.

I like zfs for the reasons you state, but its not worth it for me on the bootdrive, if my other drives not work it's ok as i still have a working system.

sudo dpkg --configure -a does not work and ssh terminal freeze. I thought the blob for the xone was the issue but removing it made no difference.

Its funny that the system runs as usual with casaos and steam start, but not in fullscreen. The minimal desktop with the white screen "something went wrong" with a cross for mouse cursor. I guess it is gnome and that the new kernel is not updated and in use that makes gnome to fail. If i could just get past the dpkg --configure i might be able to salvage the install.

Exterior466 avatar May 15 '25 16:05 Exterior466

Thanks for taking time and describing the issue @bentolor. I ran into exactly the same problem just now when I tried to upgrade from 24.10 to 25.04 on ZFS root.

I wonder if this issue needs to be addressed by openzfs or ubuntu.

iSOcH avatar May 15 '25 21:05 iSOcH

I managed to update the system by doing this:

Back up the update-grub script: sudo cp /usr/sbin/update-grub /usr/sbin/update-grub.bak

Rename the update-grub script to prevent it from being executed (when i updated, the grub update started and froze my ssh login): sudo mv /usr/sbin/update-grub /usr/sbin/update-grub.disabled

Update the system: sudo apt update && sudo apt upgrade -y

Restore the update-grub script: sudo mv /usr/sbin/update-grub.disabled /usr/sbin/update-grub

Something is still wrong and i cannot get grub to update properly so ubuntu boot with kernel 6.11.0-25 instead of 6.14. Havent tried system properly yet and i need to find a fix for the broken grub update. This seems to be the big culprit.

Exterior466 avatar May 15 '25 21:05 Exterior466

@iSOcH Thanks for reporting in!

I wonder if this issue needs to be addressed by openzfs or ubuntu.

We don't know yet. Until we understand this, please help raise awareness on the Ubuntu side via "This bug affects me" on the linked launchpad bug as well to increase bug "heat".

@Exterior466 I assume the issue is that the 25.04 mount.zfs is installed while the 24.10 kernel is still running. Personally I'd go the route and boot from the 25.04 installer CD/USB and chroot into my installation to run update-initramfs -u and update-grub directly from a 25.04 kernel, in case I would lack the option to restore my working 24.10 state.

bentolor avatar May 15 '25 21:05 bentolor

Issue might be related to https://github.com/openzfs/zfs/issues/17252#issuecomment-2812540940

bentolor avatar May 15 '25 21:05 bentolor

System is running as it should for me after doing what i did, and it was to bypass the grub update that broke the update. Grub seems to be changed in 25.04 so something is not working correctly with zfs.

I am trying to fix grub so the new kernel will boot and might be live usb be the way to do that as you suggest bentolor. I done it loong time ago with ubuntu but as it seems grub is doing something odd that might be the way to go.....however, is it possible to get zfs running on an ubuntu live?

Exterior466 avatar May 15 '25 21:05 Exterior466

… is it possible to get zfs running on an ubuntu live?

@Exterior466 as written above: Ubuntu supports ZFS. It should work out-of-the-box

Grub seems to be changed in 25.04

No. That's not the issue. See linked issue. What you could try is to temporarily remove /etc/grub.d/10_linux_zfs and see if this allows you to complete the 6.14 kernel installation of Ubuntu 25.04 instead the more complicated chroot route.
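
One way to take 10_linux_zfs out of play without deleting it is to clear its executable bit, since grub-mkconfig only runs executable files from /etc/grub.d. A minimal sketch; the helper names are made up, and whether the resulting grub.cfg boots a ZFS root correctly is not something this sketch checks:

```shell
#!/bin/sh
# Hypothetical helpers: toggle a grub.d generator without removing the file.
# grub-mkconfig skips files in /etc/grub.d that are not executable.
disable_generator() { chmod a-x "$1"; }
enable_generator()  { chmod a+x "$1"; }

# Intended use as root (left as comments so the sketch has no side effects):
#   disable_generator /etc/grub.d/10_linux_zfs
#   dpkg --configure -a
#   enable_generator /etc/grub.d/10_linux_zfs
```

Re-enabling the generator and running update-grub from the new kernel afterwards should bring the ZFS menu entries back.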

Let's try to stick to the OpenZFS aspect here in this tracker.

bentolor avatar May 15 '25 21:05 bentolor

I moved 10_linux_zfs temporarily and ran upgrade and it worked and now grub shows up with 6.14 kernel but throws me into busybox. Same with 6.11 kernel so now my system is offline. Most annoying part is that both kernels are affected, and i suspect it is grub in this instance. It seems the completion of fixing 6.14 rebuilt 6.11 before writing to grub. Zfs is loaded and works in busybox so most seems ok. Is it grub that needs to be modified in some way? When removing 10_linux_zfs temporarily and then putting it back after successful install of 6.14 it probably needs to be installed again with 10_linux_zfs?

I get the error message at boot: record fail and load_video and then busybox is started with error message in busybox is that no pool imported.

Not a wiz with busybox but tried what is stated zpool import rpool -N and same with bpool with different order and it seem to work, but when i exit i get kernel panic.

I have also tried to use live usb and followed this: https://develmonk.com/2022/05/20/mount-ubuntu-22-04-zfs-partitions-using-live-iso-for-disaster-recovery/ But i couldn't get it to work, might be something different with 25.04 and i don't seem to get the folders to mount correctly . Will try again...

Exterior466 avatar May 15 '25 23:05 Exterior466

Having the same issue. I added a few tracing lines to /etc/grub.d/10_linux_zfs and it seems that the root filesystem is mounted, unmounted, then remounted; the remount hangs: Image

Edit: the mount process itself returns, then the system hangs a few seconds later.
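
For anyone who wants to repeat this kind of tracing, a minimal sketch of a helper that could be pasted near the top of /etc/grub.d/10_linux_zfs. This is a hypothetical helper, not the exact lines used above; it writes to stderr because grub-mkconfig redirects each generator's stdout into grub.cfg:

```shell
#!/bin/sh
# Hypothetical tracing helper; not part of the shipped 10_linux_zfs.
# Output goes to stderr so it cannot end up inside the generated grub.cfg.
trace() {
    printf '[10_linux_zfs %s] %s\n' "$(date '+%H:%M:%S')" "$*" >&2
}

trace "before mount of example/dataset"   # placeholder dataset name
```

Calling trace right before and after each mount/umount in the script shows which call is the last one reached before the hang.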

astralblue avatar May 16 '25 01:05 astralblue

Found a workaround:

  1. After the lockup, force-reboot. This reboot uses the old kernel that locks up upon mount.zfs.
  2. Run sudo patch -N /etc/grub.d/10_linux_zfs < 10_linux_zfs.diff so that it ignores snapshots in validate_system_dataset and get_system_directory functions.
  3. Run sudo dpkg --configure -a. This runs update-grub2 multiple times, which picks up and makes the new kernel the default, while avoiding lockups caused by mounting and probing snapshots.
  4. Reboot using the new kernel. (The new kernel doesn't lock up upon mount.zfs.)
  5. Undo the patch (sudo patch -R /etc/grub.d/10_linux_zfs < 10_linux_zfs.diff).
  6. Run sudo apt autoremove to finish removing unused packages in Ubuntu 25.04. During this phase, update-grub2 runs and picks up all the snapshots.

The 10_linux_zfs.diff patch:

--- 10_linux_zfs        2025-05-15 21:03:33.610925840 -0700
+++ 10_linux_zfs.new    2025-05-15 22:15:32.700304647 -0700
@@ -125,6 +125,11 @@
         return
     fi

+    if [ -n "${snapshot_name}" ]; then
+        grub_warn "Ignoring snapshot '${dataset}@${snapshot_name}'."
+        return
+    fi
+
     if ! mount -o noatime,zfsutil -t zfs "${dataset}" "${mount_path}"; then
         grub_warn "Failed to find a valid directory '${directory}' for dataset '${dataset}@${snapshot_name}'. Ignoring"
         return
@@ -196,6 +201,8 @@
     local snapshot_name=""
     # For snapshots we extract the parent dataset
     if echo "${dataset_path}" | grep -q '@'; then
+        grub_warn "Ignoring snapshot '${dataset_path}'."
+        return
         base_dataset_path=$(echo "${dataset_path}" | cut -d '@' -f1)
         snapshot_name=$(echo "${dataset_path}" | cut -d '@' -f2)
     fi
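
The second hunk hinges on telling a snapshot path (dataset@snapshot) from a plain dataset path. A standalone sketch of that check, using shell parameter expansion instead of the grep/cut calls in the script; the function names and the dataset names are placeholders, not taken from 10_linux_zfs:

```shell
#!/bin/sh
# Hypothetical helpers illustrating the '@' test the patch relies on.

# Succeeds when the path names a snapshot, i.e. contains '@'.
is_snapshot() {
    case "$1" in
        *@*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Part before the first '@' (the parent dataset).
base_dataset() {
    printf '%s\n' "${1%%@*}"
}

# Part after the first '@'; empty for a plain dataset.
snapshot_name() {
    case "$1" in
        *@*) printf '%s\n' "${1#*@}" ;;
        *)   printf '\n' ;;
    esac
}

# Placeholder names in the style used elsewhere in this thread:
for path in rpool/ROOT/ubuntu_XXXXXX rpool/ROOT/ubuntu_XXXXXX@before-upgrade; do
    if is_snapshot "$path"; then
        echo "skip snapshot '$(snapshot_name "$path")' of '$(base_dataset "$path")'"
    else
        echo "probe dataset '$path'"
    fi
done
```

With the patch applied, every path for which is_snapshot would succeed is skipped before any mount is attempted.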

astralblue avatar May 16 '25 05:05 astralblue

I tried again to follow https://develmonk.com/2022/05/20/mount-ubuntu-22-04-zfs-partitions-using-live-iso-for-disaster-recovery/ and this step in the guide: mount -v --bind /mnt/boot/efi/grub /mnt/boot/grub didn't make sense so i skipped that. After sudo chroot /mnt

done:

apt remove --purge zfs-dkms zfsutils-linux and again apt install zfs-dkms zfsutils-linux

The warning that shows up is that openzfs with 6.14.0-15 is experimental, as in it is not supported. Deprecated feature: REMAKE_INITRD (/etc/dkms/zfs.conf)

then i ran: update-initramfs -uvk all and all looks correct.

then: sudo update-grub

got the warning didn't find any valid initrd or kernel......

same error of cant find command record fail and load_video.

Time to remove 6.14 and stick with 6.11 and see if that works:

nope....

I hope someone find a solution!

Exterior466 avatar May 16 '25 18:05 Exterior466

BTW, in case your grub.cfg is already messed up (so Ubuntu doesn't show up, or it fails to boot into the new kernel because of missing or botched initrd) and you need to boot into a working system running the old kernel:

  1. From the Grub menu, hit c to drop into Grub2 command line.

  2. Find your ZFS boot dataset and the old kernel/initrd.img in it, using ls command and Tab completion.

    ls
    ls (hd1,gpt1)/BO<Tab>        # no go
    ls (hd1,gpt2)/BO<Tab>        # no go
    ls (hd1,gpt3)/BO<Tab>        # yes
    ls (hd1,gpt3)/BOOT/ubu<Tab>  # picked up the dataset
    ls (hd1,gpt3)/BOOT/ubuntu_XXXXXX/@/ # shows the kernels and initrd images
    
  3. Load the previous kernel/initrd and boot into them:

    linux (hd1,gpt3)/BOOT/ubuntu_XXXXXX/@/vmlinuz-6.11.0-25-generic root=ZFS=rpool/ROOT/ubuntu_XXXXXX
    initrd (hd1,gpt3)/BOOT/ubuntu_XXXXXX/@/initrd.img-6.11.0-25-generic
    boot
    

    Don't forget the root=ZFS=... at the end of the linux line. By convention, the root dataset often has the same name as the boot dataset but starting with rpool/ROOT instead of bpool/BOOT. YMMV though.

    Also, if you want to boot into recovery shell, add single at the end of the linux line. (It'll ask for the root password if there's one set.)

This works for both unencrypted and ZFS-native-encrypted root. LUKS setup will probably need more; I don't use it so don't know the exact steps.
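
To avoid typos at the Grub prompt, the three lines can be generated beforehand (on any working machine) and copied down. A minimal sketch that assumes the bpool/BOOT and rpool/ROOT naming convention described above; the function name is made up, and the device, dataset suffix and kernel version must be replaced with what ls showed on your system:

```shell
#!/bin/sh
# Hypothetical helper: print the Grub commands for booting one kernel.
#   $1 = Grub device holding the boot pool, e.g. (hd1,gpt3)
#   $2 = dataset name shared by the boot and root datasets, e.g. ubuntu_XXXXXX
#   $3 = kernel version, e.g. 6.11.0-25-generic
grub_boot_lines() {
    printf 'linux %s/BOOT/%s/@/vmlinuz-%s root=ZFS=rpool/ROOT/%s\n' "$1" "$2" "$3" "$2"
    printf 'initrd %s/BOOT/%s/@/initrd.img-%s\n' "$1" "$2" "$3"
    printf 'boot\n'
}

grub_boot_lines '(hd1,gpt3)' 'ubuntu_XXXXXX' '6.11.0-25-generic'
```

If your root dataset does not follow the convention, edit the root=ZFS=... part by hand.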

astralblue avatar May 16 '25 19:05 astralblue

I can confirm that this problem hit me as well, on several machines. Interestingly it also happens WITH NO SNAPSHOTS! I confirmed that on a virtual machine where I can rollback the complete machine state.

astralblues patch trick helped to resolve the situation. Thank you very much for this!!!

tobalur avatar May 17 '25 05:05 tobalur

I found snapshots that was 2 weeks old and rolled back and after some extra work my old system is up running again. I will try astralblues patch trick as i want the ubuntu update, but will not forget to take snapshots this time!

Quick question does the patch need to be changed first to adapt to my system or to run as is?

Exterior466 avatar May 17 '25 15:05 Exterior466

@Exterior466 I'd recommend to refrain from starting any upgrade right now, until we get at least some feedback from any upstream channel. 24.10 is supported till 2025-07.

If I find some time I aim to hack some evolved script based on @astralblue work (i.e. a 00_zfs_upgradehack for /etc/grub.d/) to hotpatch 10_linux_zfs right during the upgrade process to allow a full upgrade avoiding the need to manually trying to fix a half-finished upgrade.

bentolor avatar May 19 '25 08:05 bentolor

Issue might be related to #17252 (comment)

I can confirm this - I've hit this while updating a regular Ubuntu 24.04 using zabbly zfs / kernel - userland was upgraded to 2.3.0 while kmod was 2.2.6 or 2.2.7 - it hang during a update-grub during dpkg --configure -a - I suspect this is not directly related to upgrading Ubuntu or grub but hitting that bug. The machine doesn't use grub for booting so forcefully removing grub helped with fixing this.
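
A minimal sketch for spotting this kind of userland/kmod mismatch before running anything that mounts datasets. It assumes `zfs version` prints the userland version on its first line and the kernel module version on its second, in the format shown in the system information at the top of this issue. Treating "same major.minor" as "safe" is a simplification for illustration, not an OpenZFS rule, and the function names are made up:

```shell
#!/bin/sh
# Extract "major.minor" from a line such as
# "zfs-2.2.6-1ubuntu1.1" or "zfs-kmod-2.2.6-1ubuntu1".
zfs_series() {
    printf '%s\n' "$1" |
        sed -n 's/^zfs-\(kmod-\)\{0,1\}\([0-9][0-9]*\.[0-9][0-9]*\)\..*$/\2/p'
}

# Succeeds when both version strings belong to the same major.minor series.
zfs_series_match() {
    [ -n "$(zfs_series "$1")" ] && [ "$(zfs_series "$1")" = "$(zfs_series "$2")" ]
}

if command -v zfs >/dev/null 2>&1; then
    versions=$(zfs version 2>/dev/null)
    user_line=$(printf '%s\n' "$versions" | sed -n '1p')
    kmod_line=$(printf '%s\n' "$versions" | sed -n '2p')
    if zfs_series_match "$user_line" "$kmod_line"; then
        echo "userland and kmod agree: $user_line / $kmod_line"
    else
        echo "WARNING: userland/kmod mismatch: $user_line / $kmod_line" >&2
    fi
fi
```

For the combination described here (2.3.0 userland on a 2.2.x kmod) the check would print the warning.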

mtippmann avatar May 19 '25 13:05 mtippmann

Ok, based on the initial work from @astralblue, I've created a really ugly hack solution which allowed me to upgrade without errors or deadlocks. Sharing below, so that others can successfully upgrade their Ubuntu 24.10 installation, in case they use root on ZFS and they are still on 24.10 or were able to restore their 24.10 installation after a failed upgrade attempt (cc @Exterior466):

With root permissions:

  1. Update your system: apt update ; apt upgrade --with-new-pkgs
  2. Download 00_zfs_oneiric_upgradehack.cfg to /etc/default/grub.d/ (not /etc/grub.d/. Also keep the filename incl. .cfg suffix)
  3. Prepare for the worst, step 1: cp -ra /boot /boot.oracular
  4. Prepare for the worst, step 2: zfs snapshot -r rpool@oracular
  5. I had issues with virtualbox DKMS; maybe temporarily uninstall if you have it: apt remove virtualbox-dkms
  6. Run the upgrade, which should now pass through successfully: do-release-upgrade
  7. Reboot into Ubuntu 25.04 after upgrade completion
  8. Remove the hack: rm /etc/default/grub.d/00_zfs_oneiric_upgradehack.cfg*
  9. Run update-grub twice: update-grub ; update-grub
  10. Reboot. You should be done.
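
Before starting step 6 it may be worth double-checking that the rollback point from step 4 really exists. A minimal sketch, assuming the snapshot tag oracular from the steps above; the function name is made up, and the zfs call is skipped when the zfs command is not available:

```shell
#!/bin/sh
# Succeeds when stdin (one snapshot name per line) contains a snapshot
# whose name ends in "@<tag>".
has_snapshot_tag() {
    while IFS= read -r name; do
        case "$name" in
            *"@$1") return 0 ;;
        esac
    done
    return 1
}

if command -v zfs >/dev/null 2>&1; then
    if zfs list -H -t snapshot -o name 2>/dev/null | has_snapshot_tag oracular; then
        echo "rollback point @oracular exists"
    else
        echo "no @oracular snapshot found; create one before upgrading" >&2
    fi
fi
```

This only confirms that at least one dataset carries the tag; it does not verify that every dataset under rpool was snapshotted.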

Save this as 00_zfs_oneiric_upgradehack.cfg to /etc/default/grub.d/

#!/bin/dash
set -e  # Exit on any unexpected error unless handled

KERNEL_VERSION=$(uname -r)
PATCH_FAILED=0
UPGRADE_SUCCESS=0
PATCH_FILE="/etc/grub.d/10_linux_zfs"
BACKUP_FILE="/root/10_linux_zfs.zfs_oneiric_upgradehack"

# 0. Check Kernel Version
case "$KERNEL_VERSION" in
   6.14.*) echo "Upgraded kernel detected. Self-disabling!"; UPGRADE_SUCCESS=1 ;;
   *)
    echo "Current kernel version is '$KERNEL_VERSION'."
esac

# 0. Sanity check & automatic disabling after upgrade
if [ -f "$BACKUP_FILE" ]; then
    echo "Backup file '$BACKUP_FILE' exists!"
    if [ "$UPGRADE_SUCCESS" -ne 0 ]; then
        echo "Ubuntu 25.04 kernel seems to run!"
        mv "$BACKUP_FILE" "$PATCH_FILE"
        echo "Restored backup '$BACKUP_FILE' as '$PATCH_FILE'"
        mv  /etc/default/grub.d/00_zfs_oneiric_upgradehack.cfg /etc/default/grub.d/00_zfs_oneiric_upgradehack.cfg.disabled
        echo "Disabled this script by renaming it to /etc/default/grub.d/00_zfs_oneiric_upgradehack.cfg.disabled"
        exit 0
    else
        echo "...but upgrade yet not successful? Skipping..."
        return
    fi
fi

# 1. Check Kernel Version
case "$KERNEL_VERSION" in
   6.11.*) ;;
   *)
    echo "Current kernel version is '$KERNEL_VERSION'."
    echo "This script only supports kernel versions starting with 6.11."
    echo "If you have already successfully upgraded to Ubuntu 25.04, remove "
    echo "    /etc/default/grub.d/00_zfs_oneiric_upgradehack.cfg"
    echo "    /etc/default/grub.d/99_zfs_oneiric_upgradehack.cfg"
    exit 1
esac

echo "Kernel version is $KERNEL_VERSION. Applying hotpatch..."

echo "\nIMPORTANT:\n   Watchout - this is only a dirty one-time hack!\n"
echo "   You should see later a log line restoring the backup file made above."
echo "   After you successfully booted into Ubuntu 25.04, delete the two files"
echo "   you added and re-run update-grub to restore the ZFS boot entries.\n\n"


# 2. Backup the file
if [ ! -f "$PATCH_FILE" ]; then
    echo "Error: Target file '$PATCH_FILE' is missing?"
    exit 1
fi
cp -a "$PATCH_FILE" "$BACKUP_FILE"
echo "Backup created at '$BACKUP_FILE'."

# 3. Apply the patch using `patch`
echo "Applying patch to (temporarily) skip ZFS user space calls..."
patch --no-backup-if-mismatch "$PATCH_FILE" <<'EOF'|| PATCH_FAILED=1
--- 10_linux_zfs        2025-05-15 21:03:33.610925840 -0700
+++ 10_linux_zfs.new    2025-05-15 22:15:32.700304647 -0700
@@ -125,6 +125,11 @@
         return
     fi

+    if [ -n "${snapshot_name}" ]; then
+        grub_warn "Ignoring snapshot '${dataset}@${snapshot_name}'."
+        return
+    fi
+
     if ! mount -o noatime,zfsutil -t zfs "${dataset}" "${mount_path}"; then
         grub_warn "Failed to find a valid directory '${directory}' for dataset '${dataset}@${snapshot_name}'. Ignoring"
         return
@@ -196,6 +201,8 @@
     local snapshot_name=""
     # For snapshots we extract the parent dataset
     if echo "${dataset_path}" | grep -q '@'; then
+        grub_warn "Ignoring snapshot '${dataset_path}'."
+        return
         base_dataset_path=$(echo "${dataset_path}" | cut -d '@' -f1)
         snapshot_name=$(echo "${dataset_path}" | cut -d '@' -f2)
     fi
EOF

# 4. Error handling and user confirmation
if [ "$PATCH_FAILED" -ne 0 ]; then
    echo "\nERROR APPLYING PATCH. PLEASE CHECK THE FILE MANUALLY.\n"
    echo " !!! If you are currently running a Ubuntu dist-upgrade from 24.10. to 25.04.,"
    echo " !!! You should now open a terminal and try to resolve/fix the failed, temporary"
    echo " !!! patching of the file $PATCH_FILE, so that the currently running update-grub"
    echo " !!! will not cause a kernel deadlock.\n"
    while true; do
        echo "Choosing N will restore the backup and interrupt update-grub with an error code."
        printf "Continue? (y/n): "
        read user_input
        case "$user_input" in
            [yY]) break ;;
            [nN]) mv "$BACKUP_FILE" "$PATCH_FILE"; rm -f "$PATCH_FILE.rej" ; echo "Restored backup & exiting."; exit 1 ;;
            *) echo "Please enter y or n." ;;
        esac
    done
else
    echo "Patch applied successfully."
fi

bentolor avatar May 20 '25 07:05 bentolor

I'm having the same issue with sudo dpkg --configure -a freezing the terminal and never finishing the command. It doesn't matter if I ssh in or run it from another tty. The result is the same.

The upgrade from 24.10 > 25.04 froze for me, forcing me to reboot. I can load into the GNOME desktop, but I can't update the system. When rebooting, I get the same Tainted and hung_task lines related to ZFS.

tehmasterer avatar May 20 '25 20:05 tehmasterer

Thanks @bentolor for this solution! I will try it out tomorrow and see if it works for me as well! This time not forgetting to snapshot after stopping all my running containers. The suggestion to wait is probably the best option if you are not asking for punishment, but i do have some time and i want my system running as best as possible for gaming

One funny thing i realized after i restored the system was that my bpool partition wasn't mounted (as it was mounted on another system...) but the system still ran pretty well until it kernel panicked. That it even works without bpool feels strange.....And that is why it took a bit of time before i realized it, as my kernel tinkering didn't work.

Another thing was that my changing the kernels before i restored the system was still in effect after restoration and the kernel wasn't properly installed. As i restored both bpool and rpool from the same date it was weird that my kernel changes was still there. Things you learn i guess....

@tehmasterer , This is what happens, you have to hard reset as the update gets stuck on the last steps. I would restore the system with the snapshot you hopefully did right before updating and try Bentolor's solution or, as he suggests, wait some time so this issue will be fixed for everyone.

What i did first as i described above did make the system run fine with the old kernel but i wouldn't suggest doing that as the future kernel updates will most probably be broken.

Exterior466 avatar May 20 '25 22:05 Exterior466

I was able to finish my broken upgrade with the additional lines in /etc/grub.d/10_linux_zfs posted by @astralblue . Thank you very much!

kleini avatar May 21 '25 07:05 kleini

@bentolor I used your hack from https://github.com/openzfs/zfs/issues/17337#issuecomment-2893283104 for upgrading

it worked for me, except that I had to manually restore /etc/grub.d/10_linux_zfs. Since the upgrade I am affected by https://bugs.launchpad.net/ubuntu/+source/zsys/+bug/2106501 (but not entirely sure what the impact of that is)

iSOcH avatar May 23 '25 17:05 iSOcH

So many ...different... ways listed to modify your stuff to make it work... I don't like modifying system scripting or tooling that should theoretically not need touching.

Here's how to do it without modifying a single thing in the scripts - only grub.cfg itself [in a small way]. And without having a backup or a snapshot to roll back to. You simply need to make the initrd and make grub boot the newly installed kernel with said initrd that contains the matching ZFS KMOD version to work alongside the upgraded userland that was just installed by the upgrade process. And then you let dpkg finish the install configuration step where it left off. The update grub procedure will no longer fail.

Steps:

Let it hang in the middle of installation. It's in the end stages configuring all that was just installed.

Hard reboot into the previous kernel.

Manually run update-initramfs to properly create the initrd for 6.14 which is not finished in the install process when the module and userland version mismatch causes the lockup.

My full command used was

update-initramfs -v -c -k 6.14.0-15-generic

If you look in /boot you will now have both a kernel and initrd.img for 6.14. these have been properly symlinked to vmlinuz and initrd.img [without any versioning in the file name]
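
A quick sanity check before editing grub.cfg, as a minimal sketch: confirm that both files exist for the new kernel and, where lsinitramfs from initramfs-tools is available, that the initrd contains a zfs module. The function name is made up, the version string is the one from the command above, and the loose zfs.ko match is an assumption to cover compressed module files:

```shell
#!/bin/sh
# Succeeds when both the kernel image and its initrd exist.
#   $1 = boot directory, $2 = kernel version
boot_files_present() {
    [ -f "$1/vmlinuz-$2" ] && [ -f "$1/initrd.img-$2" ]
}

kver=6.14.0-15-generic
if boot_files_present /boot "$kver"; then
    echo "kernel and initrd for $kver are present"
    if command -v lsinitramfs >/dev/null 2>&1; then
        # List the initrd contents and look for the zfs kernel module.
        lsinitramfs "/boot/initrd.img-$kver" | grep 'zfs\.ko' ||
            echo "no zfs module found in the initrd" >&2
    fi
else
    echo "missing vmlinuz or initrd.img for $kver in /boot" >&2
fi
```

If the second check finds no module, rebooting into 6.14 would most likely end in the initramfs shell, so rerun update-initramfs first.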

Manually edit /boot/grub/grub.cfg modifying the 'kernel' and 'initrd' lines of the first listed OS in grub.cfg which should be 24.10 and kernel 6.11. change these lines to point to the above mentioned symlinks so you don't need to type the whole version of both files twice. I mean, you could if you wanted to...

Mine looked like

linux "/BOOT/ubuntu_fu8mf4@/vmlinuz" [keep all parameters like root=ZFS=whatever]
initrd "/BOOT/ubuntu_fu8mf4@/initrd.img"

Reboot and you will now be in kernel 6.14 with a proper initrd that includes the appropriate version ZFS kmod.

Finish the install with dpkg --configure -a

No hacking anything in the update scripts just finishing the steps you need to continue.

mystica555 avatar May 25 '25 02:05 mystica555

I wonder if this issue needs to be addressed by openzfs or ubuntu.

Theoretically both.

Open ZFS could perhaps have fallback code in userland that would notice and properly talk to older kernel modules.

Ubuntu could also figure out that having a kmod/userland version mismatch in the middle of an upgrade process that will then call the ZFS userland tools is possibly not the best idea and figure out how not to install it until the very end perhaps...

mystica555 avatar May 25 '25 02:05 mystica555

Re: [@bentolor] https://github.com/openzfs/zfs/issues/17337#issuecomment-2893283104

Ok, based on the initial work from @astralblue, I've created a really ugly hack solution which allowed me to upgrade without errors or deadlocks. Sharing below, so that others can successfully upgrade their Ubuntu 24.10 installation, in case they use root on ZFS and they are still on 24.10 or were able to restore their 24.10 installation after a failed upgrade attempt (cc @Exterior466):

Please reference my recent comment https://github.com/openzfs/zfs/issues/17337#issuecomment-2907563051 on how to, without modifying any update scripting like your hack here does, simply create the initramfs and modify grub.cfg to boot into the 6.14 kernel with the 6.14 initrd you just manually created.

It's a far simpler solution that should take less time, require less actual typing, and should be accessible to anyone.

mystica555 avatar May 25 '25 02:05 mystica555

So many ...different... ways listed to modify your stuff to make it work...

Sounds like a complaint. Glad we got now another one, @mystica555 ;-)

You simply […] And then you let dpkg finish the install configuration step where it left off.

The task never was to just finish dpkg.

It was always about completing the full do-release-upgrade upgrade process with all the steps that follow after dpkg --configure -a.

bentolor avatar May 25 '25 04:05 bentolor

It's more of an exasperation when I try to find the simplest way to finish something.

What steps of do-release-upgrade follow dpkg --configure of all the new files?

As far as I can tell my system successfully upgraded with only the steps I listed in my reply.

+++ Seems that I missed apt autoremove as part of do-release-upgrade.

I'm still not sure what else would have been missed otherwise.

mystica555 avatar May 25 '25 05:05 mystica555

I did the upgrade using @bentolor's solution and it worked flawlessly! Thanks for the time u took to make this into a simple solution! I would strongly recommend it if you are not stuck with a failed update and no snapshot! Rollback with zfs is VERY good stuff to have and use!

Exterior466 avatar May 25 '25 17:05 Exterior466

Thanks @bentolor for streamlining my bandaid patch to a zero-touch upgrade script! 😺

Not sure what the kernel ABI guarantee is in Linux and OpenZFS, but seeing how Ubuntu upgrades the userland and kernel in one go and the upgraded userland runs on old kernel until the final reboot, I wish the ZFS userland were backward compatible with older kernels. (In FreeBSD it was the opposite – newer kernels are kept compatible with older userland by exposing both old/new ABIs – so one would first upgrade the kernel, reboot, then upgrade the userland.)

astralblue avatar May 25 '25 17:05 astralblue