fix: block incompatible kernel from being installed

Open tleydxdy opened this issue 4 months ago • 9 comments

Motivation and Context

The current "Requires" lines only ensure that the old kernel is available on the system; they do not prevent Fedora from updating to an incompatible kernel and breaking the user's system.

Description

Set Conflicts to block incompatible kernels from being installed.

How Has This Been Tested?

Waiting for build test

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [ ] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [ ] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

tleydxdy avatar Apr 28 '24 03:04 tleydxdy

From my recollection of the last time this was discussed, since the kernel packages are considered special/required, this just results in dnf resolving the conflict by uninstalling the ZFS packages.

rincebrain avatar Apr 28 '24 03:04 rincebrain

Interesting, I must have missed that discussion. I tried to search and read up on this issue :( I remember there was a guide or an issue post that mentioned the user should pin zfs with echo 'zfs' > /etc/dnf/protected.d/zfs.conf

tleydxdy avatar Apr 28 '24 04:04 tleydxdy

Is https://github.com/openzfs/zfs/issues/15188 what you are referring to?

I think it is useful to spell out the expected way to consume zfs-dkms on "rolling" distros, where the kernel is upgraded to incompatible versions on a routine basis. Below is what I gathered from reading various issues.

The current situation:

  • User with auto-upgrade: system breaks every couple of months
    • Workaround: manually pin the kernel to the current version, check GitHub regularly, and re-pin once zfs adds support for a newer kernel
    • This negates part of the benefit of auto-upgrade, and if the user doesn't act quickly enough, Fedora might already have moved on to yet another incompatible kernel, leaving the user stuck on a kernel that is 2+ cycles older
  • User with manual upgrade: every upgrade requires manual intervention (at minimum, checking that the kernel is not in the upgrade list)
    • Workaround: same as auto-upgrade
    • Also as with auto-upgrade, if the upgrade interval is unlucky the user could be stuck with a really old kernel

The ideal in my opinion:

  • User with auto-upgrade: system keeps working
    • This means the kernel is held back only as far as needed, and once zfs is updated the kernel gets updated as well
  • User with manual upgrade: user can upgrade without fear
    • If the user upgrades rarely, they could be stuck with a kernel older than necessary, but this seems unsolvable without an archive of old kernel packages

My understanding is that declaring the conflict and setting zfs as protected achieves this goal, with the caveat that dnf will print a warning when upgrading. Is that correct?

tleydxdy avatar Apr 28 '24 05:04 tleydxdy

My (personal, not any kind of larger group or org) general advice is not to use it on a rolling distro that bumps kernel revs, really, precisely because it's going to break like this.

I don't actually know if that was the discussion, it's come up a few times, but my understanding is that there's not really a good way to express what's wanted in dnf settings.

I guess you might be able to do something weird like a postscript or conf file that adds a hold or analogue on the kernel range that's acceptable, maybe?
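The "acceptable kernel range" idea above can be pictured as a small check that compares a kernel release string against a supported range. This is only a sketch: the function names and the example bounds (3.10 to 6.8) are hypothetical, the release string is just an illustration of the Fedora naming pattern, and nothing here hooks into dnf.

```python
import re


def kernel_major_minor(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a kernel release string such as
    '6.8.7-300.fc40.x86_64'. Anything after the minor number is ignored."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(release: str,
                 minimum: tuple[int, int],
                 maximum: tuple[int, int]) -> bool:
    """Return True if the kernel's (major, minor) lies within
    [minimum, maximum], both ends inclusive.

    The bounds are supplied by the caller; where they come from
    (for example the module's packaging metadata) is left open here."""
    return minimum <= kernel_major_minor(release) <= maximum


if __name__ == "__main__":
    # Hypothetical supported range, for illustration only.
    lowest, highest = (3, 10), (6, 8)
    for candidate in ("6.8.7-300.fc40.x86_64", "6.9.0-64.fc41.x86_64"):
        verdict = "ok" if is_supported(candidate, lowest, highest) else "hold back"
        print(candidate, verdict)
```

A hold or post-install script along the lines suggested above would still need some mechanism to act on the verdict; that part is not sketched here.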

rincebrain avatar Apr 28 '24 06:04 rincebrain

I'm fairly sure protected is the way to handle it. Here is what the dnf docs say:

protected_packages List of packages that DNF should never completely remove. They are protected via Obsoletes as well as user/plugin removals.

The default is: dnf, glob:/etc/yum/protected.d/*.conf and glob:/etc/dnf/protected.d/*.conf. So any packages which should be protected can do so by including a file in /etc/dnf/protected.d with their package name in it.

DNF will protect also the package corresponding to the running version of the kernel. See also protect_running_kernel option.
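As a rough illustration of the drop-in file the docs describe, the sketch below writes a one-line .conf file containing a package name, which is what the echo 'zfs' > /etc/dnf/protected.d/zfs.conf command mentioned earlier in the thread does. The helper name write_protected_conf is hypothetical, and the demo writes to a temporary directory rather than /etc/dnf/protected.d, which would need root.

```python
import tempfile
from pathlib import Path


def write_protected_conf(package: str, directory: Path) -> Path:
    """Write '<package>.conf' containing the package name on one line.

    On a real system `directory` would be /etc/dnf/protected.d; it is a
    parameter here so the sketch can be tried without root."""
    conf = directory / f"{package}.conf"
    conf.write_text(package + "\n")
    return conf


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of /etc.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_protected_conf("zfs", Path(tmp))
        print(path.name, repr(path.read_text()))
```

Removing the file again is what "unprotecting" amounts to in the discussion that follows.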

I do agree with the points made in the other issue thread, though. I don't think setting zfs as protected should be done automatically by the package install script, if only because it also blocks the user from uninstalling it. So I think a sentence should be added to the Fedora install guide to tell users why they need to set this.

tleydxdy avatar Apr 28 '24 14:04 tleydxdy

I don't really agree with setting it either, though, is the thing, I think.

That's just going to provoke lots of angry messages about how it broke their upgrade process entirely, and half of them followed by they undid it and it broke, IMO. That's not really a better experience, because it's more frustration, and still broken at the end with what they're likely to do without looking up what they should do.

Much as I like to encourage people to read documentation, the reality is that people aren't going to go check it if something breaks, they're going to do what seems to make sense to them to get past this complaint, and that's going to break them the same way as before, but with even higher frustration b/c they've spent more time on it.

rincebrain avatar Apr 28 '24 19:04 rincebrain

I'm not sure that is the case? Currently the situation is broken both for people who follow instructions and for people who do not. With this change, people who follow the instructions can have a better experience, and people who don't are no worse off anyway, right?

tleydxdy avatar Apr 29 '24 00:04 tleydxdy

My argument is that for people who follow instructions, it will still be broken the same way, because the ideal outcome is that it upgrades everything else possible but stops the kernel from upgrading past the allowed version, right?

If you mark the ZFS packages as protected, if I understand the behavior right, it'll refuse to upgrade if there would be a conflict that would require removing one of (kernel|zfs), and if they unprotect it, it'll break just like if they had never done that dance, yes? (To say nothing of if you unprotect the kernel packages, which I'm not sure it'll even allow you to do...)

And if you don't, it'll break the same way it currently does.

So my claim is based on the idea that "break all upgrades" is not a net gain, because people are just going to end up at the same end state, plus some frustration to undo the thing that stopped them doing the upgrade in the first place, and there's not really any outcome they can have from that state other than "wait for a newer version to make that not a problem" that's better.

If I'm misunderstanding something here, or you disagree with my conclusions, I'd love to hear how I'm wrong or not viewing this the same way - I don't have a particularly strong opinion on this, just that if I understand the situation right, this would just break the "rolling" in "rolling upgrades" entirely if the situation comes up, which I claim is just going to result in more frustration and people still breaking their systems, absent a better solution.
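The outcomes being argued over can be laid out as a small decision function. This is a model of the behavior as described in this thread (in particular the "refuses to upgrade" reading above), not of dnf's actual solver, and it has not been checked against a real system. The function name and the outcome strings are made up for illustration.

```python
def upgrade_outcome(kernel_compatible: bool,
                    conflicts_declared: bool,
                    zfs_protected: bool) -> str:
    """Summarize what the thread expects to happen on an upgrade.

    kernel_compatible:  the new kernel is supported by the installed zfs
    conflicts_declared: the zfs package declares Conflicts on that kernel
    zfs_protected:      zfs is listed under /etc/dnf/protected.d
    """
    if kernel_compatible:
        return "kernel upgraded, zfs keeps working"
    if not conflicts_declared:
        # Today's situation: nothing stops the incompatible kernel.
        return "kernel upgraded, zfs broken"
    if not zfs_protected:
        # Recollection from earlier in the thread: dnf resolves the
        # conflict by removing the zfs packages.
        return "kernel upgraded, zfs packages removed"
    # The disputed case: one reading is that the upgrade is refused.
    return "upgrade refused"


if __name__ == "__main__":
    for conflicts in (False, True):
        for protected in (False, True):
            print(conflicts, protected,
                  upgrade_outcome(False, conflicts, protected))
```

The open question in the thread is whether the last case blocks the whole upgrade, as modeled here, or only holds the kernel back with a warning.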

rincebrain avatar Apr 29 '24 01:04 rincebrain

Do you know how I can download the build artifact from buildbot? It might be valuable to have some practical examples.

tleydxdy avatar Apr 29 '24 01:04 tleydxdy