community.general

Use correct kernel flavor for zfs kernel modules on alpine

Open tomhesse opened this issue 9 months ago • 7 comments

SUMMARY

Use the correct kernel flavor from the ansible_kernel fact to ensure the correct zfs modules are installed for integration testing.
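A minimal Python sketch of the idea follows. It assumes that an Alpine kernel release string, as reported by ansible_kernel (for example 6.12.38-0-virt), has the layout <version>-<pkgrel>-<flavor>. It also assumes the matching ZFS module package is named zfs-<flavor>, like zfs-virt mentioned later in this thread. The function names are hypothetical, and the PR itself most likely implements this in an Ansible task rather than in Python.

```python
def kernel_flavor(ansible_kernel: str) -> str:
    """Return the flavor of an Alpine kernel release string.

    Assumes the layout <version>-<pkgrel>-<flavor>, e.g. "6.12.38-0-virt".
    """
    parts = ansible_kernel.split("-", 2)
    if len(parts) != 3 or not parts[2]:
        raise ValueError(f"unexpected kernel release: {ansible_kernel!r}")
    return parts[2]


def zfs_package(ansible_kernel: str) -> str:
    """Hypothetical helper: ZFS module package for the kernel flavor (assumed zfs-<flavor>)."""
    return f"zfs-{kernel_flavor(ansible_kernel)}"


# Example: the release string seen on the CI VM in this thread.
print(zfs_package("6.12.38-0-virt"))  # zfs-virt
```

Deriving the package from the running kernel avoids hardcoding one flavor. A VM booted with a different flavor would then get its own module package.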

Fixes #10453

ISSUE TYPE
  • Test Pull Request
COMPONENT NAME

zpool

tomhesse, Jul 25 '25

Can you rebase onto the current main (now that #10462 has been merged)? The Alpine VM image changed from 3.21 to 3.22 (because ansible-core changed it), so the result might be different with the new CI matrix.

felixfontein, Jul 26 '25

Ping @tomhesse

felixfontein, Aug 07 '25

There is a mismatch between running kernel version and modules on the Alpine VM:

root@ip-192-168-3-124:/home/alpine/ansible_collections/community/general# find /lib/modules/ | grep zfs
/lib/modules/6.12.44-0-virt/extra/zfs.ko.gz
root@ip-192-168-3-124:/home/alpine/ansible_collections/community/general# modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/6.12.38-0-virt
root@ip-192-168-3-124:/home/alpine/ansible_collections/community/general# uname -a
Linux ip-192-168-3-124 6.12.38-0-virt #1-Alpine SMP PREEMPT_DYNAMIC 2025-07-14 19:36:17 x86_64 Linux
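The check done by hand above can be sketched in Python. modprobe only searches /lib/modules/<running release>, so zfs can be loaded only if a zfs.ko file exists under the directory named after uname -r. The helper names and the set of compression suffixes (.gz, .xz, .zst) are assumptions for illustration.

```python
import os
import platform
import re

# Matches e.g. /lib/modules/6.12.44-0-virt/extra/zfs.ko.gz and captures the kernel release.
ZFS_MODULE_RE = re.compile(r"/lib/modules/([^/]+)/(?:.*/)?zfs\.ko(?:\.gz|\.xz|\.zst)?")


def zfs_module_kernels(module_paths):
    """Return the set of kernel releases that have a zfs.ko module installed."""
    kernels = set()
    for path in module_paths:
        match = ZFS_MODULE_RE.fullmatch(path)
        if match:
            kernels.add(match.group(1))
    return kernels


def zfs_loadable(running_release, module_paths):
    """True if the directory modprobe searches for the running kernel contains zfs.ko."""
    return running_release in zfs_module_kernels(module_paths)


if __name__ == "__main__":
    # Collect zfs.ko* files under /lib/modules (empty if the directory is missing).
    found = [
        os.path.join(root, name)
        for root, _dirs, files in os.walk("/lib/modules")
        for name in files
        if name.startswith("zfs.ko")
    ]
    running = platform.release()  # same value as `uname -r`
    if zfs_loadable(running, found):
        print(f"zfs module available for running kernel {running}")
    else:
        print(f"mismatch: running {running}, zfs built for {sorted(zfs_module_kernels(found))}")
```

With the paths from the session above, the running release 6.12.38-0-virt does not appear in {"6.12.44-0-virt"}, which is the mismatch modprobe reports.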

@Akasurde @mattclay does this need a change in the Ansible testing infrastructure, or is there something we can do about it in c.g?

felixfontein, Sep 04 '25

@felixfontein My first guess is that something is upgrading the kernel modules, either explicitly or as part of installing another package. Without a reboot, that will then prevent those kernel modules from being loaded. I've encountered this issue in tests before. Depending on the exact cause, there are a few options to solve the issue:

  1. Figure out what is upgrading the modules and stop doing that.
  2. Pin the modules to the version that matches the running kernel so they're not upgraded.
  3. As a last resort, reboot the system after the upgraded modules are installed.
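Option 2 amounts to building an apk version constraint from the running kernel. The Python sketch below assumes that a release string such as 6.12.38-0-virt maps to package version 6.12.38-r0 of zfs-<flavor>, which matches the zfs-virt example later in this thread. It also relies on apk's name=version constraint syntax. The function name is hypothetical. Such a pin can only succeed if the repository still provides that exact version.

```python
def zfs_pin_for_kernel(ansible_kernel: str) -> str:
    """Build an apk constraint pinning zfs-<flavor> to the running kernel.

    Assumes release strings like "6.12.38-0-virt" correspond to package
    version "6.12.38-r0" (hypothetical mapping, based on zfs-virt).
    """
    parts = ansible_kernel.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected kernel release: {ansible_kernel!r}")
    version, pkgrel, flavor = parts
    return f"zfs-{flavor}={version}-r{pkgrel}"


print(zfs_pin_for_kernel("6.12.38-0-virt"))  # zfs-virt=6.12.38-r0
```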

Take a look at the test run to see if you can figure out where the mismatched modules come from. It might help to use ansible-test shell to log in to an instance to poke around. If they're already upgraded when you log in, try with the --raw option to bypass most of the bootstrapping to see if that's the cause.

Let me know what you find, and whether you still need help after looking into it.

mattclay, Sep 04 '25

@mattclay the problem is that apk cannot install older versions of packages that are no longer in the repositories. Trying to install the matching version of zfs-virt (6.12.38-r0) installs 6.12.45-r0 instead (and upgrades all modules), since that's the only version available in the repositories (https://dl-cdn.alpinelinux.org/alpine/v3.22/main/x86_64/). So the only way forward is likely to reboot the VM. I've tried using the ansible.builtin.reboot module for that, but it doesn't work: it fails with "Running ansible.builtin.reboot with local connection would reboot the control node."

I guess the only ways to fix this are either to change the VM bootstrap so that it upgrades to the latest package versions, or to update the VM image every time the kernel version changes so that it always ships with the latest kernel. Both are probably problematic for other reasons...

felixfontein, Sep 06 '25

@felixfontein Out of curiosity, about how long does it take for the upgrade and reboot to complete?

mattclay, Oct 30 '25

@mattclay I haven't found time to try this out yet.

felixfontein, Oct 30 '25