zpool cache not respected by OPNsense
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [x] I have read the contributing guidelines at https://github.com/opnsense/src/blob/master/CONTRIBUTING.md
- [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/src/issues?q=is%3Aissue
Describe the bug
zpools other than zroot present in the zpool cache (`/etc/zfs/zpool.cache` or `/boot/zfs/zpool.cache`) are not imported at boot time in OPNsense 24.1.9.
To Reproduce
Steps to reproduce the behavior:
- Access the command line of the device and get a root shell.
- Create a new zpool.
- Export the zpool (`zpool export <NAME>`).
- Import the zpool (`zpool import <NAME>`).
- Verify the zpool exists in the zpool cache (`zdb -U /etc/zfs/zpool.cache`).
- Reboot the system.
- When the system is back up, access the command line and check whether the new zpool was imported (`zpool list`). Notice that only zroot exists.
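The steps above can be sketched as a shell session; the pool name and backing device below are hypothetical examples, not from the original report:

```shell
# Create a pool on a spare disk (name and device are illustrative)
zpool create tank /dev/ada1

# Export and re-import so the pool is recorded in the cache file
zpool export tank
zpool import tank

# Confirm the pool now appears in the cache file
zdb -U /etc/zfs/zpool.cache

# Reboot; after boot, list the imported pools
reboot
# ...once the system is back up:
zpool list    # only zroot is shown
```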
Expected behavior
Upon reboot, the newly created zpool should be imported.
Describe alternatives you considered
It is possible to manually import the zpool with `zpool import` after the reboot, but this doesn't really scale.
I was able to get the pool to import as expected by adding `zfs_enable="YES"` to /etc/rc.conf, which allows /etc/rc.d/zfs to start at boot as intended. I notice OPNsense sets this variable in /usr/local/etc/rc.loader.d/20-zfs; perhaps the variable isn't making it to FreeBSD's rc.d?
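The rc.conf workaround described above can be applied like this (a manual workaround, not the proper fix; `sysrc` is the standard FreeBSD tool for editing rc.conf):

```shell
# Enable the stock FreeBSD ZFS rc scripts so cached pools are
# imported and datasets mounted at boot
sysrc zfs_enable=YES

# Reboot, or start the service now to verify the behavior
service zfs start
```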
Environment
OPNsense 24.1.9
The code you may want to improve is this:
https://github.com/opnsense/core/blob/0f73da02ad205aba1be2d7928c72fb8805b34eb4/src/etc/rc#L153-L160
Do not confuse `_load` with `_enable` vars. We don't use the RC subsystem that much, as it tends to interfere with the boot sequence.
Adding `-c CACHE_FILE` to the referenced `zpool import` and removing the changes to /etc/rc.conf fixes my issue. However, I'm fairly certain that adding additional pools worked in a previous release of OPNsense with no changes.
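For reference, the `-c` form reads pools from an explicit cache file instead of scanning devices; the fix amounts to something like:

```shell
# Import every pool recorded in the cache file;
# -N skips mounting datasets (mounting happens later in boot)
zpool import -c /etc/zfs/zpool.cache -a -N
```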
The behavior of `zpool import -a` changed when OpenZFS was adopted in FreeBSD 13. In prior releases, `-a` would default to searching for pools in /dev:
https://github.com/freebsd/freebsd-src/blob/b5ad6b488d9e62d820fe90fdce4aee4f4d3d7162/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c#L2634
This changed in OpenZFS, used in FreeBSD 13+, where the environment variable `ZPOOL_IMPORT_PATH` is used as the search path: https://github.com/freebsd/freebsd-src/blob/5fe9c9de03ef3191d216964bc4d8e427d5ed5720/sys/contrib/openzfs/cmd/zpool/zpool_main.c#L3492
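Under OpenZFS the old device-scanning behavior can be approximated by setting the search path explicitly, e.g.:

```shell
# ZPOOL_IMPORT_PATH is a colon-separated list of directories to scan
# for pool member devices; point it at /dev to mimic the pre-13 default
ZPOOL_IMPORT_PATH=/dev zpool import -a
```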
Thanks for looking into this. Feel free to provide a PR, but I won't be able to review and commit in the next two weeks.
(I'd say that ZPOOL_IMPORT_PATH might be the better path forward as I've avoided -c in the past for portability.)
I'll open a PR to fix this this evening. I can work around this issue for now, so no rush on the review.
I will propose a change that imports pools from zpool cache at boot (just like vanilla FreeBSD) rather than allowing the implementation of zpool import to dictate what's imported.
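A sketch of what "just like vanilla FreeBSD" could look like in rc, modeled loosely on FreeBSD's /etc/rc.d/zpool; the exact paths, ordering, and flags here are assumptions, not the final patch:

```shell
# Try the known cache file locations in order and import pools
# from the first readable one; -N defers mounting to later in boot
for cachefile in /etc/zfs/zpool.cache /boot/zfs/zpool.cache; do
    if [ -r "$cachefile" ]; then
        zpool import -c "$cachefile" -a -N
        break
    fi
done
```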
Pros of using zpool cache
- It's explicit, fstab-like behavior.
- Consistency across reboots, the system boots with the same pools it had at shutdown.
- Prevents importing potentially problematic pools that are connected to the system, but have never been successfully imported.
- On appliances, may help deter/prevent tampering with root fs.
Cons of using zpool cache
- The existence and purpose of zpool cache does not appear to be widely known or extensively documented. This may surprise some people.
- Boot behavior is changing. The end user may see a change in available pools after booting this release (easily resolved by importing missing pools and exporting those that shouldn't be there).
- The cache file should be present in every existing install. If, for any reason, the cache file is missing, the system will not boot properly.
- The path of the cachefile could change in future FreeBSD/OpenZFS releases (which would require a fix to rc).
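Regarding the last point: each pool records its cache file location in the `cachefile` pool property, which can be inspected and pinned explicitly; this may help if the default path ever moves (the path below is an example):

```shell
# Show where the pool writes its cache entry ("-" means the default path)
zpool get cachefile zroot

# Pin the cache file path for a pool explicitly
zpool set cachefile=/etc/zfs/zpool.cache zroot
```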
Experiment is 701dff45b2f, likely in 24.7.2.
Adding here for reference - this did not go particularly well for some users (as in: an unbootable box due to a kernel panic), and I'm wondering in which other corner cases this will rear its ugly head.
https://forum.opnsense.org/index.php?topic=42373.msg209269#msg209269 https://forum.opnsense.org/index.php?topic=42387.msg209391#msg209391
I'm not sure about the second one. I'm happy to debug this, though the peripheral way this leads into panics is a bit astonishing. I still don't know why `zpool import -a` doesn't consider looking for devices to be part of "-a". That seems like a harmful oversight.
Well, guess I'm out of this race. In hunt for the most vintage HW in the closet, I found this Fujitsu server but that one has ATI ES1000 32MB RAM PCI on board and no AGP slot. 😢 😭
And yeah, I'm also absolutely puzzled as for what does zpool import have to do with AGP graphics.
It's a working theory. I don't mind reverting if necessary, but we also need to reproduce this remotely without fatalities. In theory we can install the 24.7.1 debug kernel on a working 24.7.1 and then do a tainted zpool import to get a vmcore…
Meh, OK - the AGP theory is apparently confirmed. Even setting `hint.agp.0.disabled=1` at the loader prompt got some people back into a bootable system - 1, 2
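For anyone hitting this, the loader-prompt workaround can be made persistent via the standard FreeBSD loader.conf mechanism (whether OPNsense preserves this file across updates is not something I've verified):

```
# /boot/loader.conf - disable the agp(4) driver at boot
hint.agp.0.disabled="1"
```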
Good idea with the device hints helping people to overcome this manually.