OPNsense fails to boot on Gen 2 Hyper-V VM after upgrade

Open Go0nNow opened this issue 2 months ago • 3 comments

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

  • [x] I have read the contributing guidelines at https://github.com/opnsense/src/blob/master/CONTRIBUTING.md
  • [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/src/issues?q=is%3Aissue

I have encountered this bug periodically for years, and unfortunately I can't pinpoint a specific version when it began occurring, but it was before OPNsense 25.5.

Steps to reproduce the behavior:

  1. Install OPNsense on a generation 2 Hyper-V VM.
  2. Update to the latest OPNsense release.
  3. Reboot.
  4. When rebooting, the VM will intermittently fail to boot.

Expected behavior

Normal system boot up without freezing during startup.

Screenshot of console output during boot. Bootup halts at this point and requires a reset. [Image]

Troubleshooting notes:

  • The amount of RAM or number of CPU cores assigned does not seem to matter. Tested with 1 core and 512 MB RAM as well as 4 cores and 4096 MB RAM.
  • Reverting to a ZFS snapshot taken before the upgrade typically works around the issue, but booting from kernel.old makes no difference. In other words, the issue seems to be tied to a file or configuration setting that is contained within the filesystem.
  • Upgrading again from the restored snapshot has caused the issue to recur, in my experience.
  • Once the VM begins exhibiting this behavior, the only definitive way to fix it permanently in my experience has been to reinstall OPNsense from installation media. Oddly, when the reinstalled system is upgraded for the first time, the issue does not seem to reoccur.
  • I believe it is this FreeBSD bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264267
  • I also found this similar bug, but in my own troubleshooting I was not able to find a correlation between the console or comconsole variables and the boot behavior: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268736
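
Since a snapshot rollback helps while kernel.old does not, one way to narrow this down might be to compare loader settings between a working snapshot and a broken one. The following is only a minimal sketch under stated assumptions: the helper names (`parse_loader_conf`, `diff_settings`) are hypothetical, the sample values are made up for illustration and are not taken from the affected VM, and it assumes the relevant difference lives in a loader.conf-style file, which has not been confirmed.

```python
import re

# Matches simple `name="value"` or `name=value` lines as used in loader.conf.
_ASSIGNMENT = re.compile(r'^\s*([A-Za-z0-9_.\-]+)\s*=\s*"?(.*?)"?\s*$')


def parse_loader_conf(text):
    """Return a dict of variable -> value from loader.conf-style text.

    Simplification: everything after a '#' is treated as a comment,
    so values that contain '#' are not handled.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        match = _ASSIGNMENT.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def diff_settings(working, broken):
    """Return {name: (working_value, broken_value)} for variables that differ."""
    names = sorted(set(working) | set(broken))
    return {
        name: (working.get(name), broken.get(name))
        for name in names
        if working.get(name) != broken.get(name)
    }


# Hypothetical sample contents, for illustration only.
working_text = 'console="efi"\nautoboot_delay="3"\n'
broken_text = 'console="comconsole,efi"  # changed\nautoboot_delay="3"\nboot_serial=YES\n'

changes = diff_settings(parse_loader_conf(working_text), parse_loader_conf(broken_text))
for name, (before, after) in changes.items():
    print(f"{name}: working={before!r} broken={after!r}")
```

In practice the two input strings would be read from the same file in each snapshot (for example the loader configuration under /boot, assuming that is where the difference is). An empty result would suggest looking elsewhere in the filesystem.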

Environment

Software version used and hardware type if relevant, e.g.:

  • OPNsense 25.7.7-amd64
  • Intel Xeon CPU E5-2667 v2
  • Hyper-V guest VM on Windows Server 2022
  • PHY NICs: Mellanox ConnectX-3 Pro

Go0nNow, Nov 06 '25 20:11

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264267

The commits in this issue are in our 24.7, 25.1 and 25.7 versions.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268736

This has no associated commits.

This may be a side effect of fe1d1b42da906702b0, but the fact that this issue is intermittent makes me think there's something else going on. Theoretically there could be a fix for it in 15.0, which I cannot test easily, but that would also mean 14.x still suffers from the same problem.

Unsure how to proceed. What are your thoughts?

Cheers, Franco

fichtner, Nov 07 '25 09:11

This may be relevant to the topic:

https://forum.netgate.com/topic/198006/can-no-longer-boot-with-monitor-connected-efi-frame-buffer/11?_=1762507541860

fichtner, Nov 07 '25 09:11

This may be relevant to the topic:

https://forum.netgate.com/topic/198006/can-no-longer-boot-with-monitor-connected-efi-frame-buffer/11?_=1762507541860

This is very possibly the culprit. I have a pair of lab VMs I can test with. One is not exhibiting the behavior, though I've had to rebuild it after previous upgrades to repair the same problem. I'll see if I can reproduce the issue on the other VM and try this tonight or over the weekend.

Go0nNow, Nov 07 '25 17:11