24.7 installer image kernel panics with Chelsio T320 installed
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [X] I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [X] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue
Describe the bug
Immediately after upgrading from OPNsense 24.1.10_8 to 24.7, the system fails to boot: it panics during initialization of the first port (cxgb0) of the Chelsio T320 network card. The same error occurs when booting from a USB memory stick with OPNsense-24.7-vga-amd64.img installed.
The FreeBSD 14.1 installer FreeBSD-14.1-RELEASE-amd64-memstick.img boots successfully and the cxgb0 and cxgb1 interfaces are visible.
To Reproduce
Steps to reproduce the behavior (fresh install):
- Install a Chelsio T320 card
- Insert a USB memory stick with OPNsense-24.7-vga-amd64.img installed
- Boot to the memory stick
- Observe messages start scrolling on the console rapidly, shortly after "Starting device manager..."
- Observe that the system eventually reboots
Steps to reproduce the behavior (upgrade):
- Install a Chelsio T320 card
- Install OPNsense 24.1 and upgrade to 24.1.10_8
- Configure interfaces on the two ports of the card (in my case LAN and WAN)
- Append dumpdev=AUTO to /etc/rc.conf if not already configured
- Create a boot environment using bectl create 24.1.10
- Upgrade to 24.7 using "12) Update from console" from the root login menu
- Observe the upgrade completes successfully and the system reboots
- Observe messages start scrolling on the console rapidly, shortly after "Starting device manager..."
- Observe that the system eventually reboots
- At the Boot Menu, choose "Boot Environments" and switch to environment "24.1.10"
- Return to the main boot menu, and select "Boot Multi-user"
- Log in as root, and look in /var/crash
- Observe one or more textdump.tar.X and info.X files with recent timestamps (in my case 240 of them, because my first reboot was unattended and the reboot cycle continued for quite a while before I started to troubleshoot)
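
The final check in the steps above (counting the crash dumps and finding the newest one) can be scripted. This is a minimal sketch, assuming a Python 3 interpreter is available and that dumps are saved in /var/crash under the textdump.tar.N names described above. The helper names list_textdumps and summarize are hypothetical and not part of OPNsense.

```python
from datetime import datetime
from pathlib import Path


def list_textdumps(crash_dir="/var/crash"):
    """Return (path, mtime) pairs for textdump.tar.N files, newest first.

    Hypothetical helper: only names ending in a numeric suffix are counted,
    so the textdump.tar.last entry is skipped.
    """
    root = Path(crash_dir)
    if not root.is_dir():
        return []
    dumps = []
    for path in root.glob("textdump.tar.*"):
        # Keep textdump.tar.0, textdump.tar.1, ... and skip anything else.
        if path.name.rsplit(".", 1)[-1].isdigit():
            dumps.append((path, path.stat().st_mtime))
    return sorted(dumps, key=lambda item: item[1], reverse=True)


def summarize(dumps):
    """Format a one-line summary: how many dumps exist and which is newest."""
    if not dumps:
        return "no textdumps found"
    newest_path, newest_mtime = dumps[0]
    stamp = datetime.fromtimestamp(newest_mtime).isoformat(timespec="seconds")
    return f"{len(dumps)} textdump(s), newest {newest_path.name} at {stamp}"
```

Calling print(summarize(list_textdumps())) as root from the 24.1.10 boot environment reports the dump count for /var/crash. A large count with closely spaced timestamps would be consistent with the reboot loop described here.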
Expected behavior
OPNsense 24.7 should operate correctly with a Chelsio T320 card used for WAN and LAN connections, without repeated kernel panics.
Describe alternatives you considered
Another firewall with no Chelsio cards has been running 24.7 for several weeks with a similar configuration. A third firewall with a matching Chelsio T320 has been running 24.1 (currently 24.1.10_8) since it was released, without incident.
Relevant log files
Excerpt from ddb.txt in textdump.tar.last:
db:0:kdb.enter.default> bt
Tracing pid 390 tid 100295 td 0xfffff800154f4000
kdb_enter() at kdb_enter+0x33/frame 0xfffffe00aab184c0
panic() at panic+0x43/frame 0xfffffe00aab18520
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00aab18580
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00aab185d0
calltrap() at calltrap+0x8/frame 0xfffffe00aab185d0
--- trap 0xc, rip = 0, rsp = 0xfffffe00aab186a8, rbp = 0xfffffe00aab186d0 ---
??() at 0/frame 0xfffffe00aab186d0
dump_iface() at dump_iface+0x145/frame 0xfffffe00aab18780
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe00aab18800
if_attach_internal() at if_attach_internal+0x3df/frame 0xfffffe00aab18850
ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe00aab18890
cxgb_port_attach() at cxgb_port_attach+0x1d3/frame 0xfffffe00aab188d0
device_attach() at device_attach+0x3ac/frame 0xfffffe00aab18920
bus_generic_attach() at bus_generic_attach+0x4b/frame 0xfffffe00aab18950
cxgb_controller_attach() at cxgb_controller_attach+0x97f/frame 0xfffffe00aab18a10
device_attach() at device_attach+0x3ac/frame 0xfffffe00aab18a60
device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe00aab18a90
pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe00aab18ad0
devclass_driver_added() at devclass_driver_added+0x29/frame 0xfffffe00aab18b00
device_do_deferred_actions() at device_do_deferred_actions+0x3b/frame 0xfffffe00aab18b20
devctl2_ioctl() at devctl2_ioctl+0x20f/frame 0xfffffe00aab18bf0
devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfffffe00aab18c40
vn_ioctl() at vn_ioctl+0xce/frame 0xfffffe00aab18cb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00aab18cd0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00aab18d40
sys_ioctl() at sys_ioctl+0xff/frame 0xfffffe00aab18e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfffffe00aab18f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00aab18f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x3549e30c55fa, rsp = 0x3549e27017e8, rbp = 0x3549e27018a0 ---
ACPI: \_SB.WMIB.WQZZ: 1 arguments were passed to a non-method ACPI object (Buffer) (20221020/nsarguments-361)
acpi_wmi1: <ACPI-WMI mapping> on acpi0
acpi_wmi1: Embedded MOF found
ACPI: \_SB.WMIV.WQZZ: 1 arguments were passed to a non-method ACPI object (Buffer) (20221020/nsarguments-361)
acpi_wmi2: <ACPI-WMI mapping> on acpi0
acpi_wmi2: Embedded MOF found
cxgbc0: <Chelsio T320, 2 ports> mem 0xd1000000-0xd1000fff,0xd1001000-0xd1001fff irq 16 at device 0.0 on pci1
cxgbc0: using MSI-X interrupts (9 vectors)
cxgb0: <Port 0 10GBASE-R> on cxgbc0
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xfffffe00aab186a8
frame pointer = 0x28:0xfffffe00aab186d0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 390 (devctl)
rdi: fffff80006bb2800 rsi: fffffe00aab18720 rdx: fffffe0091153ed8
rcx: 00000000c0306938 r8: 0000000000000000 r9: 0000000000000000
rax: 0000000000000000 rbx: fffffe00aab18720 rbp: fffffe00aab186d0
r10: fffff8006686a000 r11: 0000000001b0416b r12: 0000000000008802
r13: fffff8006686a010 r14: fffffe0091153ed8 r15: 0000000000000000
trap number = 12
panic: page fault
cpuid = 1
time = 1729622293
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00aab18390
vpanic() at vpanic+0x131/frame 0xfffffe00aab184c0
panic() at panic+0x43/frame 0xfffffe00aab18520
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00aab18580
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00aab185d0
calltrap() at calltrap+0x8/frame 0xfffffe00aab185d0
--- trap 0xc, rip = 0, rsp = 0xfffffe00aab186a8, rbp = 0xfffffe00aab186d0 ---
??() at 0/frame 0xfffffe00aab186d0
dump_iface() at dump_iface+0x145/frame 0xfffffe00aab18780
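
Reading the trace above: the fault is an instruction fetch from address 0 (rip = 0), and the frame directly beneath it is dump_iface(), reached from cxgb_port_attach() via ether_ifattach(). That pattern is consistent with a call through a null function pointer, though the trace alone does not prove it. The following is a minimal sketch for pulling that frame out of ddb.txt. The regular expression and the function names parse_frames and caller_of_null_jump are illustrative assumptions based on the line format in this excerpt, not a general DDB parser.

```python
import re

# Matches DDB frames such as "dump_iface() at dump_iface+0x145/frame 0x...".
FRAME_RE = re.compile(r"^(?P<func>\S+)\(\) at (?P<loc>\S+?)/frame 0x[0-9a-f]+$")


def parse_frames(lines):
    """Return (function, location) pairs for each backtrace frame line."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("func"), match.group("loc")))
    return frames


def caller_of_null_jump(frames):
    """Return the frame just below the first "??() at 0" frame, if any."""
    for index, (func, loc) in enumerate(frames[:-1]):
        if func == "??" and loc == "0":
            return frames[index + 1]
    return None


# Lines copied from the excerpt above.
SAMPLE = """\
calltrap() at calltrap+0x8/frame 0xfffffe00aab185d0
--- trap 0xc, rip = 0, rsp = 0xfffffe00aab186a8, rbp = 0xfffffe00aab186d0 ---
??() at 0/frame 0xfffffe00aab186d0
dump_iface() at dump_iface+0x145/frame 0xfffffe00aab18780
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe00aab18800
""".splitlines()

print(caller_of_null_jump(parse_frames(SAMPLE)))
# prints ('dump_iface', 'dump_iface+0x145')
```

The same two functions can be applied to the full ddb.txt by passing its lines to parse_frames.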
Additional context
Happy to provide more information and run further tests if needed. This system is not yet carrying traffic.
This looks similar to this forum thread, but there isn't enough information there to be sure.
I did not test traffic over the NIC when running FreeBSD 14.1, but can do so if that's necessary.
Environment
OPNsense 24.7 (amd64)
Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Chelsio T320 with 2x 10GBase-SR SFP+