Kernel Error Observed When Running OpenVPN with DCO
Issue
When running OpenVPN with DCO enabled, we observe the following errors in the machine's dmesg output. The machine appears to become unresponsive and requires a hard reboot. This might be related to #13.
Kindly let me know if you need more details.
[376190.816380] list_del corruption. next->prev should be ffff9c054b52b460, but was dead000000000122. (next=ffff9c0ca2831c60)
[376190.816730] ------------[ cut here ]------------
[376190.817009] kernel BUG at lib/list_debug.c:62!
[376190.817297] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[376190.817593] CPU: 10 PID: 790942 Comm: kworker/10:1 Tainted: G OE 6.1.0-30-amd64 #1 Debian 6.1.124-1
[376190.817905] Hardware name: Supermicro SYS-6029TR-HTR/X11DPT-L, BIOS 3.4 10/30/2020
[376190.818238] Workqueue: ovpn-event-wq-ovpn-7325 ovpn_peer_delete_work [ovpn_dco_v2]
[376190.818625] RIP: 0010:__list_del_entry_valid.cold+0x23/0x6f
[376190.819043] Code: e8 19 9f fe ff 0f 0b 48 89 fe 48 c7 c7 18 5d 9a b7 e8 08 9f fe ff 0f 0b 48 89 d1 48 c7 c7 38 5e 9a b7 48 89 c2 e8 f4 9e fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 e8 5d 9a b7 e8 e0 9e fe ff 0f 0b
[376190.819953] RSP: 0018:ffffbd0b176fbe40 EFLAGS: 00010246
[376190.820431] RAX: 000000000000006d RBX: ffff9c054b52b000 RCX: 0000000000000000
[376190.820845] RDX: 0000000000000000 RSI: ffff9c13ffca03a0 RDI: ffff9c13ffca03a0
[376190.821264] RBP: ffff9c054b52b300 R08: 0000000000000000 R09: ffffbd0b176fbcd8
[376190.821702] R10: 0000000000000003 R11: ffff9c243feff950 R12: ffff9c054b52b300
[376190.822123] R13: 0000000000000000 R14: ffff9c0d94cc3e00 R15: ffff9c054b52b718
[376190.822580] FS: 0000000000000000(0000) GS:ffff9c13ffc80000(0000) knlGS:0000000000000000
[376190.823027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[376190.823470] CR2: 00007f5eb48de2c0 CR3: 00000010e9694003 CR4: 00000000007706e0
[376190.823943] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[376190.824425] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[376190.824911] PKRU: 55555554
[376190.825395] Call Trace:
[376190.825857] <TASK>
[376190.826358] ? __die_body.cold+0x1a/0x1f
[376190.826864] ? die+0x2a/0x50
[376190.827342] ? do_trap+0xc5/0x110
[376190.827861] ? __list_del_entry_valid.cold+0x23/0x6f
[376190.828437] ? do_error_trap+0x6a/0x90
[376190.829007] ? __list_del_entry_valid.cold+0x23/0x6f
[376190.829520] ? exc_invalid_op+0x4c/0x60
[376190.830026] ? __list_del_entry_valid.cold+0x23/0x6f
[376190.830560] ? asm_exc_invalid_op+0x16/0x20
[376190.831093] ? __list_del_entry_valid.cold+0x23/0x6f
[376190.831612] __netif_napi_del+0x6e/0x130
[376190.832189] ovpn_peer_release+0x21/0x80 [ovpn_dco_v2]
[376190.832776] ovpn_peer_delete_work+0x15/0x20 [ovpn_dco_v2]
[376190.833311] process_one_work+0x1c4/0x380
[376190.833848] worker_thread+0x4d/0x380
[376190.834358] ? rescuer_thread+0x3a0/0x3a0
[376190.834844] kthread+0xd7/0x100
[376190.835429] ? kthread_complete_and_exit+0x20/0x20
[376190.835996] ret_from_fork+0x1f/0x30
[376190.836568] </TASK>
[376190.837123] Modules linked in: jitterentropy_rng drbg ansi_cprng authenc echainiv esp4 nf_conntrack_netlink dummy ovpn_dco_v2(OE) ip6_udp_tunnel udp_tunnel tun xfrm_user xfrm_algo nft_redir nft_nat nft_limit nft_chain_nat nf_nat cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass rapl ipmi_ssif intel_cstate binfmt_misc ast drm_vram_helper mei_me irdma drm_ttm_helper ttm ice ioatdma mei sg intel_uncore pcspkr acpi_ipmi iTCO_wdt dca ib_uverbs ipmi_si intel_pmc_bxt intel_pch_thermal joydev ipmi_devintf ib_core iTCO_vendor_support drm_kms_helper watchdog i2c_algo_bit ipmi_msghandler nft_ct evdev acpi_pad acpi_power_meter hpt(OE) tcp_bbr nf_tables nf_conntrack nfnetlink nf_defrag_ipv6 nf_defrag_ipv4 8021q garp stp configfs mrp llc efi_pstore dm_mod drm fuse
[376190.837216] ip_tables x_tables autofs4 squashfs overlay isofs cdrom bonding tls ext4 crc16 mbcache jbd2 hid_generic usbhid hid raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod loop sd_mod t10_pi crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic crc64 crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 xhci_pci xhci_hcd ahci libahci libata aesni_intel i40e crypto_simd usbcore cryptd scsi_mod i2c_i801 i2c_smbus lpc_ich usb_common scsi_common wmi button
[376190.845655] ---[ end trace 0000000000000000 ]---
[376190.919475] RIP: 0010:__list_del_entry_valid.cold+0x23/0x6f
[376190.920162] Code: e8 19 9f fe ff 0f 0b 48 89 fe 48 c7 c7 18 5d 9a b7 e8 08 9f fe ff 0f 0b 48 89 d1 48 c7 c7 38 5e 9a b7 48 89 c2 e8 f4 9e fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 e8 5d 9a b7 e8 e0 9e fe ff 0f 0b
[376190.921385] RSP: 0018:ffffbd0b176fbe40 EFLAGS: 00010246
[376190.921768] RAX: 000000000000006d RBX: ffff9c054b52b000 RCX: 0000000000000000
[376190.922137] RDX: 0000000000000000 RSI: ffff9c13ffca03a0 RDI: ffff9c13ffca03a0
[376190.922540] RBP: ffff9c054b52b300 R08: 0000000000000000 R09: ffffbd0b176fbcd8
[376190.922955] R10: 0000000000000003 R11: ffff9c243feff950 R12: ffff9c054b52b300
[376190.923390] R13: 0000000000000000 R14: ffff9c0d94cc3e00 R15: ffff9c054b52b718
[376190.923758] FS: 0000000000000000(0000) GS:ffff9c13ffc80000(0000) knlGS:0000000000000000
[376190.924122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[376190.924567] CR2: 00007f5eb48de2c0 CR3: 00000010e9694003 CR4: 00000000007706e0
[376190.924946] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[376190.925359] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[376190.925777] PKRU: 55555554
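For readers decoding the first line of the trace: the value `dead000000000122` matches the kernel's `LIST_POISON2` constant on x86_64 (`0x122` plus a `POISON_POINTER_DELTA` of `0xdead000000000000`), which `list_del()` and `list_del_rcu()` write into the `prev` field of an entry they remove. The small Python sketch below only classifies the values in such a message. The helper names `classify_pointer` and `explain_list_del_line` are hypothetical and not part of any kernel or OpenVPN tooling, and the constants are assumed to be those of an x86_64 build.

```python
import re

# Assumed x86_64 values from the kernel's include/linux/poison.h
# (CONFIG_ILLEGAL_POINTER_VALUE = 0xdead000000000000).
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = POISON_POINTER_DELTA + 0x100  # stored in entry->next by list_del()
LIST_POISON2 = POISON_POINTER_DELTA + 0x122  # stored in entry->prev by list_del()

# Matches the "list_del corruption" line format seen in the trace above.
_PATTERN = re.compile(
    r"list_del corruption\. (?P<field>\S+) should be "
    r"(?P<expected>[0-9a-f]+), but was (?P<actual>[0-9a-f]+)"
)


def classify_pointer(value):
    """Hypothetical helper: name a pointer value if it is a list poison."""
    if value == LIST_POISON1:
        return "LIST_POISON1"
    if value == LIST_POISON2:
        return "LIST_POISON2"
    return "ordinary pointer"


def explain_list_del_line(line):
    """Hypothetical helper: parse one dmesg line, or return None if it does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return {
        "field": match["field"],
        "expected": int(match["expected"], 16),
        "actual_kind": classify_pointer(int(match["actual"], 16)),
    }


if __name__ == "__main__":
    sample = (
        "[376190.816380] list_del corruption. next->prev should be "
        "ffff9c054b52b460, but was dead000000000122. (next=ffff9c0ca2831c60)"
    )
    print(explain_list_del_line(sample))
```

A `LIST_POISON2` value in `next->prev` suggests that the neighbouring list entry had already been unlinked while the entry being deleted in `__netif_napi_del` still pointed to it. That would be consistent with a double removal or a race during peer teardown, but this is an interpretation of the message and not a confirmed root cause.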
Software Versions
Distro: Debian 12
openvpn 2.6.3-1+deb12u2
linux-image-6.1.0-30-amd64 6.1.124-1
openvpn-dco-dkms 0.0+git20231103-1~deb12u1
Thanks for the report, @res0nance. Any clue about what may have triggered this issue?
Currently unsure. We know this happens over UDP, but that's about all we have right now.
@res0nance any chance this can still be reproduced with the latest master/tag?