
PF_RING Zero Copy Intel igb Driver Crash (Rocky Linux 9.5)

Open fandigunawan opened this issue 6 months ago • 17 comments

Hi, I am trying to install PF_RING Zero Copy on our Intel igb adapter, but a few seconds after the machine boots it suddenly crashes and reboots. Here is the full information:

  • OS: Rocky Linux 9.5
  • Kernel version: 5.14.0-503.40.1.el9_5.x86_64
  • Machine: Dell PowerEdge R660
  • PF_RING version: pfring-9.0.0-9896
  • Software that uses PF_RING: Suricata

Driver versions:

  • e1000e-zc-3.8.7.9896-dkms.noarch.rpm
  • i40e-zc-2.24.6.9896-dkms.noarch.rpm
  • iavf-zc-4.9.5.9896-dkms.noarch.rpm
  • ice-zc-1.12.7.9896-dkms.noarch.rpm
  • igb-zc-5.14.16.9896-dkms.noarch.rpm
  • ixgbe-zc-5.19.6.9896-dkms.noarch.rpm
  • ixgbevf-zc-4.18.9.9896-dkms.noarch.rpm
  • ntopng-data-6.4.250515-25785.noarch.rpm
  • pfring-dkms-9.0.0.9896-dkms.noarch.rpm
  • pfring-drivers-zc-dkms-9.0.0-9896.noarch.rpm

Main error trace:


[   46.168771] igb 0000:16:00.0 ens1f0: PCIe link lost
[   46.263302] pcieport 0000:15:01.0: AER: Root Port link has been reset (0)
[   46.263559] igb 0000:16:00.0: enabling device (0000 -> 0002)
[   46.263838] igb 0000:16:00.1: enabling device (0000 -> 0002)
[   46.264097] igb 0000:16:00.2: enabling device (0000 -> 0002)
[   46.264355] igb 0000:16:00.3: enabling device (0000 -> 0002)
[   46.349430] [PF_RING] Registering ZC device ens1f0@0 [rx-ring=00000000f46bd8dc][tx-ring=0000000083715ebb]
[   46.349623] [PF_RING] Registering ZC device ens1f0@1 [rx-ring=00000000de739896][tx-ring=00000000816c4074]
[   46.349764] [PF_RING] Registering ZC device ens1f0@2 [rx-ring=000000001a77f489][tx-ring=00000000249120d8]
[   46.349901] [PF_RING] Registering ZC device ens1f0@3 [rx-ring=000000002dfa07ed][tx-ring=000000005a2c76be]
[   46.350035] [PF_RING] Registering ZC device ens1f0@4 [rx-ring=00000000b703d9af][tx-ring=000000005dfac6c1]
[   46.350164] [PF_RING] Registering ZC device ens1f0@5 [rx-ring=00000000e34aadca][tx-ring=00000000061e1fda]
[   46.350289] [PF_RING] Registering ZC device ens1f0@6 [rx-ring=000000008fb077f2][tx-ring=00000000436445ed]
[   46.350418] [PF_RING] Registering ZC device ens1f0@7 [rx-ring=000000003d77cdb0][tx-ring=00000000d44ca7ba]
[   46.386184] list_del corruption. prev->next should be ff42815208eb38b8, but was ff3830f0c8f64458
[   46.386387] ------------[ cut here ]------------
[   46.386576] kernel BUG at lib/list_debug.c:51!
[   46.386745] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[   46.386748] CPU: 29 PID: 2602 Comm: W#01-zc:e..f0@0 Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-503.40.1.el9_5.x86_64 #1
[   46.386749] Hardware name: Dell Inc. PowerEdge R660/XXXXXX, BIOS 2.4.4 09/27/2024
[   46.386750] RIP: 0010:__list_del_entry_valid.cold+0x31/0x47
[   46.386755] Code: c0 c7 a0 e8 35 77 fe ff 0f 0b 48 c7 c7 b0 c0 c7 a0 e8 27 77 fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 70 c0 c7 a0 e8 13 77 fe ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 38 c0 c7 a0 e8 ff 76 fe ff 0f 0b
[   46.386757] RSP: 0018:ff42815208eb36e8 EFLAGS: 00010046
[   46.386758] RAX: 0000000000000054 RBX: ff42815208eb38a0 RCX: 0000000000000000
[   46.386759] RDX: 0000000000000000 RSI: ff38310fffba08c0 RDI: ff38310fffba08c0
[   46.386760] RBP: ff3830f0c8f64450 R08: 0000000000000000 R09: ff42815208eb35a8
[   46.386761] R10: ff42815208eb35a0 R11: ffffffffa17e93e8 R12: 0000000000000286
[   46.386761] R13: ff42815208eb3890 R14: 0000000000000001 R15: ff42815208eb3760
[   46.386762] FS:  00007f05677ff640(0000) GS:ff38310fffb80000(0000) knlGS:0000000000000000
[   46.386763] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   46.386764] CR2: 00007f058c5a8008 CR3: 000000018227c004 CR4: 0000000000771ef0
[   46.386765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   46.386765] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   46.386766] PKRU: 55555554
[   46.386766] Call Trace:
[   46.386767]  <TASK>
[   46.386768]  ? show_trace_log_lvl+0x1c4/0x2df
[   46.386773]  ? show_trace_log_lvl+0x1c4/0x2df
[   46.386775]  ? remove_wait_queue+0x20/0x60
[   46.386778]  ? __die_body.cold+0x8/0xd
[   46.386780]  ? die+0x2b/0x50
[   46.386782]  ? do_trap+0xce/0x120
[   46.386784]  ? __list_del_entry_valid.cold+0x31/0x47
[   46.386786]  ? do_error_trap+0x65/0x80
[   46.386788]  ? __list_del_entry_valid.cold+0x31/0x47
[   46.386789]  ? exc_invalid_op+0x4e/0x70
[   46.386793]  ? __list_del_entry_valid.cold+0x31/0x47
[   46.386794]  ? asm_exc_invalid_op+0x16/0x20
[   46.386799]  ? __list_del_entry_valid.cold+0x31/0x47
[   46.386800]  ? __list_del_entry_valid.cold+0x31/0x47
[   46.386802]  remove_wait_queue+0x20/0x60
[   46.386804]  poll_freewait+0x3d/0xa0
[   46.386808]  do_sys_poll+0x176/0x230
[   46.386810]  ? remove_wait_queue+0x20/0x60
[   46.386812]  ? poll_freewait+0x45/0xa0
[   46.386813]  ? do_sys_poll+0x176/0x230
[   46.386815]  ? __pfx_pollwake+0x10/0x10
[   46.386817]  ? pick_next_task_idle+0x26/0x40
[   46.386819]  ? pick_next_task+0x9f9/0xaf0
[   46.386821]  ? dequeue_task_fair+0xaa/0x370
[   46.386824]  ? __switch_to_asm+0x3a/0x80
[   46.386826]  ? finish_task_switch.isra.0+0x8c/0x2a0
[   46.386828]  ? __pfx_pollwake+0x10/0x10
[   46.386830]  ? schedule+0x2e/0xd0
[   46.386833]  ? schedule_hrtimeout_range_clock+0x9d/0x120
[   46.386836]  ? wait_packet_function_ptr+0x64/0xc0 [igb_zc]
[   46.386852]  ? ring_poll+0x61/0x280 [pf_ring]
[   46.386864]  ? sock_poll+0x4c/0xe0
[   46.386867]  ? do_poll.constprop.0+0x298/0x380
[   46.386868]  ? __seccomp_filter+0x45/0x480
[   46.386871]  ? ktime_get_ts64+0x49/0xf0
[   46.386873]  __x64_sys_poll+0xa6/0x140
[   46.386876]  do_syscall_64+0x5c/0xf0
[   46.386877]  ? __audit_filter_op+0xa5/0xf0
[   46.386879]  ? fpregs_restore_userregs+0x47/0xd0
[   46.386881]  ? exit_to_user_mode_prepare+0xef/0x100
[   46.386883]  ? syscall_exit_to_user_mode+0x19/0x40
[   46.386884]  ? do_syscall_64+0x6b/0xf0
[   46.386886]  ? __pfx_pollwake+0x10/0x10
[   46.386887]  ? do_syscall_64+0x6b/0xf0
[   46.386888]  ? schedule+0x2e/0xd0
[   46.386890]  ? rseq_get_rseq_cs+0x1d/0x240
[   46.386892]  ? rseq_ip_fixup+0x6e/0x1a0
[   46.386893]  ? rseq_get_rseq_cs+0x1d/0x240
[   46.386894]  ? rseq_ip_fixup+0x6e/0x1a0
[   46.386895]  ? rseq_get_rseq_cs+0x1d/0x240
[   46.386896]  ? rseq_ip_fixup+0x6e/0x1a0
[   46.386897]  ? fpregs_restore_userregs+0x47/0xd0
[   46.386899]  ? exit_to_user_mode_prepare+0xef/0x100
[   46.386900]  ? syscall_exit_to_user_mode+0x19/0x40
[   46.386900]  ? do_syscall_64+0x6b/0xf0
[   46.386901]  ? do_syscall_64+0x6b/0xf0
[   46.386902]  ? do_syscall_64+0x6b/0xf0
[   46.386903]  ? do_syscall_64+0x6b/0xf0
[   46.386904]  ? do_syscall_64+0x6b/0xf0
[   46.386905]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   46.386906] RIP: 0033:0x7f05e1d015df
[   46.386932] Code: 54 24 1c 48 89 74 24 10 48 89 7c 24 08 e8 69 4b f8 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 48 8b 7c 24 08 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 44 24 08 e8 bd 4b f8 ff 8b 44
[   46.386933] RSP: 002b:00007f05677fceb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
[   46.386934] RAX: ffffffffffffffda RBX: 00007f056028cbf0 RCX: 00007f05e1d015df
[   46.386935] RDX: 000000000000003d RSI: 0000000000000001 RDI: 00007f05677fcee8
[   46.386935] RBP: 00007f056028cbf0 R08: 0000000000000000 R09: 0000000000000000
[   46.386936] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000003d
[   46.386936] R13: 0000000000000000 R14: 00007f05677fd018 R15: 00007f05677fcf30
[   46.386938]  </TASK>
[   46.386938] Modules linked in: binfmt_misc ice_zc(OE) gnss igb_zc(OE) vxlan ip6_udp_tunnel udp_tunnel uio pf_ring(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink ipmi_ssif vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i40e kvm ib_uverbs rapl dell_wmi ledtrig_audio acpi_ipmi sparse_keymap rfkill iTCO_wdt video intel_cstate ipmi_si dell_smbios pmt_telemetry mgag200 iTCO_vendor_support dcdbas ib_core pmt_class intel_sdsi dell_wmi_descriptor wmi_bmof intel_uncore mei_me ipmi_devintf pcspkr drm_shmem_helper i2c_i801 isst_if_mbox_pci isst_if_mmio drm_kms_helper mei intel_vsec isst_if_common i2c_ismt i2c_smbus ipmi_msghandler acpi_power_meter joydev fuse drm ext4 mbcache jbd2 sd_mod t10_pi sg iaa_crypto ahci
[   46.386976]  crct10dif_pclmul libahci crc32_pclmul crc32c_intel megaraid_sas idxd i2c_algo_bit tg3 libata ghash_clmulni_intel idxd_bus wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod [last unloaded: gnss]

Complete log:

pf_ring_zc_igb_crash_RL9.5.log

fandigunawan avatar May 18 '25 06:05 fandigunawan

Are you able to boot the system by blacklisting the driver? What is the exact adapter model? (as reported by lspci)

cardigliano avatar May 19 '25 16:05 cardigliano

This is the result from lspci

lspci | grep -i eth
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
16:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
16:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
16:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
16:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
2a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
2a:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
2a:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
2a:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
98:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
98:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)

Yes, the system boots fine if we disable the driver. I will try PF_RING in standard mode (without ZC) and let you know.

fandigunawan avatar May 20 '25 03:05 fandigunawan

There are two types of Intel Ethernet here:

  • igb: Intel I350
  • ice: Intel E810

Intel I350 crashes when using PF_RING Zero Copy

pfcount -i zc:ens1f0
#########################################################################
# ERROR: You do not seem to have a valid PF_RING ZC 9.0.0.250428
# license for ens1f0 (MAC 20:3A:43:0A:80:F8) [Intel 1 Gbit igb 82580-based]
# ERROR: Missing license file
# ERROR: Please get one at http://shop.ntop.org/
#########################################################################
# PF_RING ZC running in demo mode (packet capture and transmission
# limited to 5 minutes)
#########################################################################
Using PF_RING v.9.0.0.250428 kernel module v.9.0.0
Dumping statistics on /proc/net/pf_ring/stats/3335-ens1f0.48
error reading link speed on ens1f0
Capturing from zc:ens1f0 [mac: 20:3A:43:0A:80:F8][if_index: 34][speed: 1000Mb/s]
# Device RX channels: 8
# Polling threads:    1

Intel E810 does not crash using PF_RING Zero Copy

pfcount -i zc:ens2f1np1
#########################################################################
# ERROR: You do not seem to have a valid PF_RING ZC 9.0.0.250428
# license for ens2f1np1 (MAC B4:83:51:1E:25:F1) [Intel 100 Gbit ice family]
# ERROR: Missing license file
# ERROR: Please get one at http://shop.ntop.org/
#########################################################################
# PF_RING ZC running in demo mode (packet capture and transmission
# limited to 5 minutes)
#########################################################################
Using PF_RING v.9.0.0.250428 kernel module v.9.0.0
Dumping statistics on /proc/net/pf_ring/stats/3330-ens2f1np1.47
Capturing from zc:ens2f1np1 [mac: B4:83:51:1E:25:F1][if_index: 43][speed: 10000Mb/s]
# Device RX channels: 16
# Polling threads:    1
=========================
Absolute Stats: [0 pkts total][0 pkts dropped][0.0% dropped]
[0 pkts rcvd][0 bytes rcvd]
=========================

=========================
Absolute Stats: [0 pkts total][0 pkts dropped][0.0% dropped]
[0 pkts rcvd][0 bytes rcvd][0.00 pkt/sec][0.00 Mbit/sec]
=========================
Actual Stats: [0 pkts rcvd][1'000.03 ms][0.00 pps][0.00 Gbps]
=========================


=========================
Absolute Stats: [0 pkts total][0 pkts dropped][0.0% dropped]
[0 pkts rcvd][0 bytes rcvd][0.00 pkt/sec][0.00 Mbit/sec]
=========================
Actual Stats: [0 pkts rcvd][1'000.03 ms][0.00 pps][0.00 Gbps]
=========================

fandigunawan avatar May 20 '25 03:05 fandigunawan

We are checking the driver, please use the interface without zc: in the meantime. Thank you.

cardigliano avatar May 20 '25 07:05 cardigliano

We are checking the driver, please use the interface without zc: in the meantime. Thank you.

We plan to use Zero Copy in production, so please keep us posted. FYI, we have bought licenses in the past and will use PF_RING ZC in the near future.

fandigunawan avatar May 20 '25 09:05 fandigunawan

Does it crash as soon as you start the application (e.g. pfcount)? Or after any specific action (e.g. when you close it)?

cardigliano avatar May 20 '25 14:05 cardigliano

Does it crash as soon as you start the application (e.g. pfcount)? Or after any specific action (e.g. when you close it)?

It crashes a few seconds after the application starts running (pfcount or Suricata).

fandigunawan avatar May 20 '25 14:05 fandigunawan

Does it happen after receiving some packets, or already during initialization? Sorry for all the questions; I have been unable to reproduce this so far, so any additional info helps with the digging.

cardigliano avatar May 20 '25 14:05 cardigliano

Does it happen after receiving some packets, or already during initialization? Sorry for all the questions; I have been unable to reproduce this so far, so any additional info helps with the digging.

Actually, the cable is unplugged (nothing is connected to the Ethernet port), so no data is received. On the E810, it works fine with the cable unplugged.

fandigunawan avatar May 20 '25 14:05 fandigunawan

From the log it seems a hardware error is triggered when the device is opened, and this unloads the driver while the interface is in use by the application. I am digging to figure out why this error occurs. In the meantime, could you try plugging in a cable, to make sure this is not related to the absence of carrier?
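
To make that ordering easier to see in a long dmesg capture, here is a minimal Python sketch that pulls the first occurrence of a few marker strings out of a kernel log and lists them by timestamp. The marker substrings are copied from the logs in this thread; the labels, the `timeline` helper, and the three-line `SAMPLE` (taken from the first trace) are illustrative assumptions, not part of PF_RING or its tooling.

```python
import re

# (substring as it appears in the kernel log, label chosen for this sketch)
MARKERS = [
    ("Hardware Error", "hardware error reported"),
    ("PCIe link lost", "PCIe link lost"),
    ("Unloading ZC driver while the device is in use", "ZC driver unloaded while in use"),
    ("Removing ZC device", "ZC device removed"),
    ("Registering ZC device", "ZC device registered"),
    ("list_del corruption", "list corruption"),
]

# Matches the "[   46.168771] message" prefix printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def timeline(log_text):
    """Return (timestamp, label) for the first occurrence of each marker."""
    seen = set()
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines without a dmesg timestamp
        timestamp, message = float(match.group(1)), match.group(2)
        for needle, label in MARKERS:
            if needle in message and label not in seen:
                seen.add(label)
                events.append((timestamp, label))
    return events


SAMPLE = """\
[   46.168771] igb 0000:16:00.0 ens1f0: PCIe link lost
[   46.349430] [PF_RING] Registering ZC device ens1f0@0 [rx-ring=00000000f46bd8dc][tx-ring=0000000083715ebb]
[   46.386184] list_del corruption. prev->next should be ff42815208eb38b8, but was ff3830f0c8f64458
"""

for timestamp, label in timeline(SAMPLE):
    print(f"{timestamp:12.6f}  {label}")
```

Feeding it the complete log instead of `SAMPLE` should show whether the link loss and the ZC device registration precede the list corruption in every crash.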

cardigliano avatar May 20 '25 15:05 cardigliano

Also, please try configuring 8 RSS queues rather than 16

cardigliano avatar May 20 '25 16:05 cardigliano

Also, please try configuring 8 RSS queues rather than 16

Yes, it is already 8

# pf_ringcfg --list-interfaces
Name: eno8303              Driver: tg3        RSS:     1    [Linux Driver]
Name: eno8403              Driver: tg3        RSS:     1    [Linux Driver]
Name: eno12399             Driver: igb        RSS:     8    [Running ZC]
Name: eno12409             Driver: igb        RSS:     8    [Running ZC]
Name: eno12419             Driver: igb        RSS:     8    [Running ZC]
Name: eno12429             Driver: igb        RSS:     8    [Running ZC]
Name: ens3f0               Driver: igb        RSS:     8    [Running ZC]
Name: ens3f1               Driver: igb        RSS:     8    [Running ZC]
Name: ens3f2               Driver: igb        RSS:     8    [Running ZC]
Name: ens3f3               Driver: igb        RSS:     8    [Running ZC]
Name: ens1f0np0            Driver: ice_zc     RSS:     16   [Running ZC]
Name: ens1f1np1            Driver: ice_zc     RSS:     16   [Running ZC]
Name: ens2f0np0            Driver: ice_zc     RSS:     16   [Running ZC]
Name: ens2f1np1            Driver: ice_zc     RSS:     16   [Running ZC]
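
For checking the queue count across many interfaces, a small Python sketch such as the following can parse the listing above. It assumes the line layout is exactly as printed here (`Name: … Driver: … RSS: … [state]`); the `parse_interfaces` and `zc_queues` helpers are hypothetical names for illustration and are not part of `pf_ringcfg`.

```python
import re

# One row of the listing: Name, Driver, RSS queue count, and the bracketed state.
ROW_RE = re.compile(r"Name:\s*(\S+)\s+Driver:\s*(\S+)\s+RSS:\s*(\d+)\s+\[(.+?)\]")


def parse_interfaces(text):
    """Turn the listing text into a list of dicts, one per interface."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.search(line)
        if match:
            name, driver, rss, state = match.groups()
            rows.append({"name": name, "driver": driver, "rss": int(rss), "state": state})
    return rows


def zc_queues(rows):
    """Map interface name to RSS queue count for interfaces reported as running ZC."""
    return {row["name"]: row["rss"] for row in rows if row["state"] == "Running ZC"}


# Three rows copied from the listing above.
SAMPLE = """\
Name: eno8303              Driver: tg3        RSS:     1    [Linux Driver]
Name: ens3f0               Driver: igb        RSS:     8    [Running ZC]
Name: ens1f0np0            Driver: ice_zc     RSS:     16   [Running ZC]
"""

print(zc_queues(parse_interfaces(SAMPLE)))
```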

fandigunawan avatar May 21 '25 09:05 fandigunawan

From the log it seems a hardware error is triggered when the device is opened, and this unloads the driver while the interface is in use by the application. I am digging to figure out why this error occurs. In the meantime, could you try plugging in a cable, to make sure this is not related to the absence of carrier?

I plugged in the cable and got the same crash.

[76798.722100] igb 0000:be:00.0 ens3f0: igb: ens3f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[76822.165282] [PF_RING] Trying to map ZC device ens3f0@0
[76823.200016] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
[76823.200351] {1}[Hardware Error]: event severity: recoverable
[76823.200658] {1}[Hardware Error]:  Error 0, type: fatal
[76823.200958] {1}[Hardware Error]:   section_type: PCIe error
[76823.201251] {1}[Hardware Error]:   port_type: 0, PCIe end point
[76823.201538] {1}[Hardware Error]:   version: 3.0
[76823.201820] {1}[Hardware Error]:   command: 0x0406, status: 0x0810
[76823.202100] {1}[Hardware Error]:   device_id: 0000:be:00.0
[76823.202378] {1}[Hardware Error]:   slot: 3
[76823.202648] {1}[Hardware Error]:   secondary_bus: 0x00
[76823.202916] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x1521
[76823.203182] {1}[Hardware Error]:   class_code: 020000
[76823.203445] {1}[Hardware Error]:   aer_uncor_status: 0x00008000, aer_uncor_mask: 0x00010000
[76823.203709] {1}[Hardware Error]:   aer_uncor_severity: 0x004ef031
[76823.203968] {1}[Hardware Error]:   TLP Header: 40000004 bd0000ff e2705c80 e2705c80
[76823.204296] igb 0000:be:00.0: AER: aer_status: 0x00008000, aer_mask: 0x00010000
[76823.204481] igb 0000:be:00.0:    [15] CmpltAbrt              (First)
[76823.204660] igb 0000:be:00.0: AER: aer_layer=Transaction Layer, aer_agent=Completer ID
[76823.204839] igb 0000:be:00.0: AER: aer_uncor_severity: 0x004ef031
[76823.205016] igb 0000:be:00.0: AER:   TLP Header: 40000004 bd0000ff e2705c80 e2705c80
[76823.228230] [PF_RING] Removing ZC device ens3f0@0 [rx-ring=00000000e22551bd][tx-ring=00000000de8c6ea6]
[76823.228421] [PF_RING] Unloading ZC driver while the device is in use from userspace!!
[76823.228602] [PF_RING] Removing ZC device ens3f0@1 [rx-ring=0000000078582b65][tx-ring=00000000c67d6553]
[76823.228784] [PF_RING] Removing ZC device ens3f0@2 [rx-ring=000000001d2bd8eb][tx-ring=00000000381b1132]
[76823.228962] [PF_RING] Removing ZC device ens3f0@3 [rx-ring=000000001937dd2f][tx-ring=00000000dbcbf4dd]
[76823.229145] [PF_RING] Removing ZC device ens3f0@4 [rx-ring=00000000ebd9c969][tx-ring=0000000093dd26ec]
[76823.229318] [PF_RING] Removing ZC device ens3f0@5 [rx-ring=00000000c70e7268][tx-ring=00000000a7cc82d4]
[76823.229488] [PF_RING] Removing ZC device ens3f0@6 [rx-ring=00000000d1d68195][tx-ring=00000000bd55a3b7]
[76823.229655] [PF_RING] Removing ZC device ens3f0@7 [rx-ring=00000000b3d8e860][tx-ring=000000001a0527ab]
[76823.254882] [PF_RING] Removing ZC device ens3f1@0 [rx-ring=000000006586e919][tx-ring=00000000cfbc0c6f]
[76823.255052] [PF_RING] Removing ZC device ens3f1@1 [rx-ring=00000000aa065f54][tx-ring=0000000050ba0b97]
[76823.255252] [PF_RING] Removing ZC device ens3f1@2 [rx-ring=00000000ad133fe6][tx-ring=0000000024eca28c]
[76823.255409] [PF_RING] Removing ZC device ens3f1@3 [rx-ring=000000001b3112aa][tx-ring=000000006bc86f2a]
[76823.255564] [PF_RING] Removing ZC device ens3f1@4 [rx-ring=000000008354918a][tx-ring=0000000091223ca0]
[76823.255714] [PF_RING] Removing ZC device ens3f1@5 [rx-ring=000000007f81845a][tx-ring=0000000052b0bc47]
[76823.255864] [PF_RING] Removing ZC device ens3f1@6 [rx-ring=00000000e5d43762][tx-ring=00000000d26c3456]
[76823.256010] [PF_RING] Removing ZC device ens3f1@7 [rx-ring=00000000ae70c793][tx-ring=00000000bd3afd38]
[76823.280955] [PF_RING] Removing ZC device ens3f2@0 [rx-ring=0000000064920997][tx-ring=00000000b44da3e8]
[76823.281105] [PF_RING] Removing ZC device ens3f2@1 [rx-ring=00000000b3944479][tx-ring=00000000ce9d4cae]
[76823.281296] [PF_RING] Removing ZC device ens3f2@2 [rx-ring=00000000acb2654f][tx-ring=0000000059872b26]
[76823.281433] [PF_RING] Removing ZC device ens3f2@3 [rx-ring=0000000048ddad83][tx-ring=000000004adccace]
[76823.281566] [PF_RING] Removing ZC device ens3f2@4 [rx-ring=00000000e0af99b7][tx-ring=00000000abdd1164]
[76823.281696] [PF_RING] Removing ZC device ens3f2@5 [rx-ring=000000009844d6e9][tx-ring=00000000aad654a6]
[76823.281823] [PF_RING] Removing ZC device ens3f2@6 [rx-ring=000000006b916f16][tx-ring=00000000789b8a5a]
[76823.281947] [PF_RING] Removing ZC device ens3f2@7 [rx-ring=0000000055e60565][tx-ring=000000003791c775]
[76823.305601] [PF_RING] Removing ZC device ens3f3@0 [rx-ring=000000002447f2c7][tx-ring=000000009a46e77e]
[76823.305728] [PF_RING] Removing ZC device ens3f3@1 [rx-ring=00000000a7472ea8][tx-ring=00000000ab637a40]
[76823.305846] [PF_RING] Removing ZC device ens3f3@2 [rx-ring=0000000049f53c8a][tx-ring=00000000cad6409f]
[76823.305959] [PF_RING] Removing ZC device ens3f3@3 [rx-ring=000000004fc551f7][tx-ring=000000007ded2a4b]
[76823.306069] [PF_RING] Removing ZC device ens3f3@4 [rx-ring=00000000cef06cde][tx-ring=00000000311e5ded]
[76823.306204] [PF_RING] Removing ZC device ens3f3@5 [rx-ring=000000006f43de92][tx-ring=00000000f1b22421]
[76823.306315] [PF_RING] Removing ZC device ens3f3@6 [rx-ring=00000000ba0fac11][tx-ring=000000003618c92d]
[76823.306425] [PF_RING] Removing ZC device ens3f3@7 [rx-ring=00000000ebd8c188][tx-ring=00000000ead618ce]
[76823.315578] igb 0000:be:00.0 ens3f0: PCIe link lost
[76823.486139] pcieport 0000:bd:01.0: AER: Root Port link has been reset (0)
[76823.486281] igb 0000:be:00.0: enabling device (0000 -> 0002)
[76823.486539] igb 0000:be:00.1: enabling device (0000 -> 0002)
[76823.486781] igb 0000:be:00.2: enabling device (0000 -> 0002)
[76823.487022] igb 0000:be:00.3: enabling device (0000 -> 0002)
[76823.571992] [PF_RING] Registering ZC device ens3f0@0 [rx-ring=00000000e22551bd][tx-ring=00000000de8c6ea6]
[76823.572128] [PF_RING] Registering ZC device ens3f0@1 [rx-ring=0000000078582b65][tx-ring=00000000c67d6553]
[76823.572292] [PF_RING] Registering ZC device ens3f0@2 [rx-ring=000000001d2bd8eb][tx-ring=00000000381b1132]
[76823.572406] [PF_RING] Registering ZC device ens3f0@3 [rx-ring=000000001937dd2f][tx-ring=00000000dbcbf4dd]
[76823.572517] [PF_RING] Registering ZC device ens3f0@4 [rx-ring=00000000ebd9c969][tx-ring=0000000093dd26ec]
[76823.572626] [PF_RING] Registering ZC device ens3f0@5 [rx-ring=00000000c70e7268][tx-ring=00000000a7cc82d4]
[76823.572732] [PF_RING] Registering ZC device ens3f0@6 [rx-ring=00000000d1d68195][tx-ring=00000000bd55a3b7]
[76823.572839] [PF_RING] Registering ZC device ens3f0@7 [rx-ring=00000000b3d8e860][tx-ring=000000001a0527ab]
[76823.580051] list_del corruption. prev->next should be ff8bc81a4e3ffb98, but was ff4d5be06074cc58
[76823.580229] ------------[ cut here ]------------
[76823.580374] kernel BUG at lib/list_debug.c:51!
[76823.580517] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[76823.580520] CPU: 1 PID: 8456 Comm: pfcount Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-503.40.1.el9_5.x86_64 #1
[76823.580522] Hardware name: Dell Inc. PowerEdge R660/09YM04, BIOS 2.4.4 09/27/2024
[76823.580523] RIP: 0010:__list_del_entry_valid.cold+0x31/0x47
[76823.580529] Code: c0 a7 b8 e8 35 77 fe ff 0f 0b 48 c7 c7 b0 c0 a7 b8 e8 27 77 fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 70 c0 a7 b8 e8 13 77 fe ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 38 c0 a7 b8 e8 ff 76 fe ff 0f 0b
[76823.580531] RSP: 0018:ff8bc81a4e3ff9c8 EFLAGS: 00010046
[76823.580533] RAX: 0000000000000054 RBX: ff8bc81a4e3ffb80 RCX: 0000000000000000
[76823.580534] RDX: 0000000000000000 RSI: ff4d5befbf6208c0 RDI: ff4d5befbf6208c0
[76823.580534] RBP: ff4d5be06074cc50 R08: 0000000000000000 R09: ff8bc81a4e3ff888
[76823.580535] R10: ff8bc81a4e3ff880 R11: ffffffffb95e93e8 R12: 0000000000000282
[76823.580536] R13: ff8bc81a4e3ffb70 R14: 0000000000000001 R15: ff8bc81a4e3ffa40
[76823.580537] FS:  00007f2dda598740(0000) GS:ff4d5befbf600000(0000) knlGS:0000000000000000
[76823.580538] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76823.580539] CR2: 000000c00042b000 CR3: 00000002d95b2005 CR4: 0000000000771ef0
[76823.580539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76823.580540] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[76823.580541] PKRU: 55555554
[76823.580541] Call Trace:
[76823.580542]  <TASK>
[76823.580543]  ? show_trace_log_lvl+0x1c4/0x2df
[76823.580547]  ? show_trace_log_lvl+0x1c4/0x2df
[76823.580550]  ? remove_wait_queue+0x20/0x60
[76823.580555]  ? __die_body.cold+0x8/0xd
[76823.580557]  ? die+0x2b/0x50
[76823.580560]  ? do_trap+0xce/0x120
[76823.580563]  ? __list_del_entry_valid.cold+0x31/0x47
[76823.580565]  ? do_error_trap+0x65/0x80
[76823.580566]  ? __list_del_entry_valid.cold+0x31/0x47
[76823.580569]  ? exc_invalid_op+0x4e/0x70
[76823.580573]  ? __list_del_entry_valid.cold+0x31/0x47
[76823.580575]  ? asm_exc_invalid_op+0x16/0x20
[76823.580580]  ? __list_del_entry_valid.cold+0x31/0x47
[76823.580582]  ? __list_del_entry_valid.cold+0x31/0x47
[76823.580584]  remove_wait_queue+0x20/0x60
[76823.580586]  poll_freewait+0x3d/0xa0
[76823.580590]  do_sys_poll+0x176/0x230
[76823.580593]  ? rseq_get_rseq_cs+0x1d/0x240
[76823.580596]  ? rseq_ip_fixup+0x6e/0x1a0
[76823.580597]  ? __audit_filter_op+0xa5/0xf0
[76823.580601]  ? __rseq_handle_notify_resume+0x26/0xb0
[76823.580603]  ? __pfx_pollwake+0x10/0x10
[76823.580605]  ? __rseq_handle_notify_resume+0x26/0xb0
[76823.580606]  ? exit_to_user_mode_loop+0xd9/0x130
[76823.580609]  ? exit_to_user_mode_prepare+0xef/0x100
[76823.580610]  ? syscall_exit_to_user_mode+0x19/0x40
[76823.580612]  ? do_syscall_64+0x6b/0xf0
[76823.580613]  ? __wake_up+0x40/0x60
[76823.580615]  ? rseq_get_rseq_cs+0x1d/0x240
[76823.580616]  ? rseq_ip_fixup+0x6e/0x1a0
[76823.580617]  ? __audit_filter_op+0xa5/0xf0
[76823.580618]  ? __rseq_handle_notify_resume+0x26/0xb0
[76823.580620]  ? exit_to_user_mode_loop+0xd9/0x130
[76823.580620]  ? ktime_get_ts64+0x49/0xf0
[76823.580623]  __x64_sys_poll+0xa6/0x140
[76823.580626]  do_syscall_64+0x5c/0xf0
[76823.580627]  ? __rseq_handle_notify_resume+0x26/0xb0
[76823.580628]  ? exit_to_user_mode_loop+0xd9/0x130
[76823.580629]  ? exit_to_user_mode_prepare+0xef/0x100
[76823.580630]  ? syscall_exit_to_user_mode+0x19/0x40
[76823.580631]  ? do_syscall_64+0x6b/0xf0
[76823.580632]  ? fpregs_restore_userregs+0x47/0xd0
[76823.580635]  ? exit_to_user_mode_prepare+0xef/0x100
[76823.580636]  ? sysvec_irq_work+0x3c/0x90
[76823.580637]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[76823.580639] RIP: 0033:0x7f2dda3015a7
[76823.580666] Code: 00 00 00 5b 49 8b 45 10 5d 41 5c 41 5d 41 5e c3 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 07 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[76823.580667] RSP: 002b:00007ffea27d18d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000007
[76823.580668] RAX: ffffffffffffffda RBX: 00000000015aabb0 RCX: 00007f2dda3015a7
[76823.580669] RDX: 0000000000000051 RSI: 0000000000000001 RDI: 00007ffea27d18e8
[76823.580670] RBP: 00000000015aabb0 R08: 0000000000000001 R09: 00007ffea27d19a7
[76823.580670] R10: 00007f2dda213808 R11: 0000000000000246 R12: 0000000000000051
[76823.580671] R13: 0000000000000000 R14: 00007ffea27d19a0 R15: 00007ffea27d1930
[76823.580672]  </TASK>
[76823.580673] Modules linked in: tls binfmt_misc ice_zc(OE) gnss igb_zc(OE) vxlan ip6_udp_tunnel udp_tunnel uio pf_ring(OE) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink ipmi_ssif vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm i40e ib_uverbs dell_wmi ledtrig_audio rapl sparse_keymap acpi_ipmi rfkill iTCO_wdt iTCO_vendor_support ipmi_si intel_cstate video pmt_telemetry mei_me i2c_i801 pmt_class intel_sdsi isst_if_mbox_pci dell_smbios mgag200 ipmi_devintf isst_if_mmio dcdbas ib_core intel_uncore drm_shmem_helper dell_wmi_descriptor wmi_bmof pcspkr drm_kms_helper mei isst_if_common intel_vsec i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter fuse drm ext4 mbcache jbd2 sd_mod t10_pi sg iaa_crypto ahci
[76823.580711]  crct10dif_pclmul libahci crc32_pclmul crc32c_intel idxd i2c_algo_bit libata megaraid_sas tg3 ghash_clmulni_intel idxd_bus wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod [last unloaded: gnss]
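
The AER values in the log above can be decoded by hand. The Python sketch below maps the bits of the PCIe AER Uncorrectable Error Status register to their names; the bit layout follows the PCI Express specification as I understand it, and the helper names `decode` and `is_fatal` are made up for this illustration. The three constants are copied from the log.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status / Mask / Severity
# registers (all three registers share the same layout).
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode(value):
    """Return the names of all known error bits set in value."""
    return [name for bit, name in sorted(AER_UNCOR_BITS.items()) if value & (1 << bit)]


def is_fatal(status, severity):
    """A reported error is fatal when its bit is also set in the severity register."""
    return bool(status & severity)


# Values copied from the log above.
AER_UNCOR_STATUS = 0x00008000
AER_UNCOR_MASK = 0x00010000
AER_UNCOR_SEVERITY = 0x004EF031

print("reported:", decode(AER_UNCOR_STATUS))
print("masked:  ", decode(AER_UNCOR_MASK))
print("fatal:   ", is_fatal(AER_UNCOR_STATUS, AER_UNCOR_SEVERITY))
```

The status decodes to bit 15, which agrees with the `[15] CmpltAbrt` line printed by the kernel, and the severity register has that bit set, which is consistent with the `type: fatal` line.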

fandigunawan avatar May 21 '25 09:05 fandigunawan

A driver update (igb-zc-5.16.11.9925-dkms.noarch.rpm) is available (see https://packages.ntop.org/centos/9/noarch/Packages/). Please try this version, which includes improvements to support the latest kernels.

cardigliano avatar May 22 '25 12:05 cardigliano

A driver update (igb-zc-5.16.11.9925-dkms.noarch.rpm) is available (see https://packages.ntop.org/centos/9/noarch/Packages/). Please try this version, which includes improvements to support the latest kernels.

I tried the update and it still crashed. I will investigate further by getting a PCI card from a different source (this will take time). I will let you know about the progress.

fandigunawan avatar May 24 '25 01:05 fandigunawan

Did you also try without ZC? Does it run? (e.g. pfcount -i ens1f0)

cardigliano avatar May 26 '25 07:05 cardigliano

Did you also try without ZC? Does it run? (e.g. pfcount -i ens1f0)

It runs fine without ZC.

fandigunawan avatar May 27 '25 01:05 fandigunawan