Netatop crashes the kernel with General Protection Fault

Open ValdikSS opened this issue 10 months ago • 3 comments

Hello,

The netatop 3.1 module crashes my server once every few days with a general protection fault. Take a look at the most recent crash log, obtained with netconsole. It crashes in analyze_tcpv4_packet - sock2task - get_taskinfo.

The most recent crash (spoiler)
[206201.363307] general protection fault, probably for non-canonical address 0xbe27f590f0ab0657: 0000 [#1] PREEMPT SMP PTI
[206201.363318] CPU: 0 PID: 310615 Comm: eiskaltdcpp-qt Tainted: G        W  OE      6.6.27-1-lts #1 d5b6011e73704a95088e0244d141560bd5ec914b
[206201.363323] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018
[206201.363326] RIP: 0010:kmem_cache_alloc+0x115/0x370
[206201.363334] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00
[206201.363338] RSP: 0018:ffffb5340c80b700 EFLAGS: 00010092
[206201.363342] RAX: be27f590f0ab0657 RBX: be535888bc8ade5e RCX: ffffa04b3e8139ea
[206201.363345] RDX: 00000000004bc400 RSI: 0000000000000820 RDI: be27f590f0ab061f
[206201.363349] RBP: ffffb5340c80b750 R08: be27f590f0ab061f R09: 00000000004bc400
[206201.363351] R10: 000034e630a11a20 R11: 0000000000000000 R12: ffffa04ac153fa00
[206201.363354] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078
[206201.363357] FS:  000073831ddfb6c0(0000) GS:ffffa04dcf200000(0000) knlGS:0000000000000000
[206201.363361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[206201.363363] CR2: 000073831ddf7e78 CR3: 000000012c6aa001 CR4: 00000000001706f0
[206201.363366] Call Trace:
[206201.363369]  <TASK>
[206201.363372]  ? die_addr+0x36/0x90
[206201.363377]  ? exc_general_protection+0x1c5/0x430
[206201.363382]  ? asm_exc_general_protection+0x26/0x30
[206201.363387]  ? kmem_cache_alloc+0x115/0x370
[206201.363392]  ? get_taskinfo+0xa5/0x1b0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.363400]  get_taskinfo+0xa5/0x1b0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.363406]  sock2task+0x1fe/0x380 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.363413]  analyze_tcpv4_packet+0x1be/0x210 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.363420]  ipv4_hookout+0xa5/0xe0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.363427]  nf_hook_slow+0x45/0xc0
[206201.363433]  __ip_local_out+0xfa/0x180
[206201.363438]  ? __pfx_dst_output+0x10/0x10
[206201.363441]  ip_local_out+0x1b/0x70
[206201.363445]  __ip_queue_xmit+0x175/0x490
[206201.363448]  __tcp_transmit_skb+0xa5e/0xbf0
[206201.363454]  tcp_connect+0xb37/0xeb0
[206201.363458]  ? __pfx___inet_check_established+0x10/0x10
[206201.363463]  tcp_v4_connect+0x419/0x500
[206201.363467]  __inet_stream_connect+0x112/0x3d0
[206201.363472]  inet_stream_connect+0x3a/0x60
[206201.363476]  __sys_connect+0xa8/0xd0
[206201.363480]  __x64_sys_connect+0x18/0x20
[206201.363484]  do_syscall_64+0x5a/0x80
[206201.363489]  ? __slab_free+0xf1/0x380
[206201.363493]  ? __unfreeze_partials+0x1c1/0x210
[206201.363497]  ? __mod_memcg_lruvec_state+0x4e/0xa0
[206201.363501]  ? skb_release_data+0x142/0x1c0
[206201.363506]  ? rtl8169_poll+0x442/0x4e0 [r8169 84aff28b94f8fe3441c84c217bd59057a09d2ae4]
[206201.363516]  ? __napi_poll+0x2b/0x1b0
[206201.363519]  ? net_rx_action+0x19e/0x370
[206201.363522]  ? sched_clock+0x10/0x30
[206201.363525]  ? sched_clock_cpu+0xf/0x190
[206201.363530]  ? irqtime_account_irq+0x40/0xc0
[206201.363533]  ? __do_softirq+0x186/0x2c8
[206201.363537]  ? __irq_exit_rcu+0x4b/0xc0
[206201.363542]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[206201.363546] RIP: 0033:0x738396d2879b
[206201.363567] Code: 83 ec 18 89 54 24 0c 48 89 34 24 89 7c 24 08 e8 fb cf f7 ff 8b 54 24 0c 48 8b 34 24 41 89 c0 8b 7c 24 08 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 08 e8 51 d0 f7 ff 8b 44
[206201.363571] RSP: 002b:000073831ddfab70 EFLAGS: 00000293 ORIG_RAX: 000000000000002a
[206201.363575] RAX: ffffffffffffffda RBX: 0000738374058b58 RCX: 0000738396d2879b
[206201.363578] RDX: 0000000000000010 RSI: 000073831ddfab90 RDI: 0000000000000016
[206201.363580] RBP: 000073837402e700 R08: 0000000000000000 R09: 0000000000000000
[206201.363583] R10: 0000738396d9efe0 R11: 0000000000000293 R12: 000073831ddfab90
[206201.363586] R13: 000073831ddfaba0 R14: 0000738374058b58 R15: 0000738397ca7b60
[206201.363589]  </TASK>
[206201.363591] Modules linked in: tls bluetooth ecdh_generic mptcp_diag vsock_diag tcp_diag udp_diag raw_diag inet_diag unix_diag netconsole nf_conntrack_netlink xt_conntrack nft_chain_nat xt_addrtype xt_owner nft_compat dummy ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs blake2b_generic xor raid6_pq nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock dm_crypt cbc encrypted_keys trusted asn1_encoder tee tun rfkill iptable_mangle iptable_filter iptable_nat zram nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_queue nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp kvm_intel snd_hda_codec_hdmi kvm snd_hda_intel mei_pxp irqbypass spi_nor mtd crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic at24 mei_hdcp
[206201.363633]  iTCO_wdt spi_intel_platform spi_intel intel_pmc_bxt gf128mul snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi snd_hda_codec ghash_clmulni_intel r8169 sha512_ssse3 sha256_ssse3 snd_hda_core sha1_ssse3 aesni_intel crypto_simd cryptd mxm_wmi snd_hwdep rapl intel_cstate realtek mdio_devres mei_me alx snd_pcm intel_uncore lpc_ich i2c_i801 libphy mei i2c_smbus snd_timer snd soundcore mdio mac_hid tcp_bbr netatop(OE) sg crypto_user loop fuse dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 i2c_algo_bit drm_buddy ttm intel_gtt xhci_pci crc32c_intel drm_display_helper xhci_pci_renesas cec video wmi
[206201.363755] ---[ end trace 0000000000000000 ]---
[206201.363758] RIP: 0010:kmem_cache_alloc+0x115/0x370
[206201.363763] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00
[206201.363767] RSP: 0018:ffffb5340c80b700 EFLAGS: 00010092
[206201.363770] RAX: be27f590f0ab0657 RBX: be535888bc8ade5e RCX: ffffa04b3e8139ea
[206201.363773] RDX: 00000000004bc400 RSI: 0000000000000820 RDI: be27f590f0ab061f
[206201.363776] RBP: ffffb5340c80b750 R08: be27f590f0ab061f R09: 00000000004bc400
[206201.363779] R10: 000034e630a11a20 R11: 0000000000000000 R12: ffffa04ac153fa00
[206201.363782] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078
[206201.363785] FS:  000073831ddfb6c0(0000) GS:ffffa04dcf200000(0000) knlGS:0000000000000000
[206201.363788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[206201.363791] CR2: 000073831ddf7e78 CR3: 000000012c6aa001 CR4: 00000000001706f0
[206201.363794] note: eiskaltdcpp-qt[310615] exited with irqs disabled
[206201.363862] note: eiskaltdcpp-qt[310615] exited with preempt_count 2
[206201.415487] general protection fault, probably for non-canonical address 0xbe27f590f0ab0657: 0000 [#2] PREEMPT SMP PTI
[206201.415497] CPU: 0 PID: 310603 Comm: snowflake Tainted: G      D W  OE      6.6.27-1-lts #1 d5b6011e73704a95088e0244d141560bd5ec914b
[206201.415503] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018
[206201.415506] RIP: 0010:kmem_cache_alloc+0x115/0x370
[206201.415514] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00
[206201.415519] RSP: 0018:ffffb5340c85b8e0 EFLAGS: 00010092
[206201.415523] RAX: be27f590f0ab0657 RBX: be535888bc8ade5e RCX: ffffa04d97e322d0
[206201.415526] RDX: 00000000004bc400 RSI: 0000000000000820 RDI: be27f590f0ab061f
[206201.415529] RBP: ffffb5340c85b930 R08: be27f590f0ab061f R09: 00000000004bc400
[206201.415532] R10: 000034e630a11a20 R11: 0000000000000000 R12: ffffa04ac153fa00
[206201.415535] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078
[206201.415538] FS:  000000c000201490(0000) GS:ffffa04dcf200000(0000) knlGS:0000000000000000
[206201.415542] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[206201.415602] CR2: 00007c086e930180 CR3: 0000000105504006 CR4: 00000000001706f0
[206201.415606] Call Trace:
[206201.415609]  <TASK>
[206201.415612]  ? die_addr+0x36/0x90
[206201.415618]  ? exc_general_protection+0x1c5/0x430
[206201.415623]  ? asm_exc_general_protection+0x26/0x30
[206201.415628]  ? kmem_cache_alloc+0x115/0x370
[206201.415634]  ? get_taskinfo+0xa5/0x1b0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.415642]  get_taskinfo+0xa5/0x1b0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.415649]  sock2task+0x16b/0x380 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.415656]  analyze_tcpv4_packet+0x1be/0x210 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.415663]  ipv4_hookout+0xa5/0xe0 [netatop 0d0b3365e311e316aec2c30b46f57b19b45eca39]
[206201.415669]  nf_hook_slow+0x45/0xc0
[206201.415676]  __ip_local_out+0xfa/0x180
[206201.415680]  ? __pfx_dst_output+0x10/0x10
[206201.415684]  ip_local_out+0x1b/0x70
[206201.415688]  __ip_queue_xmit+0x175/0x490
[206201.415691]  __tcp_transmit_skb+0xa5e/0xbf0
[206201.415697]  tcp_v4_do_rcv+0x151/0x280
[206201.415701]  __release_sock+0xb8/0xd0
[206201.415705]  release_sock+0x2f/0x90
[206201.415709]  tcp_recvmsg+0x92/0x1f0
[206201.415714]  inet_recvmsg+0x56/0x130
[206201.415718]  ? __pfx_bpf_lsm_socket_recvmsg+0x10/0x10
[206201.415722]  ? security_socket_recvmsg+0x44/0x70
[206201.415726]  sock_recvmsg+0xa6/0xd0
[206201.415730]  sock_read_iter+0x96/0x100
[206201.415733]  vfs_read+0x303/0x350
[206201.415738]  ksys_read+0xbb/0xf0
[206201.415741]  do_syscall_64+0x5a/0x80
[206201.415746]  ? syscall_exit_to_user_mode+0x22/0x40
[206201.415750]  ? do_syscall_64+0x66/0x80
[206201.415754]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[206201.415758] RIP: 0033:0x40720e
[206201.415780] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[206201.415785] RSP: 002b:000000c000373c28 EFLAGS: 00000206 ORIG_RAX: 0000000000000000
[206201.415789] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 000000000040720e
[206201.415792] RDX: 0000000000008000 RSI: 000000c000164000 RDI: 0000000000000008
[206201.415795] RBP: 000000c000373c68 R08: 0000000000000000 R09: 0000000000000000
[206201.415797] R10: 0000000000000000 R11: 0000000000000206 R12: 000000c00051beb0
[206201.415800] R13: 000000c00023ef12 R14: 000000c0000076c0 R15: 0000000000000000
[206201.415804]  </TASK>
[206201.415806] Modules linked in: tls bluetooth ecdh_generic mptcp_diag vsock_diag tcp_diag udp_diag raw_diag inet_diag unix_diag netconsole nf_conntrack_netlink xt_conntrack nft_chain_nat xt_addrtype xt_owner nft_compat dummy ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs blake2b_generic xor raid6_pq nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock dm_crypt cbc encrypted_keys trusted asn1_encoder tee tun rfkill iptable_mangle iptable_filter iptable_nat zram nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_queue nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp kvm_intel snd_hda_codec_hdmi kvm snd_hda_intel mei_pxp irqbypass spi_nor mtd crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic at24 mei_hdcp
[206201.415853]  iTCO_wdt spi_intel_platform spi_intel intel_pmc_bxt gf128mul snd_intel_dspcfg iTCO_vendor_support snd_intel_sdw_acpi snd_hda_codec ghash_clmulni_intel r8169 sha512_ssse3 sha256_ssse3 snd_hda_core sha1_ssse3 aesni_intel crypto_simd cryptd mxm_wmi snd_hwdep rapl intel_cstate realtek mdio_devres mei_me alx snd_pcm intel_uncore lpc_ich i2c_i801 libphy mei i2c_smbus snd_timer snd soundcore mdio mac_hid tcp_bbr netatop(OE) sg crypto_user loop fuse dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 i2c_algo_bit drm_buddy ttm intel_gtt xhci_pci crc32c_intel drm_display_helper xhci_pci_renesas cec video wmi
[206201.415909] ---[ end trace 0000000000000000 ]---
[206201.415912] RIP: 0010:kmem_cache_alloc+0x115/0x370
[206201.415917] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00
[206201.415922] RSP: 0018:ffffb5340c80b700 EFLAGS: 00010092
[206201.415926] RAX: be27f590f0ab0657 RBX: be535888bc8ade5e RCX: ffffa04b3e8139ea
[206201.415929] RDX: 00000000004bc400 RSI: 0000000000000820 RDI: be27f590f0ab061f
[206201.415931] RBP: ffffb5340c80b750 R08: be27f590f0ab061f R09: 00000000004bc400
[206201.415934] R10: 000034e630a11a20 R11: 0000000000000000 R12: ffffa04ac153fa00
[206201.415937] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078
[206201.415940] FS:  000000c000201490(0000) GS:ffffa04dcf200000(0000) knlGS:0000000000000000
[206201.415944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[206201.415947] CR2: 00007c086e930180 CR3: 0000000105504006 CR4: 00000000001706f0
[206201.415950] note: snowflake[310603] exited with irqs disabled
[206201.416000] note: snowflake[310603] exited with preempt_count 2

And here's another, older one

Another crash (spoiler)
[45470.068801] general protection fault, probably for non-canonical address 0xb1700553f83edc8f: 0000 [#1] PREEMPT SMP PTI
[45470.068811] CPU: 0 PID: 84945 Comm: eiskaltdcpp-qt Tainted: G           OE      6.6.25-1-lts #1 d7280cdf80ca98da2597ab2da5b8ef8d06d3fe7b
[45470.068816] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018
[45470.068820] RIP: 0010:kmem_cache_alloc+0x115/0x370
[45470.068826] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00
[45470.068831] RSP: 0018:ffffadbe0cf3fa90 EFLAGS: 00010082
[45470.068834] RAX: b1700553f83edc8f RBX: 06b83e4ab7a12ba8 RCX: ffff9f4f3a31c43a
[45470.068847] RDX: 00000000002c3e00 RSI: 0000000000000820 RDI: b1700553f83edc57
[45470.068850] RBP: ffffadbe0cf3fae0 R08: b1700553f83edc57 R09: 00000000002c3e00
[45470.068853] R10: 00002e6c70a0d860 R11: 0000000000000001 R12: ffff9f4e84b91c00
[45470.068855] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078
[45470.068858] FS:  0000764c2cbea700(0000) GS:ffff9f518f200000(0000) knlGS:0000000000000000
[45470.068861] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45470.068864] CR2: 0000764c4d193c88 CR3: 00000002af58a002 CR4: 00000000001706f0
[45470.068867] Call Trace:
[45470.068870]  <TASK>
[45470.068873]  ? die_addr+0x36/0x90
[45470.068878]  ? exc_general_protection+0x1c5/0x430
[45470.068884]  ? asm_exc_general_protection+0x26/0x30
[45470.068889]  ? kmem_cache_alloc+0x115/0x370
[45470.068894]  ? get_taskinfo+0xa5/0x1b0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45470.068901]  get_taskinfo+0xa5/0x1b0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45470.068908]  sock2task+0x1fe/0x380 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45470.069059] Modules linked in: xt_conntrack nft_chain_nat xt_addrtype xt_owner nft_compat dummy ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs blake2b_generic xor raid6_pq nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock netconsole dm_crypt cbc encrypted_keys trusted asn1_encoder tee tun rfkill zram iptable_mangle iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_queue nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm irqbypass ledtrig_audio crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul spi_nor mtd iTCO_wdt spi_intel_platform at24 intel_pmc_bxt iTCO_vendor_support snd_hda_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3
[45470.068924]  analyze_tcpv4_packet+0x1be/0x210 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45470.068931]  ipv4_hookout+0xa5/0xe0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45470.068937]  nf_hook_slow+0x45/0xc0
[45470.068943]  __ip_local_out+0xfa/0x180
[45470.069111]  aesni_intel mei_pxp mei_hdcp spi_intel crypto_simd cryptd rapl intel_cstate snd_intel_dspcfg i2c_i801 snd_intel_sdw_acpi snd_hda_codec intel_uncore snd_hda_core r8169 i2c_smbus realtek mxm_wmi snd_hwdep snd_pcm lpc_ich snd_timer snd mei_me mdio_devres alx soundcore mei libphy mdio mac_hid tcp_bbr netatop(OE) sg crypto_user dm_mod loop fuse nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 i2c_algo_bit drm_buddy crc32c_intel ttm intel_gtt xhci_pci drm_display_helper xhci_pci_renesas cec video wmi
[45470.069159] ---[ end trace 0000000000000000 ]---
[45470.068947]  ? __pfx_dst_output+0x10/0x10
[45470.069162] RIP: 0010:kmem_cache_alloc+0x115/0x370
[45470.069166] Code: 38 0f 84 e7 01 00 00 48 85 ff 0f 84 de 01 00 00 41 8b 44 24 28 4d 8b 14 24 49 89 f8 49 89 d1 49 8b 9c 24 b8 00 00 00 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 02 00 00
[45470.069170] RSP: 0018:ffffadbe0cf3fa90 EFLAGS: 00010082
[45470.068950]  ip_local_out+0x1b/0x70
[45470.069173] RAX: b1700553f83edc8f RBX: 06b83e4ab7a12ba8 RCX: ffff9f4f3a31c43a
[45470.069176] RDX: 00000000002c3e00 RSI: 0000000000000820 RDI: b1700553f83edc57
[45470.069178] RBP: ffffadbe0cf3fae0 R08: b1700553f83edc57 R09: 00000000002c3e00
[45470.068953]  __ip_queue_xmit+0x175/0x490
[45470.069181] R10: 00002e6c70a0d860 R11: 0000000000000001 R12: ffff9f4e84b91c00
[45470.069183] R13: 0000000000000000 R14: 0000000000000820 R15: 0000000000000078
[45470.068957]  __tcp_transmit_skb+0xa5e/0xbf0
[45470.069186] FS:  0000764c2cbea700(0000) GS:ffff9f518f200000(0000) knlGS:0000000000000000
[45470.069189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45470.068963]  tcp_write_xmit+0x544/0x14e0
[45470.069191] CR2: 0000764c4d193c88 CR3: 00000002af58a002 CR4: 00000000001706f0
[45470.069194] note: eiskaltdcpp-qt[84945] exited with irqs disabled
[45470.068967]  __tcp_push_pending_frames+0x36/0xf0
[45470.068971]  inet_shutdown+0xe2/0xf0
[45470.068974]  __sys_shutdown+0x60/0xb0
[45470.068979]  __x64_sys_shutdown+0x14/0x20
[45470.068982]  do_syscall_64+0x60/0x90
[45470.068985]  ? __do_softirq+0x186/0x2c8
[45470.068989]  ? __irq_exit_rcu+0x4b/0xc0
[45470.068993]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[45470.068997] RIP: 0033:0x764c719290c7
[45470.069214] note: eiskaltdcpp-qt[84945] exited with preempt_count 2
[45470.069026] Code: f0 ff ff 73 01 c3 48 8b 0d c6 0d 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 30 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 0d 0d 00 f7 d8 64 89 01 48
[45470.069030] RSP: 002b:0000764c2cbe9cb8 EFLAGS: 00000217 ORIG_RAX: 0000000000000030
[45470.069044] RAX: ffffffffffffffda RBX: 0000764c48495b30 RCX: 0000764c719290c7
[45470.069046] RDX: 0000764c71762590 RSI: 0000000000000002 RDI: 0000000000000035
[45470.069049] RBP: 0000764c482bb0b0 R08: 0000764c719fb9c0 R09: 0000000000000000
[45470.069051] R10: 0000000000000001 R11: 0000000000000217 R12: 0000764c2cbe9d70
[45470.069054] R13: 0000764c717b4a8d R14: 0000764c2cbe9d70 R15: 0000764c2cbe9d80
[45470.069058]  </TASK>
[45511.480830] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kcompactd0:48]
[45511.480898] Modules linked in: xt_conntrack nft_chain_nat xt_addrtype xt_owner nft_compat dummy ip6table_raw ip6t_rpfilter iptable_raw ipt_rpfilter veth xt_CHECKSUM xt_tcpudp xt_comment xt_MASQUERADE ip6table_nat ip6table_mangle ip6table_filter ip6_tables bridge stp llc btrfs blake2b_generic xor raid6_pq nf_tables vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock netconsole dm_crypt cbc encrypted_keys trusted asn1_encoder tee tun rfkill zram iptable_mangle iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nfnetlink_queue nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic kvm irqbypass ledtrig_audio crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul spi_nor mtd iTCO_wdt spi_intel_platform at24 intel_pmc_bxt iTCO_vendor_support snd_hda_intel ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3
[45511.480940]  aesni_intel mei_pxp mei_hdcp spi_intel crypto_simd cryptd rapl intel_cstate snd_intel_dspcfg i2c_i801 snd_intel_sdw_acpi snd_hda_codec intel_uncore snd_hda_core r8169 i2c_smbus realtek mxm_wmi snd_hwdep snd_pcm lpc_ich snd_timer snd mei_me mdio_devres alx soundcore mei libphy mdio mac_hid tcp_bbr netatop(OE) sg crypto_user dm_mod loop fuse nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 i915 i2c_algo_bit drm_buddy crc32c_intel ttm intel_gtt xhci_pci drm_display_helper xhci_pci_renesas cec video wmi
[45511.480992] CPU: 2 PID: 48 Comm: kcompactd0 Tainted: G      D    OE      6.6.25-1-lts #1 d7280cdf80ca98da2597ab2da5b8ef8d06d3fe7b
[45511.480998] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018
[45511.481001] RIP: 0010:smp_call_function_many_cond+0x12b/0x500
[45511.481009] Code: df e8 49 fc 48 00 3b 05 83 aa e5 01 73 26 48 63 d0 49 8b 34 24 48 03 34 d5 e0 4c 70 b8 8b 56 08 83 e2 01 74 0a f3 90 8b 4e 08 <83> e1 01 75 f6 83 c0 01 eb c1 48 83 c4 50 5b 5d 41 5c 41 5d 41 5e
[45511.481015] RSP: 0018:ffffadbe001c79c8 EFLAGS: 00000202
[45511.481018] RAX: 0000000000000000 RBX: ffff9f518f335508 RCX: 0000000000000011
[45511.481022] RDX: 0000000000000001 RSI: ffff9f518f23b300 RDI: ffff9f518f335508
[45511.481025] RBP: ffff9f518f335500 R08: ffff9f518f335508 R09: 0000000000000000
[45511.481028] R10: ffff9f518f335530 R11: 0000000000000000 R12: ffff9f518f335500
[45511.481031] R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000002
[45511.481034] FS:  0000000000000000(0000) GS:ffff9f518f300000(0000) knlGS:0000000000000000
[45511.481038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45511.481041] CR2: 000072350ee3a8e0 CR3: 000000029b58e003 CR4: 00000000001706e0
[45511.481046] Call Trace:
[45511.481049]  <IRQ>
[45511.481054]  ? watchdog_timer_fn+0x1b8/0x220
[45511.481058]  ? __pfx_watchdog_timer_fn+0x10/0x10
[45511.481171]  kthread+0xe8/0x120
[45511.481175]  ? __pfx_kthread+0x10/0x10
[45511.481062]  ? __hrtimer_run_queues+0x112/0x2b0
[45511.481180]  ret_from_fork+0x34/0x50
[45511.481067]  ? hrtimer_interrupt+0xf8/0x230
[45511.481184]  ? __pfx_kthread+0x10/0x10
[45511.481188]  ret_from_fork_asm+0x1b/0x30
[45511.481194]  </TASK>
[45511.481071]  ? __sysvec_apic_timer_interrupt+0x50/0x140
[45511.481077]  ? sysvec_apic_timer_interrupt+0x6d/0x90
[45511.481081]  </IRQ>
[45511.481083]  <TASK>
[45511.481086]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[45511.481092]  ? smp_call_function_many_cond+0x12b/0x500
[45511.481095]  ? smp_call_function_many_cond+0x107/0x500
[45511.481099]  ? __pfx_invalidate_bh_lru+0x10/0x10
[45511.481104]  on_each_cpu_cond_mask+0x24/0x40
[45511.481109]  __buffer_migrate_folio+0xf8/0x2a0
[45511.481115]  move_to_new_folio+0x53/0x140
[45511.481119]  migrate_pages_batch+0x8e5/0xca0
[45511.481123]  ? __pfx_compaction_free+0x10/0x10
[45511.481127]  ? __pfx_remove_migration_pte+0x10/0x10
[45511.481131]  ? __pfx_compaction_alloc+0x10/0x10
[45511.481135]  migrate_pages+0xb41/0xe00
[45511.481138]  ? __pfx_compaction_free+0x10/0x10
[45511.481142]  ? __pfx_compaction_alloc+0x10/0x10
[45511.481145]  ? __pfx_compaction_alloc+0x10/0x10
[45511.481149]  compact_zone+0x831/0xf20
[45511.481154]  proactive_compact_node+0x85/0xe0
[45511.481158]  kcompactd+0x35b/0x430
[45511.481162]  ? __pfx_autoremove_wake_function+0x10/0x10
[45511.481167]  ? __pfx_kcompactd+0x10/0x10
[45532.577883] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[45532.577949] rcu: 	1-...!: (0 ticks this GP) idle=c1b4/1/0x4000000000000000 softirq=9952024/9952024 fqs=237
[45532.577957] rcu: 	(detected by 2, t=18006 jiffies, g=15878461, q=9524 ncpus=4)
[45532.577963] Sending NMI from CPU 2 to CPUs 1:
[45532.577971] NMI backtrace for cpu 1
[45532.577973] CPU: 1 PID: 226 Comm: knetatop Tainted: G      D    OEL     6.6.25-1-lts #1 d7280cdf80ca98da2597ab2da5b8ef8d06d3fe7b
[45532.577975] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018
[45532.577976] RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2e0
[45532.577979] Code: 77 7f f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 5b 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 65 48 ff 05 a3 00 26
[45532.577980] RSP: 0018:ffffadbe0031fe88 EFLAGS: 00000002
[45532.577982] RAX: 0000000000000001 RBX: ffffffffc0acdbd8 RCX: 0000000000000000
[45532.577983] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffc0acdbd8
[45532.577983] RBP: ffffffffc0acdbc8 R08: 0000000000000082 R09: 0000000080240023
[45532.577984] R10: 0000000000000000 R11: 0000000000000046 R12: ffffffffc0acdb50
[45532.577985] R13: 0000000000000082 R14: 0000000000000287 R15: ffffffffc0acdbd8
[45532.577986] FS:  0000000000000000(0000) GS:ffff9f518f280000(0000) knlGS:0000000000000000
[45532.577987] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45532.577988] CR2: 0000771e325ae1a0 CR3: 000000000c6de004 CR4: 00000000001706e0
[45532.577988] Call Trace:
[45532.577990]  <NMI>
[45532.577991]  ? nmi_cpu_backtrace+0x99/0x110
[45532.577994]  ? nmi_cpu_backtrace_handler+0x11/0x20
[45532.577997]  ? nmi_handle+0x61/0x150
[45532.577999]  ? default_do_nmi+0x40/0x100
[45532.578001]  ? exc_nmi+0x125/0x1a0
[45532.578002]  ? end_repeat_nmi+0x16/0x67
[45532.578006]  ? native_queued_spin_lock_slowpath+0x6e/0x2e0
[45532.578008]  ? native_queued_spin_lock_slowpath+0x6e/0x2e0
[45532.578010]  ? native_queued_spin_lock_slowpath+0x6e/0x2e0
[45532.578011]  </NMI>
[45532.578011]  <TASK>
[45532.578012]  _raw_spin_lock_irqsave+0x3d/0x50
[45532.578014]  garbage_collector+0x66/0x3c0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45532.578019]  ? __pfx_netatop_thread+0x10/0x10 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45532.578023]  netatop_thread+0x10/0x30 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45532.578027]  kthread+0xe8/0x120
[45532.578029]  ? __pfx_kthread+0x10/0x10
[45532.578031]  ret_from_fork+0x34/0x50
[45532.578033]  ? __pfx_kthread+0x10/0x10
[45532.578035]  ret_from_fork_asm+0x1b/0x30
[45532.578038]  </TASK>
[45532.578969] rcu: rcu_preempt kthread timer wakeup didn't happen for 17293 jiffies! g15878461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[45532.579122] rcu: 	Possible timer handling issue on cpu=0 timer-softirq=5779617
[45532.579127] rcu: rcu_preempt kthread starved for 17295 jiffies! g15878461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[45532.579134] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[45532.579139] rcu: RCU grace-period kthread stack dump:
[45532.579144] task:rcu_preempt     state:I stack:0     pid:18    ppid:2      flags:0x00004000
[45532.579151] Call Trace:
[45532.579155]  <TASK>
[45532.579160]  ? __pfx_rcu_gp_kthread+0x10/0x10
[45532.579167]  __schedule+0x3e7/0x1410
[45532.579174]  ? __pfx_rcu_gp_kthread+0x10/0x10
[45532.579180]  schedule+0x5e/0xd0
[45532.579186]  schedule_timeout+0x98/0x160
[45532.579192]  ? __pfx_process_timeout+0x10/0x10
[45532.579198]  rcu_gp_fqs_loop+0x107/0x560
[45532.579204]  rcu_gp_kthread+0xd4/0x190
[45532.579210]  kthread+0xe8/0x120
[45532.579216]  ? __pfx_kthread+0x10/0x10
[45532.579222]  ret_from_fork+0x34/0x50
[45532.579227]  ? __pfx_kthread+0x10/0x10
[45532.579233]  ret_from_fork_asm+0x1b/0x30
[45532.579240]  </TASK>
[45532.579245] rcu: Stack dump where RCU GP kthread last ran:
[45532.579249] Sending NMI from CPU 2 to CPUs 0:
[45532.579255] NMI backtrace for cpu 0
[45532.579257] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D    OEL     6.6.25-1-lts #1 d7280cdf80ca98da2597ab2da5b8ef8d06d3fe7b
[45532.579259] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Killer, BIOS P1.90 03/11/2018
[45532.579260] RIP: 0010:native_queued_spin_lock_slowpath+0x225/0x2e0
[45532.579263] Code: 41 c1 e4 10 41 c1 e5 12 45 09 ec 44 89 e0 c1 e8 10 66 87 43 02 89 c2 c1 e2 10 81 fa ff ff 00 00 77 5e 31 d2 eb 02 f3 90 8b 03 <66> 85 c0 75 f7 44 39 e0 0f 84 8e 00 00 00 c6 03 01 48 85 d2 74 0e
[45532.579264] RSP: 0018:ffffadbe00003d40 EFLAGS: 00000002
[45532.579265] RAX: 0000000000100101 RBX: ffffffffc0acdbd8 RCX: 00000000533970ee
[45532.579266] RDX: 0000000000000000 RSI: 0000000000000101 RDI: ffffffffc0acdbd8
[45532.579267] RBP: ffff9f518f235040 R08: ffff9f4e9b57304e R09: ffff9f4e9b573062
[45532.579268] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000040000
[45532.579269] R13: 0000000000040000 R14: 0000000000000000 R15: 0000000000000069
[45532.579269] FS:  0000000000000000(0000) GS:ffff9f518f200000(0000) knlGS:0000000000000000
[45532.579271] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[45532.579271] CR2: 00005e232cfbb0bc CR3: 00000002af57a004 CR4: 00000000001706f0
[45532.579272] Call Trace:
[45532.579273]  <NMI>
[45532.579274]  ? nmi_cpu_backtrace+0x99/0x110
[45532.579277]  ? nmi_cpu_backtrace_handler+0x11/0x20
[45532.579279]  ? nmi_handle+0x61/0x150
[45532.579282]  ? default_do_nmi+0x40/0x100
[45532.579283]  ? exc_nmi+0x125/0x1a0
[45532.579284]  ? end_repeat_nmi+0x16/0x67
[45532.579287]  ? native_queued_spin_lock_slowpath+0x225/0x2e0
[45532.579289]  ? native_queued_spin_lock_slowpath+0x225/0x2e0
[45532.579291]  ? native_queued_spin_lock_slowpath+0x225/0x2e0
[45532.579292]  </NMI>
[45532.579293]  <IRQ>
[45532.579293]  _raw_spin_lock_irqsave+0x3d/0x50
[45532.579295]  analyze_tcpv4_packet+0x89/0x210 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45532.579300]  ipv4_hookin+0x96/0xd0 [netatop 02529030fbe77a5f61961260edaf50cd667dc966]
[45532.579304]  nf_hook_slow+0x45/0xc0
[45532.579308]  ip_local_deliver+0xd0/0x120
[45532.579310]  ? __pfx_ip_local_deliver_finish+0x10/0x10
[45532.579312]  __netif_receive_skb_one_core+0x89/0xa0
[45532.579316]  process_backlog+0x85/0x120
[45532.579318]  __napi_poll+0x2b/0x1b0
[45532.579320]  net_rx_action+0x2b5/0x370
[45532.579323]  __do_softirq+0xd4/0x2c8
[45532.579325]  __irq_exit_rcu+0xa3/0xc0
[45532.579328]  common_interrupt+0x86/0xa0
[45532.579330]  </IRQ>
[45532.579330]  <TASK>
[45532.579331]  asm_common_interrupt+0x26/0x40
[45532.579332] RIP: 0010:cpuidle_enter_state+0xcc/0x440
[45532.579335] Code: da 75 38 ff e8 d5 f3 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 b3 76 37 ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[45532.579336] RSP: 0018:ffffffffb8e03e28 EFLAGS: 00000246
[45532.579336] RAX: ffff9f518f2341c0 RBX: ffff9f518f23dcc0 RCX: 000000000000001f
[45532.579337] RDX: 0000000000000000 RSI: 000000002802f942 RDI: 0000000000000000
[45532.579338] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000000067
[45532.579338] R10: 0000000000000014 R11: ffff9f518f232ba4 R12: ffffffffb8f47ea0
[45532.579339] R13: 0000295bf4e52e5b R14: 0000000000000001 R15: 0000000000000000
[45532.579341]  ? cpuidle_enter_state+0xbd/0x440
[45532.579343]  cpuidle_enter+0x2d/0x40
[45532.579346]  do_idle+0x1d8/0x230
[45532.579349]  cpu_startup_entry+0x2a/0x30
[45532.579351]  rest_init+0xca/0xd0
[45532.579352]  arch_call_rest_init+0xe/0x30
[45532.579355]  start_kernel+0x704/0xa90
[45532.579357]  x86_64_start_reservations+0x18/0x30
[45532.579360]  x86_64_start_kernel+0x96/0xa0
[45532.579362]  secondary_startup_64_no_verify+0x18f/0x19b
[45532.579366]  </TASK>

Tested on different 6.6 kernels. The crashes started on the day I installed netatop (5 April), but for some reason I missed the "netatop" string in the stack trace and noticed it only today, after today's crash. The server gets rebooted (I also have a hardware watchdog), so it's either a full crash or a CPU deadlock.

I'm running Arch Linux with the netatop-dkms (3.1-2) module (netatop-3.1.tar.gz).

The contact form on the website does not work. It returns HTTP 500 upon submitting.

ValdikSS avatar Apr 20 '24 07:04 ValdikSS

I fixed the contact form on the website. Thanks for warning.

glangeveld avatar May 07 '24 19:05 glangeveld

I also created a test version of the netatop module which you can download (Update: removed now). Could you please test this version?

glangeveld avatar May 07 '24 19:05 glangeveld

@glangeveld, from what I could see by diffing 3.2.1 and 3.1, only one lock was added, which doesn't seem relevant. I'll try to test it, but I can only test it on my main machine, and I'm not a fan of it hanging.

UPD: Thu May 16 19:54:00 2024 compiled and loaded the module.

ValdikSS avatar May 15 '24 16:05 ValdikSS

It's May 24, so far so good, @glangeveld.

ValdikSS avatar May 24 '24 17:05 ValdikSS

Thanks for testing! The modification is certainly more than just the addition of a lock (though even that alone could have solved an issue). There was a race condition in the garbage collection that has been solved by adding a reference count. I will release a new version as soon as possible.
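
For readers unfamiliar with the technique, here is a minimal user-space sketch of how a reference count can keep a garbage collector from freeing an entry that another code path is still using. It is not netatop's actual code: the struct, the function names (`taskinfo_new`, `taskinfo_get`, `taskinfo_put`) and the `live_entries` counter are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-task accounting entry (not netatop's struct). */
struct taskinfo {
    int pid;
    atomic_int refcount;   /* 1 for the lookup table, +1 per active user */
};

/* Illustration only: counts entries that have been allocated and not yet freed. */
static atomic_int live_entries;

/* Allocate an entry owned by the lookup table, so the count starts at 1. */
static struct taskinfo *taskinfo_new(int pid)
{
    struct taskinfo *t = malloc(sizeof(*t));
    if (!t)
        return NULL;
    t->pid = pid;
    atomic_init(&t->refcount, 1);
    atomic_fetch_add(&live_entries, 1);
    return t;
}

/*
 * Take a reference before using the entry. In concurrent code this must
 * happen while the table lock is still held, so that lookup and increment
 * cannot be separated by the collector unlinking the entry.
 */
static void taskinfo_get(struct taskinfo *t)
{
    atomic_fetch_add(&t->refcount, 1);
}

/*
 * Drop a reference. The memory is released only by whoever drops the last
 * one. Returns true if this call freed the entry.
 */
static bool taskinfo_put(struct taskinfo *t)
{
    int old = atomic_fetch_sub(&t->refcount, 1);
    assert(old > 0);   /* a put without a matching get is a bug */
    if (old == 1) {
        atomic_fetch_sub(&live_entries, 1);
        free(t);
        return true;
    }
    return false;
}
```

In this sketch the collector no longer frees an entry directly. It unlinks the entry from the table and calls `taskinfo_put` to drop the table's reference, so if a packet handler took its own reference earlier, the memory stays valid until that handler calls `taskinfo_put` as well.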

Atoptool avatar May 26 '24 09:05 Atoptool

Version 3.2.2 can be downloaded from here.

Atoptool avatar Jun 01 '24 10:06 Atoptool