AKS
[BUG] linux-azure kernel segfault in netfs module, netfs_rreq_unlock causes kernel panic on nodes
Describe the bug
This is a kernel NULL pointer dereference (a segfault in kernel mode) in the netfs module of the linux-azure kernel (5.15.0-1075-azure). It was fixed in later kernel versions, but the fix is not included in the current AKS VM image. We've observed it on nodes with the cephfs module loaded.
To Reproduce
Steps to reproduce the behavior:
- Load the cephfs kernel module (e.g. via the rook-ceph provisioner).
- The system load or timing that triggers the crash is unknown (it may correlate with a high number of disk read operations, but that is not confirmed).
- The kernel panic shows up in the boot diagnostics for the VMSS instance, and stateful workloads experience ~5-10 minutes of downtime.
Expected behavior
Correct handling of XA_RETRY_ENTRY so that address 0000000000000402 is not dereferenced.
Per https://github.com/torvalds/linux/blob/v5.15/fs/netfs/read_helper.c#L406: on any iteration of the xas_for_each loop in netfs_rreq_unlock, page can hold the value XA_RETRY_ENTRY (returned by xas_find() inside xas_for_each). That value must be handled, not dereferenced as a page pointer.
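The faulting address fits this explanation. In the kernel's include/linux/xarray.h, internal entries are encoded as `(v << 2) | 2`, and `XA_RETRY_ENTRY` is `xa_mk_internal(256)`, which is 0x402: exactly the address the oops reports. The bytes at the fault marker in the `Code:` line, `48 8b 0f`, decode to `mov (%rdi),%rcx`, a load through the supposed page pointer. The userspace C sketch below re-declares those definitions for illustration only (they are copies, not the kernel header) to show the arithmetic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative userspace copies of helpers from include/linux/xarray.h.
 * Internal (non-pointer) xarray entries have their low two bits set to 10. */
static inline void *xa_mk_internal(unsigned long v)
{
	return (void *)((v << 2) | 2);
}

static inline bool xa_is_internal(const void *entry)
{
	return ((uintptr_t)entry & 3) == 2;
}

/* Marker handed to an RCU walker that raced with a concurrent modification
 * of the tree; the walker is expected to restart the lookup. */
#define XA_RETRY_ENTRY xa_mk_internal(256)

/* Returns the raw value a caller would dereference if it mistook the retry
 * marker for a struct page pointer. */
static uintptr_t retry_entry_address(void)
{
	uintptr_t addr = (uintptr_t)XA_RETRY_ENTRY;

	assert(xa_is_internal((void *)addr)); /* never a real, aligned pointer */
	return addr;
}
```

Calling `retry_entry_address()` yields 0x402 (1026). If `netfs_rreq_unlock` treats that value as a `struct page *`, its first field access (`page->flags`, at offset 0) reads from 0x402, which is the not-present-page fault in the report.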
Screenshots
[87534.602454] BUG: kernel NULL pointer dereference, address: 0000000000000402
[87534.606859] #PF: supervisor read access in kernel mode
[87534.609959] #PF: error_code(0x0000) - not-present page
[87534.613243] PGD 0 P4D 0
[87534.615278] Oops: 0000 [#1] SMP NOPTI
[87534.617686] CPU: 4 PID: 2688731 Comm: kworker/4:2 Not tainted 5.15.0-1075-azure #84-Ubuntu
[87534.622366] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 08/23/2024
[87534.628290] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[87534.632247] RIP: 0010:netfs_rreq_unlock+0xf7/0x3c0 [netfs]
[87534.635624] Code: 7d a0 4c 89 fe 45 31 f6 e8 c6 7d c8 ed 4c 8b 45 90 48 85 c0 48 89 c7 0f 84 33 01 00 00 4d 8d 48 50 4c 89 fa 45 31 e4 4d 89 cf <48> 8b 0f 48 8b 47 20 48 2b 45 98 48 c1 e9 10 c1 e0 0c 83 e1 01 80
[87534.645989] RSP: 0018:ffffb1cc927bfac0 EFLAGS: 00010246
...
6.8.0-1018-azure/kernel/fs/netfs/netfs.ko (correct handling of signal, taken from another VM, non-AKS):
5.15.0-1075-azure/kernel/fs/netfs/netfs.ko (segfault exists):
Environment:
- azure-cli 2.60.0
- Kubernetes 1.28.x
- Linux 5.15.0-1075-azure
- aks ubuntu 22.04
- AKSUbuntu-2204gen2containerd-202410.09.0
- AKSUbuntu-2204gen2containerd-202412.10.0
Additional context
~~https://ubuntu.com/security/CVE-2023-52582~~ Not related.
https://access.redhat.com/solutions/6993035
https://github.com/torvalds/linux/commit/7e043a80b5dae5c2d2cf84031501de7827fd6c00
https://lore.kernel.org/all/166757987929.950645.12595273010425381286.stgit@warthog.procyon.org.uk/
Correction: I listed the wrong image. I was using AKSUbuntu-2204gen2containerd-202410.09.0 and earlier at the time of these incidents, and I've updated the information in the bug report. The kernel version in this image is the same as in 202412.10.0, but since 202410.09.0 is a deprecated image (and since I haven't experienced any issues with 202412.10.0 so far), I'd say it's fine to close this after a while. This was a really unpredictable issue that happened around the holidays, so I'm not sure when, or if, it'll show up again.
> (and due to the fact that I haven't experienced any issues with 202412.10.0 so far)
I just confirmed the same crash this past weekend, which makes sense because the kernel wasn't patched in 202412.10.0. Full trace below. It's safe to assume this will keep happening until the kernel is patched or the AKS images ship a different kernel version.
[727545.334935] BUG: kernel NULL pointer dereference, address: 0000000000000402
[727545.346441] #PF: supervisor read access in kernel mode
[727545.351151] #PF: error_code(0x0000) - not-present page
[727545.356023] PGD 143d0e067 P4D 143d0e067 PUD 0
[727545.360903] Oops: 0000 [#1] SMP NOPTI
[727545.364506] CPU: 6 PID: 4100821 Comm: php-fpm Tainted: G W 5.15.0-1075-azure #84-Ubuntu
[727545.373163] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 08/23/2024
[727545.382146] RIP: 0010:netfs_rreq_unlock+0xf7/0x3c0 [netfs]
[727545.387111] Code: 7d a0 4c 89 fe 45 31 f6 e8 c6 cd 74 fa 4c 8b 45 90 48 85 c0 48 89 c7 0f 84 33 01 00 00 4d 8d 48 50 4c 89 fa 45 31 e4 4d 89 cf <48> 8b 0f 48 8b 47 20 48 2b 45 98 48 c1 e9 10 c1 e0 0c 83 e1 01 80
[727545.410806] RSP: 0018:ffff9e498ef07978 EFLAGS: 00010246
[727545.417332] RAX: 0000000000000402 RBX: ffff91342192e9c0 RCX: 0000000000000000
[727545.426170] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000402
[727545.432419] RBP: ffff9e498ef07a10 R08: ffff913351b58d80 R09: ffff913351b58dd0
[727545.440987] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[727545.447472] R13: 0000000000000000 R14: 0000000000000000 R15: ffff913351b58dd0
[727545.456435] FS: 00007f2e0c518b28(0000) GS:ffff913a9fd80000(0000) knlGS:0000000000000000
[727545.463387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[727545.468512] CR2: 0000000000000402 CR3: 000000012a63e004 CR4: 0000000000370ee0
[727545.475027] Call Trace:
[727545.477858] <TASK>
[727545.480403] ? srso_alias_return_thunk+0x5/0x7f
[727545.484822] ? show_trace_log_lvl+0x28e/0x2ea
[727545.488745] ? show_trace_log_lvl+0x28e/0x2ea
[727545.492660] ? netfs_rreq_assess+0xab/0x1f0 [netfs]
[727545.497592] ? show_regs.part.0+0x23/0x29
[727545.501655] ? __die_body.cold+0x8/0xd
[727545.505244] ? __die+0x2b/0x37
[727545.508176] ? page_fault_oops+0x13b/0x170
[727545.514960] ? do_user_addr_fault+0x302/0x630
[727545.519674] ? sched_clock+0x9/0x10
[727545.526255] ? srso_alias_return_thunk+0x5/0x7f
[727545.531191] ? sched_clock_cpu+0x12/0xf0
[727545.536925] ? srso_alias_return_thunk+0x5/0x7f
[727545.542295] ? exc_page_fault+0x71/0x160
[727545.546061] ? asm_exc_page_fault+0x27/0x30
[727545.550350] ? netfs_rreq_unlock+0xf7/0x3c0 [netfs]
[727545.558196] ? netfs_rreq_unlock+0xda/0x3c0 [netfs]
[727545.563304] ? srso_alias_return_thunk+0x5/0x7f
[727545.567718] ? raw_spin_rq_unlock+0x10/0x30
[727545.571566] ? srso_alias_return_thunk+0x5/0x7f
[727545.576258] netfs_rreq_assess+0xab/0x1f0 [netfs]
[727545.581962] netfs_readpage+0x182/0x3a0 [netfs]
[727545.586659] ? init_wait_var_entry+0x60/0x60
[727545.591207] ceph_readpage+0xb6/0x100 [ceph]
[727545.595320] filemap_read_page+0x38/0x100
[727545.599160] filemap_get_pages+0x2f5/0x3f0
[727545.603016] filemap_read+0xbc/0x3e0
[727545.606314] ? ceph_get_caps+0xed/0x5e0 [ceph]
[727545.610344] ? srso_alias_return_thunk+0x5/0x7f
[727545.615160] ? ceph_mdsc_release_request+0x176/0x190 [ceph]
[727545.620745] ? srso_alias_return_thunk+0x5/0x7f
[727545.624881] generic_file_read_iter+0xe2/0x150
[727545.629683] ceph_read_iter+0x180/0x610 [ceph]
[727545.633770] new_sync_read+0x10d/0x190
[727545.637152] ? new_sync_read+0x10d/0x190
[727545.640604] ? trace_pid_list_first+0x30/0x40
[727545.644967] vfs_read+0x106/0x1a0
[727545.648021] ksys_read+0x67/0xf0
[727545.650947] __x64_sys_read+0x19/0x20
[727545.654315] x64_sys_call+0x1dba/0x1fa0
[727545.659550] do_syscall_64+0x56/0xb0
[727545.667126] ? srso_alias_return_thunk+0x5/0x7f
[727545.671032] ? irqentry_exit_to_user_mode+0x10/0x30
[727545.675424] ? srso_alias_return_thunk+0x5/0x7f
[727545.681295] ? irqentry_exit+0x1d/0x30
[727545.684690] ? srso_alias_return_thunk+0x5/0x7f
[727545.688638] ? sysvec_hyperv_stimer0+0x4e/0x90
[727545.692407] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
[727545.698693] RIP: 0033:0x7f2e0c4d5992
[727545.701825] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 00 55 48 89 f5 be 00 88 08 00
[727545.720757] RSP: 002b:00007fff742458c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[727545.727047] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2e0c4d5992
[727545.733026] RDX: 0000000000002000 RSI: 00007f2e0710c000 RDI: 0000000000000009
[727545.739078] RBP: 00007f2e0c518b28 R08: 0000000000000000 R09: 0000000000000000
[727545.745368] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2e070b1620
[727545.754626] R13: 0000000000002000 R14: 00007f2e0710c000 R15: 00000000443cfe78
[727545.760784] </TASK>
[727545.762882] Modules linked in: xt_nat bpfilter ceph fscache netfs nbd rbd libceph cls_bpf sch_ingress ip_set xt_CT xt_mark algif_hash af_alg veth xfrm_user xfrm_algo xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_filter ip6table_raw ip6table_mangle ip6_tables iptable_filter iptable_raw iptable_mangle iptable_nat xt_MASQUERADE nft_chain_nat nf_nat xt_addrtype xt_comment br_netfilter bridge stp llc mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_conntrack_netlink xt_tcpudp nft_counter xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink nvme_fabrics overlay mlx5_ib ib_uverbs ib_core mlx5_core mlxfw psample tls udf crc_itu_t sunrpc binfmt_misc nls_iso8859_1 joydev mac_hid hid_generic serio_raw crct10dif_pclmul hyperv_drm crc32_pclmul ghash_clmulni_intel drm_kms_helper cec sha256_ssse3 sha1_ssse3 rc_core aesni_intel fb_sys_fops crypto_simd syscopyarea cryptd sysfillrect sysimgblt
[727545.762976] hid_hyperv hid hv_netvsc hyperv_fb hyperv_keyboard dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore i2c_core ip_tables x_tables autofs4
[727545.895557] CR2: 0000000000000402
[727545.899282] ---[ end trace 0a4a4f3724a4a3e1 ]---
[727546.037476] RIP: 0010:netfs_rreq_unlock+0xf7/0x3c0 [netfs]
[727546.042395] Code: 7d a0 4c 89 fe 45 31 f6 e8 c6 cd 74 fa 4c 8b 45 90 48 85 c0 48 89 c7 0f 84 33 01 00 00 4d 8d 48 50 4c 89 fa 45 31 e4 4d 89 cf <48> 8b 0f 48 8b 47 20 48 2b 45 98 48 c1 e9 10 c1 e0 0c 83 e1 01 80
[727546.058669] RSP: 0018:ffff9e498ef07978 EFLAGS: 00010246
[727546.065230] RAX: 0000000000000402 RBX: ffff91342192e9c0 RCX: 0000000000000000
[727546.075005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000402
[727546.082181] RBP: ffff9e498ef07a10 R08: ffff913351b58d80 R09: ffff913351b58dd0
[727546.089031] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[727546.095385] R13: 0000000000000000 R14: 0000000000000000 R15: ffff913351b58dd0
[727546.101436] FS: 00007f2e0c518b28(0000) GS:ffff913a9fd80000(0000) knlGS:0000000000000000
[727546.108145] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[727546.116117] CR2: 0000000000000402 CR3: 000000012a63e004 CR4: 0000000000370ee0
[727546.122130] Kernel panic - not syncing: Fatal exception
[727546.148639] Kernel Offset: 0x39e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Although CVE-2023-52582 was a bug in netfs, it's not related to this issue.
Added the head of the patch discussion on the kernel mailing list to Additional context: "netfs: Fix missing xas_retry() calls in xarray iteration".
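The patch's approach is to test each entry with `xas_retry()` at the top of the RCU-protected loop and `continue` when it returns true, so the walk is redone instead of the marker being used as a page. The userspace C sketch below models that pattern. `fake_page`, `fake_cache`, `fake_load` and `unlock_pages` are hypothetical stand-ins rather than kernel APIs, the "concurrent store" is simulated by a slot that settles after its first read, and this is not the actual backported patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative copy of the kernel encoding: xa_mk_internal(256) == 0x402. */
#define XA_RETRY_ENTRY ((void *)((256UL << 2) | 2))
#define FAKE_SLOTS 8

/* Hypothetical stand-in for struct page: only what the loop touches. */
struct fake_page {
	unsigned long index;
	bool locked;
};

/* Hypothetical stand-in for the page-cache xarray. A slot can transiently
 * hold XA_RETRY_ENTRY while a concurrent store is in progress; `settled`
 * is what that slot contains once the store completes. */
struct fake_cache {
	void *slots[FAKE_SLOTS];
	void *settled[FAKE_SLOTS];
	size_t nr;
};

/* Returns the slot's current contents. When a retry marker is observed,
 * the simulated concurrent store finishes, so a re-read sees the final value. */
static void *fake_load(struct fake_cache *c, size_t i)
{
	void *entry = c->slots[i];

	if (entry == XA_RETRY_ENTRY)
		c->slots[i] = c->settled[i];
	return entry;
}

/* Same shape as the fix: check each entry for the retry marker before using
 * it as a page, and redo the lookup at the same index when one is seen
 * (in the kernel: `if (xas_retry(&xas, page)) continue;`).
 * Returns the number of pages unlocked; *retries counts redone lookups. */
static size_t unlock_pages(struct fake_cache *c, size_t *retries)
{
	size_t unlocked = 0, i = 0;

	assert(c->nr <= FAKE_SLOTS);
	*retries = 0;
	while (i < c->nr) {
		void *entry = fake_load(c, i);

		if (entry == XA_RETRY_ENTRY) {
			(*retries)++;
			continue;	/* do not advance i: reload this index */
		}
		struct fake_page *page = entry;	/* now a real page pointer */
		page->locked = false;		/* stand-in for the real unlock */
		unlocked++;
		i++;
	}
	return unlocked;
}
```

Dropping the retry check would make this loop cast 0x402 to `struct fake_page *` and write through it, the userspace analogue of the oops above.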
Action required from @aritraghosh, @julia-yin, @AllenWen-at-Azure
Issue is currently being tracked with Azure Containers and Azure Linux teams, TrackingID#2501170010002053
Issue needing attention of @Azure/aks-leads
Canonical merged the xarray retry fix, which was released in the Ubuntu linux-azure kernel 5.15.0-1083. It is available for AKS with node image version 202503.21.0 or later.
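One way to check whether a node already runs a fixed kernel is to compare its kernel release against the 5.15.0-1083 threshold stated above. The helper below is a hypothetical sketch that assumes the Ubuntu `5.15.0-<abi>-azure` naming seen in this issue. It deliberately reports "unknown" for other kernel series (such as the 6.8 kernel mentioned earlier) instead of guessing:

```c
#include <stdio.h>
#include <string.h>

enum fix_status { FIX_UNKNOWN, FIX_MISSING, FIX_PRESENT };

/* Hypothetical helper: classifies a kernel release string such as
 * "5.15.0-1075-azure" using the threshold from this issue (ABI 1083 and
 * later carry the xarray retry fix). Only the 5.15.0 Azure series is
 * judged; anything else is reported as FIX_UNKNOWN. */
static enum fix_status netfs_retry_fix_status(const char *release)
{
	unsigned int major, minor, patch, abi;
	char flavour[32];

	/* Expected layout: <major>.<minor>.<patch>-<abi>-<flavour> */
	if (sscanf(release, "%u.%u.%u-%u-%31s",
		   &major, &minor, &patch, &abi, flavour) != 5)
		return FIX_UNKNOWN;
	if (major != 5 || minor != 15 || patch != 0 ||
	    strcmp(flavour, "azure") != 0)
		return FIX_UNKNOWN;
	return abi >= 1083 ? FIX_PRESENT : FIX_MISSING;
}
```

On a node, the release string can come from POSIX `uname()` (the `release` field of `struct utsname` in `<sys/utsname.h>`), or it can be read cluster-wide from the KERNEL-VERSION column of `kubectl get nodes -o wide`.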