hooked_syscall event does not work correctly in v6.5 kernels (6.5.0-9-generic)
Description
I'm currently commenting v6.5 tests out for hooked_syscall because it looks like they're buggy. Make sure to enable re-enable the test in the file: tests/e2e-inst-test.sh once there is a fix for this issue (same PR).
Expected output:
{"timestamp":1698505727456484223,"threadStartTime":1698505724660132812,"processorId":0,"processId":2056752,"cgroupId":4846,"threadId":2056758,"parentProcessId":2056751,"hostProcessId":2056752,"hostThreadId":2056758,"hostParentProcessId":2056751,"userId":0,"mountNamespace":4026531841,"pidNamespace":4026531836,"processName":"tracee","executable":{"path":""},"hostName":"rugged","containerId":"","container":{},"kubernetes":{},"eventId":"2017","eventName":"hooked_syscall","matchedPolicies":[""],"argsNum":4,"returnValue":0,"syscall":"","stackAddresses":[0],"contextFlags":{"containerStarted":false,"isCompat":false},"threadEntityId":468960211,"processEntityId":2773370071,"parentEntityId":3105609655,"args":[{"name":"syscall","type":"const char*","value":"uname"},{"name":"address","type":"const char*","value":"ffffffffc1758000"},{"name":"function","type":"const char*","value":"hooked_uname"},{"name":"owner","type":"const char*","value":"hijack"}]}
Current output:
{"timestamp":1698505538881857437,"threadStartTime":1698505531788318198,"processorId":3,"processId":10538,"cgroupId":6109,"threadId":10545,"parentProcessId":10537,"hostProcessId":10538,"hostThreadId":10545,"hostParentProcessId":10537,"userId":0,"mountNamespace":4026531841,"pidNamespace":4026531836,"processName":"tracee","executable":{"path":""},"hostName":"ip-172-31-22-41","containerId":"","container":{},"kubernetes":{},"eventId":"2017","eventName":"hooked_syscall","matchedPolicies":[""],"argsNum":4,"returnValue":0,"syscall":"","stackAddresses":[0],"contextFlags":{"containerStarted":false,"isCompat":false},"threadEntityId":2064190888,"processEntityId":1319396407,"parentEntityId":152744913,"args":[{"name":"syscall","type":"const char*","value":"uname"},{"name":"address","type":"const char*","value":"ffffffffc0821010"},{"name":"function","type":"const char*","value":"tls_register"},{"name":"owner","type":"const char*","value":"tls"}]}
You can see that "function" and "owner" are wrong. And the values aren't constant:
{"timestamp":1698506271499896575,"threadStartTime":1698506267506919530,"processorId":1,"processId":973,"cgroupId":5717,"threadId":985,"parentProcessId":972,"hostProcessId":973,"hostThreadId":985,"hostParentProcessId":972,"userId":0,"mountNamespace":4026531841,"pidNamespace":4026531836,"processName":"tracee","executable":{"path":""},"hostName":"ip-172-31-22-41","containerId":"","container":{},"kubernetes":{},"eventId":"2017","eventName":"hooked_syscall","matchedPolicies":[""],"argsNum":4,"returnValue":0,"syscall":"","stackAddresses":[0],"contextFlags":{"containerStarted":false,"isCompat":false},"threadEntityId":1335533706,"processEntityId":4211216272,"parentEntityId":4154948111,"args":[{"name":"syscall","type":"const char*","value":"uname"},{"name":"address","type":"const char*","value":"ffffffffc09bb010"},{"name":"function","type":"const char*","value":"nf_conntrack_standalone_init"},{"name":"owner","type":"const char*","value":"nf_conntrack"}]}
It becomes "nf_conntrack" in another run.
Output of tracee version:
close to v0.19 release
Output of uname -a:
Linux ip-172-31-22-41 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 7 01:35:40 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Additional details
This kernel version is currently giving me:
[ 7.757620] ================================================================================
[ 7.760841] UBSAN: array-index-out-of-bounds in /build/linux-D15vQj/linux-6.5.0/drivers/net/xen-netfront.c:1291:3
[ 7.765056] index 1 is out of range for type 'xen_netif_rx_sring_entry [1]'
[ 7.767694] CPU: 2 PID: 328 Comm: ip Not tainted 6.5.0-9-generic #9-Ubuntu
[ 7.767698] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
[ 7.767700] Call Trace:
[ 7.767702] <IRQ>
[ 7.767706] dump_stack_lvl+0x48/0x70
[ 7.767714] dump_stack+0x10/0x20
[ 7.767718] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 7.767723] xennet_poll+0x930/0x960
[ 7.767729] __napi_poll+0x33/0x1f0
[ 7.767735] net_rx_action+0x181/0x2e0
[ 7.767738] ? evtchn_2l_handle_events+0x160/0x420
[ 7.767744] __do_softirq+0xd9/0x346
[ 7.767750] do_softirq.part.0+0x41/0x80
[ 7.767755] </IRQ>
[ 7.767757] <TASK>
[ 7.767758] __local_bh_enable_ip+0x72/0x80
[ 7.767762] _raw_spin_unlock_bh+0x1d/0x30
[ 7.767765] xennet_open+0x75/0x160
[ 7.767767] __dev_open+0x105/0x1d0
[ 7.767772] __dev_change_flags+0x1b5/0x230
[ 7.767776] dev_change_flags+0x27/0x80
[ 7.767779] do_setlink+0x3c8/0xe60
[ 7.767785] ? __nla_validate_parse+0x49/0x1e0
[ 7.767791] __rtnl_newlink+0x6db/0x760
[ 7.767796] rtnl_newlink+0x48/0x80
[ 7.767799] rtnetlink_rcv_msg+0x170/0x430
[ 7.767803] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 7.767806] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 7.767809] netlink_rcv_skb+0x5d/0x110
[ 7.767815] rtnetlink_rcv+0x15/0x30
[ 7.767818] netlink_unicast+0x247/0x390
[ 7.767821] netlink_sendmsg+0x25e/0x4e0
[ 7.767825] sock_sendmsg+0xcc/0xd0
[ 7.767830] ____sys_sendmsg+0x2a7/0x370
[ 7.767834] ___sys_sendmsg+0x9a/0xf0
[ 7.767838] __sys_sendmsg+0x89/0xf0
[ 7.767841] __x64_sys_sendmsg+0x1d/0x30
[ 7.767844] do_syscall_64+0x5c/0x90
[ 7.767848] ? exit_to_user_mode_prepare+0x30/0xb0
[ 7.767854] ? irqentry_exit_to_user_mode+0x17/0x20
[ 7.767859] ? irqentry_exit+0x43/0x50
[ 7.767862] ? exc_page_fault+0x94/0x1b0
[ 7.767865] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 7.767869] RIP: 0033:0x7f7326d2a814
[ 7.767877] Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 35 bd 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
[ 7.767880] RSP: 002b:00007ffc8a2071e8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
[ 7.767883] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7326d2a814
[ 7.767885] RDX: 0000000000000000 RSI: 00007ffc8a207250 RDI: 0000000000000003
[ 7.767887] RBP: 0000000000000003 R08: 00000000653d1c88 R09: 0000000000000050
[ 7.767888] R10: 0000000000000000 R11: 0000000000000202 R12: 00000000653d1c89
[ 7.767890] R13: 000055e82755f040 R14: 0000000000000001 R15: 0000000000000000
[ 7.767894] </TASK>
[ 7.767895] ================================================================================
I tried updating it but kept receiving the same UBSAN message. Either way, its related to an out-of-index address for a netlink buffer used to communicate with Xen (in a quick look from the stack trace). I don't think it would matter for the event logic.
About adding tests in open-source: https://github.com/aquasecurity/tracee/issues/3602#issuecomment-1783849922 for all hook events.
@OriGlassman are you working on this?
@OriGlassman are you working on this?
This is not a bug in the event per se. It's caused by different symbols that sits in the same address in kallsyms. From my short investigation, I saw that the conflicting symbol was the init function of some kernel module, which should have been freed long time ago. My assumption is that it's cleaned memory-wise, but not kallsyms-wise, then the memory is reused and there're 2 different symbols in kallsyms sitting in the same address. So this sounds like something that should be addressed in the libbpfgo kallsyms API.
My assumption is that it's cleaned memory-wise, but not kallsyms-wise, then the memory is reused and there're 2 different symbols in kallsyms sitting in the same address.
You might be describing something I described at:
https://github.com/aquasecurity/tracee/issues/3653
Specifically at https://github.com/aquasecurity/tracee/issues/3653#issuecomment-1789664802
@OriGlassman we recently bumped libbpfgo version, do you know what is the status of this?
hooked_syscall event now works on kernels > 6.5 and issue seems to be resolved.