libs
libs copied to clipboard
update(modern_bpf): reduce the `execve` instrumentation time with new APIs
What type of PR is this?
/kind cleanup
/kind feature
Any specific area of the project related to this PR?
/area driver-modern-bpf
Does this PR require a change in the driver versions?
No
What this PR does / why we need it:
Playing with https://github.com/Andreagit97/BPF-perftool we noticed that the execveat instrumentation time was strangely high compared to the old probe one:
old probe new probe
12140 ns 16347 ns
This overhead seems related to the use of extract__charbuf_pointer_from_array inline function, for some reason clang increases the complexity of the programs causing overhead in the prog execution. With the new APIs auxmap__store_execve_exe and auxmap__store_execve_args I've recorded these new data:
old probe new probe
12140 ns 10518 ns
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
NONE
LGTM label has been added.
@Andreagit97 , very nice tooling for measurements. I will have a look at it. I left a comment on #633 regarding the args/env max discussion. So far, you change here looks nice.
Thanks to @incertum fixes I've added also 2 tests for execve and execveat success cases to avoid future issues like the one reported here https://github.com/falcosecurity/libs/pull/633
@hbrueckner could you try the last commit? I've seen strange behaviors... on kernel 5.4 clone child exit events are not generated while on recent kernel they seem to be generated :thinking: you could see it with bpftrace
sudo bpftrace -e 'tracepoint:syscalls:sys_exit_clone { printf("clone: %ld\n", args->ret); }'
if there are prints like clone: 0 it means that the sys_exit_clone tracepoint is working correctly on your machine and so the tests should pass on your machine
Hi @Andreagit97
wow.... I briefly looked into it but need a more closer look later on:
@hbrueckner could you try the last commit? I've seen strange behaviors... on kernel 5.4 clone child exit events are not generated while on recent kernel they seem to be generated thinking you could see it with
bpftracesudo bpftrace -e 'tracepoint:syscalls:sys_exit_clone { printf("clone: %ld\n", args->ret); }'
if there are prints like
clone: 0it means that thesys_exit_clonetracepoint is working correctly on your machine and so the tests should pass on your machine
This actually works on my machine and also tried on a RHEL 8.6 where it works as well.
The strange thing is that with you changes, I receive test case failures for fork, clone, and clone3:
[ RUN ] SyscallExit.clone3X_child
/root/git/falcosecurity/libs/test/modern_bpf/event_class/event_class.cpp:746: Failure
Expected equality of these values:
size
Which is: 0
expected_size
Which is: 27
>>>>> length of the param is not correct. Param id = 2
/root/git/falcosecurity/libs/test/modern_bpf/event_class/event_class.cpp:485: Failure
Expected equality of these values:
m_event_params[m_current_param].valptr
Which is: ""
param
Which is: "./test/modern_bpf/bpf_test"
>>>>> value of the param is not correct. Param id = 2
/root/git/falcosecurity/libs/test/modern_bpf/event_class/event_class.cpp:449: Failure
Expected equality of these values:
*(T*)(m_event_params[m_current_param].valptr)
Which is: 0
param
Which is: 1
>>>>> value of the param is not correct. Param id = 16
/root/git/falcosecurity/libs/test/modern_bpf/test_suites/syscall_exit_suite/clone3_x.cpp:277: Failure
Failed
Something in the child failed.
The *(T*)(m_event_params[m_current_param].valptr) test is specific to clone3X. But if I understand it correctly, the child event is catched but the exe is not correctly set... hmm... which should be the case when the returned pid is negative? :thinking:
@hbrueckner yeah there is something strange the child event is here but the arguments no :thinking:
------------------ EVENT: 335 TID:39942 <--- father
------ HEADER
timestamp: 1666610575449799973
tid: 39942
len: 323
type: 335
num params: 20
------
------ PARAMS
PARAM 0: 39943
PARAM 1: ./test/modern_bpf/bpf_test
PARAM 2: --gtest_filter=SyscallExit.clone3X_child*
PARAM 3: 39942
PARAM 4: 39942
PARAM 5: 39941
PARAM 6: is empty
PARAM 7: 1024
PARAM 8: 12
PARAM 9: 2046
PARAM 10: 11504
PARAM 11: 7868
PARAM 12: 0
PARAM 13: bpf_test
PARAM 14: cpuset=/cpu=/cpuacct=/io=/memory=/user.slice/user-1000.slice/session-156.scope
PARAM 15: 1
PARAM 16: 0
PARAM 17: 0
PARAM 18: 39942
PARAM 19: 39942
------
------------------
------------------ EVENT: 335 TID:39943 <--- child
------ HEADER
timestamp: 1666610575449801802
tid: 39943
len: 254
type: 335
num params: 20
------
------ PARAMS
PARAM 0: 0
PARAM 1: is empty
PARAM 2: is empty
PARAM 3: 39943
PARAM 4: 39943
PARAM 5: 39942
PARAM 6: is empty
PARAM 7: 1024
PARAM 8: 0
PARAM 9: 0
PARAM 10: 11504
PARAM 11: 3760
PARAM 12: 0
PARAM 13: bpf_test
PARAM 14: cpuset=/cpu=/cpuacct=/io=/memory=/user.slice/user-1000.slice/session-156.scope
PARAM 15: 0
PARAM 16: 0
PARAM 17: 0
PARAM 18: 39943
PARAM 19: 39943
------
------------------
it's seems unable to read from syscall arguments in the child event :thinking:
@Andreagit97 ,
it's seems unable to read from syscall arguments in the child event thinking
yes, looks the bpf_probe_read_user to the arg_start_pointer fails:
bpf_test-82749 [000] d.... 1539123.252912: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68
bpf_test-82749 [000] d.... 1539123.252915: bpf_trace_printk: exe_len=27
bpf_test-82750 [001] d.... 1539123.252917: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68
bpf_test-82750 [001] d.... 1539123.252918: bpf_trace_printk: exe_len=0
Hi @Andreagit97
bpf_test-82749 [000] d.... 1539123.252912: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68 bpf_test-82749 [000] d.... 1539123.252915: bpf_trace_printk: exe_len=27 bpf_test-82750 [001] d.... 1539123.252917: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68 bpf_test-82750 [001] d.... 1539123.252918: bpf_trace_printk: exe_len=0
We get an EFAULT:
bpf_test-85498 [000] d.... 1542076.393006: bpf_trace_printk: start: 3ffcd9fa572 end: 3ffcd9fa5b6 len=68
bpf_test-85499 [001] d.... 1542076.393009: bpf_trace_printk: start: 3ffcd9fa572 end: 3ffcd9fa5b6 len=68
bpf_test-85498 [000] d.... 1542076.393009: bpf_trace_printk: exe_len=27
bpf_test-85499 [001] d.... 1542076.393010: bpf_trace_printk: exe_len=0
bpf_test-85499 [001] d.... 1542076.393011: bpf_trace_printk: mem=0 probe_read failed: -14
Hi @Andreagit97
bpf_test-82749 [000] d.... 1539123.252912: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68 bpf_test-82749 [000] d.... 1539123.252915: bpf_trace_printk: exe_len=27 bpf_test-82750 [001] d.... 1539123.252917: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68 bpf_test-82750 [001] d.... 1539123.252918: bpf_trace_printk: exe_len=0We get an
EFAULT:bpf_test-85498 [000] d.... 1542076.393006: bpf_trace_printk: start: 3ffcd9fa572 end: 3ffcd9fa5b6 len=68 bpf_test-85499 [001] d.... 1542076.393009: bpf_trace_printk: start: 3ffcd9fa572 end: 3ffcd9fa5b6 len=68 bpf_test-85498 [000] d.... 1542076.393009: bpf_trace_printk: exe_len=27 bpf_test-85499 [001] d.... 1542076.393010: bpf_trace_printk: exe_len=0 bpf_test-85499 [001] d.... 1542076.393011: bpf_trace_printk: mem=0 probe_read failed: -14
It sounds like these arguments are no more in RAM when the child starts this is really strange since on x86 it works smoothly:thinking:
Hi @Andreagit97
these arguments are no more in RAM when the child starts this is really strange since on x86 it works smoothly
The arg_start and arg_end addresses are the same for parent and child. So my assumption is that the child process does not yet have its own copy of those (user space) pages. Yet to be confirmed.
hey @hbrueckner the problem is even worst, the kmod is not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same :thinking:
Hi @Andreagit97
hey @hbrueckner the problem is even worst, the
kmodis not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same thinking
This indeed looks very strange. Also had a look add kernel driver and kernel sources, and agree they basically use the same tracepoint_probe_register() functions. Regarding the modern bpf and related EFAULT, one option would be to try a sleepable BPF program for clone family and use bpf_copy_from_user() which might sleep (not sure if that will work or perform at all). However, I did not catch the right place to add the BPF_F_SLEEPABLE program attribute.
Hi @Andreagit97,
thanks for the update and, yeah, also agree on sticking with sched_exec_fork for now until we have more clarity on the strange behavior. Perhaps, worth to open a separate issue to track it? wdyt?
Hi @Andreagit97
hey @hbrueckner the problem is even worst, the
kmodis not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same thinkingThis indeed looks very strange. Also had a look add kernel driver and kernel sources, and agree they basically use the same
tracepoint_probe_register()functions. Regarding the modern bpf and relatedEFAULT, one option would be to try a sleepable BPF program forclonefamily and usebpf_copy_from_user()which might sleep (not sure if that will work or perform at all). However, I did not catch the right place to add theBPF_F_SLEEPABLEprogram attribute.
I've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner
Sorry for the PR dimension this is quite huge but there are a lot of tests inside... I still need to address this comment https://github.com/falcosecurity/libs/pull/648#discussion_r1003147115
Hi @Andreagit97,
thanks for the update and, yeah, also agree on sticking with sched_exec_fork for now until we have more clarity on the strange behavior. Perhaps, worth to open a separate issue to track it? wdyt?
Agree on opening the issue, the best solution would be to find another way to take the args from the kernel but i would address it in another PR probably as you suggested :)
've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner
Fully agree with the solution; simple and effective!
the good news is that this PR should be the last one to have a full working implementation of the modern bpf probe :partying_face:
've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner
Fully agree with the solution; simple and effective!
Agreed too! Thanks a lot @Andreagit97 !
Hi @Andreagit97
Sorry for the PR dimension this is quite huge but there are a lot of tests inside... I still need to address this comment #648 (comment)
No problem ... I will need some time for a deeper review on the recent changes ... and, awesome that we now have a full working modern bpf probe!
Tested on all three architectures (x86, aarch64, s390x) and it should work!
Hi @Andreagit97 ,
Tested on all three architectures (x86, aarch64, s390x) and it should work!
Confirmed for s390x:
[----------] Global test environment tear-down
[==========] 205 tests from 3 test suites ran. (131 ms total)
[ PASSED ] 205 tests.
Further, I went through your commits and they are fine!
@Andreagit97 Incredible and awesome work as usual!
/lgtm
LGTM label has been added.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Andreagit97, FedeDP, hbrueckner, leogr
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [Andreagit97,FedeDP,leogr]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
This PR completes https://github.com/falcosecurity/libs/issues/513
/milestone 4.0.0+driver