libs icon indicating copy to clipboard operation
libs copied to clipboard

update(modern_bpf): reduce the `execve` instrumentation time with new APIs

Open Andreagit97 opened this issue 3 years ago • 5 comments

What type of PR is this?

/kind cleanup

/kind feature

Any specific area of the project related to this PR?

/area driver-modern-bpf

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

Playing with https://github.com/Andreagit97/BPF-perftool we noticed that the execveat instrumentation time was strangely high compared to the old probe one:

old probe     new probe

12140 ns      16347 ns

This overhead seems related to the use of extract__charbuf_pointer_from_array inline function, for some reason clang increases the complexity of the programs causing overhead in the prog execution. With the new APIs auxmap__store_execve_exe and auxmap__store_execve_args I've recorded these new data:

old probe     new probe

12140 ns      10518 ns

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Andreagit97 avatar Oct 02 '22 17:10 Andreagit97

LGTM label has been added.

Git tree hash: 6210ba75c3a3c29c0dbbf33c60a4e419f49a50ee

poiana avatar Oct 03 '22 06:10 poiana

@Andreagit97 , very nice tooling for measurements. I will have a look at it. I left a comment on #633 regarding the args/env max discussion. So far, you change here looks nice.

hbrueckner avatar Oct 04 '22 15:10 hbrueckner

Thanks to @incertum fixes I've added also 2 tests for execve and execveat success cases to avoid future issues like the one reported here https://github.com/falcosecurity/libs/pull/633

Andreagit97 avatar Oct 15 '22 14:10 Andreagit97

@hbrueckner could you try the last commit? I've seen strange behaviors... on kernel 5.4 clone child exit events are not generated while on recent kernel they seem to be generated :thinking: you could see it with bpftrace

sudo bpftrace -e 'tracepoint:syscalls:sys_exit_clone { printf("clone: %ld\n",  args->ret); }'

if there are prints like clone: 0 it means that the sys_exit_clone tracepoint is working correctly on your machine and so the tests should pass on your machine

Andreagit97 avatar Oct 24 '22 08:10 Andreagit97

Hi @Andreagit97

wow.... I briefly looked into it but need a more closer look later on:

@hbrueckner could you try the last commit? I've seen strange behaviors... on kernel 5.4 clone child exit events are not generated while on recent kernel they seem to be generated thinking you could see it with bpftrace

sudo bpftrace -e 'tracepoint:syscalls:sys_exit_clone { printf("clone: %ld\n",  args->ret); }'

if there are prints like clone: 0 it means that the sys_exit_clone tracepoint is working correctly on your machine and so the tests should pass on your machine

This actually works on my machine and also tried on a RHEL 8.6 where it works as well. The strange thing is that with you changes, I receive test case failures for fork, clone, and clone3:

[ RUN      ] SyscallExit.clone3X_child
/root/git/falcosecurity/libs/test/modern_bpf/event_class/event_class.cpp:746: Failure
Expected equality of these values:
  size
    Which is: 0
  expected_size
    Which is: 27
>>>>> length of the param is not correct. Param id = 2

/root/git/falcosecurity/libs/test/modern_bpf/event_class/event_class.cpp:485: Failure
Expected equality of these values:
  m_event_params[m_current_param].valptr
    Which is: ""
  param
    Which is: "./test/modern_bpf/bpf_test"
>>>>> value of the param is not correct. Param id = 2

/root/git/falcosecurity/libs/test/modern_bpf/event_class/event_class.cpp:449: Failure
Expected equality of these values:
  *(T*)(m_event_params[m_current_param].valptr)
    Which is: 0
  param
    Which is: 1
>>>>> value of the param is not correct. Param id = 16

/root/git/falcosecurity/libs/test/modern_bpf/test_suites/syscall_exit_suite/clone3_x.cpp:277: Failure
Failed
Something in the child failed.

The *(T*)(m_event_params[m_current_param].valptr) test is specific to clone3X. But if I understand it correctly, the child event is catched but the exe is not correctly set... hmm... which should be the case when the returned pid is negative? :thinking:

hbrueckner avatar Oct 24 '22 10:10 hbrueckner

@hbrueckner yeah there is something strange the child event is here but the arguments no :thinking:

------------------ EVENT: 335 TID:39942  <--- father
------ HEADER
timestamp: 1666610575449799973
tid: 39942
len: 323
type: 335
num params: 20
------
------ PARAMS
PARAM 0: 39943
PARAM 1: ./test/modern_bpf/bpf_test
PARAM 2: --gtest_filter=SyscallExit.clone3X_child*
PARAM 3: 39942
PARAM 4: 39942
PARAM 5: 39941
PARAM 6: is empty
PARAM 7: 1024
PARAM 8: 12
PARAM 9: 2046
PARAM 10: 11504
PARAM 11: 7868
PARAM 12: 0
PARAM 13: bpf_test
PARAM 14: cpuset=/cpu=/cpuacct=/io=/memory=/user.slice/user-1000.slice/session-156.scope
PARAM 15: 1
PARAM 16: 0
PARAM 17: 0
PARAM 18: 39942
PARAM 19: 39942
------
------------------

------------------ EVENT: 335 TID:39943 <--- child
------ HEADER
timestamp: 1666610575449801802
tid: 39943
len: 254
type: 335
num params: 20
------
------ PARAMS
PARAM 0: 0
PARAM 1: is empty
PARAM 2: is empty
PARAM 3: 39943
PARAM 4: 39943
PARAM 5: 39942
PARAM 6: is empty
PARAM 7: 1024
PARAM 8: 0
PARAM 9: 0
PARAM 10: 11504
PARAM 11: 3760
PARAM 12: 0
PARAM 13: bpf_test
PARAM 14: cpuset=/cpu=/cpuacct=/io=/memory=/user.slice/user-1000.slice/session-156.scope
PARAM 15: 0
PARAM 16: 0
PARAM 17: 0
PARAM 18: 39943
PARAM 19: 39943
------
------------------

it's seems unable to read from syscall arguments in the child event :thinking:

Andreagit97 avatar Oct 24 '22 11:10 Andreagit97

@Andreagit97 ,

it's seems unable to read from syscall arguments in the child event thinking

yes, looks the bpf_probe_read_user to the arg_start_pointer fails:

        bpf_test-82749   [000] d.... 1539123.252912: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68
        bpf_test-82749   [000] d.... 1539123.252915: bpf_trace_printk: exe_len=27
        bpf_test-82750   [001] d.... 1539123.252917: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68
        bpf_test-82750   [001] d.... 1539123.252918: bpf_trace_printk: exe_len=0

hbrueckner avatar Oct 24 '22 12:10 hbrueckner

Hi @Andreagit97

        bpf_test-82749   [000] d.... 1539123.252912: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68
        bpf_test-82749   [000] d.... 1539123.252915: bpf_trace_printk: exe_len=27
        bpf_test-82750   [001] d.... 1539123.252917: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68
        bpf_test-82750   [001] d.... 1539123.252918: bpf_trace_printk: exe_len=0

We get an EFAULT:

        bpf_test-85498   [000] d.... 1542076.393006: bpf_trace_printk: start: 3ffcd9fa572 end: 3ffcd9fa5b6 len=68
        bpf_test-85499   [001] d.... 1542076.393009: bpf_trace_printk: start: 3ffcd9fa572 end: 3ffcd9fa5b6 len=68
        bpf_test-85498   [000] d.... 1542076.393009: bpf_trace_printk: exe_len=27
        bpf_test-85499   [001] d.... 1542076.393010: bpf_trace_printk: exe_len=0
        bpf_test-85499   [001] d.... 1542076.393011: bpf_trace_printk: mem=0 probe_read failed: -14

hbrueckner avatar Oct 24 '22 13:10 hbrueckner

Hi @Andreagit97

        bpf_test-82749   [000] d.... 1539123.252912: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68
        bpf_test-82749   [000] d.... 1539123.252915: bpf_trace_printk: exe_len=27
        bpf_test-82750   [001] d.... 1539123.252917: bpf_trace_printk: start: 3fffd9fa572 end: 3fffd9fa5b6 len=68
        bpf_test-82750   [001] d.... 1539123.252918: bpf_trace_printk: exe_len=0

We get an EFAULT:

        bpf_test-85498   [000] d.... 1542076.393006: bpf_trace_printk: start: 3ffcd9fa572 end: 3ffcd9fa5b6 len=68
        bpf_test-85499   [001] d.... 1542076.393009: bpf_trace_printk: start: 3ffcd9fa572 end: 3ffcd9fa5b6 len=68
        bpf_test-85498   [000] d.... 1542076.393009: bpf_trace_printk: exe_len=27
        bpf_test-85499   [001] d.... 1542076.393010: bpf_trace_printk: exe_len=0
        bpf_test-85499   [001] d.... 1542076.393011: bpf_trace_printk: mem=0 probe_read failed: -14

It sounds like these arguments are no more in RAM when the child starts this is really strange since on x86 it works smoothly:thinking:

Andreagit97 avatar Oct 24 '22 13:10 Andreagit97

Hi @Andreagit97

these arguments are no more in RAM when the child starts this is really strange since on x86 it works smoothly

The arg_start and arg_end addresses are the same for parent and child. So my assumption is that the child process does not yet have its own copy of those (user space) pages. Yet to be confirmed.

hbrueckner avatar Oct 24 '22 13:10 hbrueckner

hey @hbrueckner the problem is even worst, the kmod is not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same :thinking:

Andreagit97 avatar Oct 24 '22 15:10 Andreagit97

Hi @Andreagit97

hey @hbrueckner the problem is even worst, the kmod is not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same thinking

This indeed looks very strange. Also had a look add kernel driver and kernel sources, and agree they basically use the same tracepoint_probe_register() functions. Regarding the modern bpf and related EFAULT, one option would be to try a sleepable BPF program for clone family and use bpf_copy_from_user() which might sleep (not sure if that will work or perform at all). However, I did not catch the right place to add the BPF_F_SLEEPABLE program attribute.

hbrueckner avatar Oct 25 '22 09:10 hbrueckner

Hi @Andreagit97,

thanks for the update and, yeah, also agree on sticking with sched_exec_fork for now until we have more clarity on the strange behavior. Perhaps, worth to open a separate issue to track it? wdyt?

hbrueckner avatar Oct 25 '22 09:10 hbrueckner

Hi @Andreagit97

hey @hbrueckner the problem is even worst, the kmod is not able to catch the clone child event while the modern bpf probe yes, but the tracepoint is exactly the same thinking

This indeed looks very strange. Also had a look add kernel driver and kernel sources, and agree they basically use the same tracepoint_probe_register() functions. Regarding the modern bpf and related EFAULT, one option would be to try a sleepable BPF program for clone family and use bpf_copy_from_user() which might sleep (not sure if that will work or perform at all). However, I did not catch the right place to add the BPF_F_SLEEPABLE program attribute.

I've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner

Sorry for the PR dimension this is quite huge but there are a lot of tests inside... I still need to address this comment https://github.com/falcosecurity/libs/pull/648#discussion_r1003147115

Hi @Andreagit97,

thanks for the update and, yeah, also agree on sticking with sched_exec_fork for now until we have more clarity on the strange behavior. Perhaps, worth to open a separate issue to track it? wdyt?

Agree on opening the issue, the best solution would be to find another way to take the args from the kernel but i would address it in another PR probably as you suggested :)

Andreagit97 avatar Oct 25 '22 09:10 Andreagit97

've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner

Fully agree with the solution; simple and effective!

FedeDP avatar Oct 25 '22 09:10 FedeDP

the good news is that this PR should be the last one to have a full working implementation of the modern bpf probe :partying_face:

Andreagit97 avatar Oct 25 '22 09:10 Andreagit97

've just pushed a possible solution. In ARM we already need to catch the clone/clone3 child events from another tracepoint (sched_process_fork) since there is an issue in the sys_exit_tracepoint, in the past, I've seen this issue also in old s390x machines for this reason we have this workaround also in our kmod that supports s390x. So I propose to use also for s390x this different tracepoint since we already need it for arm64. Now all the tests should pass and we should be able to collect all fork/clone/clone3/vfork events without EFAULT problems since the child event is generated by the parent process. Let me know if you have some doubts or other suggestions @FedeDP @hbrueckner

Fully agree with the solution; simple and effective!

Agreed too! Thanks a lot @Andreagit97 !

hbrueckner avatar Oct 25 '22 09:10 hbrueckner

Hi @Andreagit97

Sorry for the PR dimension this is quite huge but there are a lot of tests inside... I still need to address this comment #648 (comment)

No problem ... I will need some time for a deeper review on the recent changes ... and, awesome that we now have a full working modern bpf probe!

hbrueckner avatar Oct 25 '22 09:10 hbrueckner

Tested on all three architectures (x86, aarch64, s390x) and it should work!

Andreagit97 avatar Oct 25 '22 14:10 Andreagit97

Hi @Andreagit97 ,

Tested on all three architectures (x86, aarch64, s390x) and it should work!

Confirmed for s390x:

[----------] Global test environment tear-down
[==========] 205 tests from 3 test suites ran. (131 ms total)
[  PASSED  ] 205 tests.

Further, I went through your commits and they are fine!

@Andreagit97 Incredible and awesome work as usual!

/lgtm

hbrueckner avatar Oct 25 '22 15:10 hbrueckner

LGTM label has been added.

Git tree hash: 37fc8e72aaa5b99bbcef12263541fdaf6a073d6b

poiana avatar Oct 25 '22 15:10 poiana

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, hbrueckner, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [Andreagit97,FedeDP,leogr]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

poiana avatar Oct 25 '22 15:10 poiana

This PR completes https://github.com/falcosecurity/libs/issues/513

Andreagit97 avatar Oct 26 '22 15:10 Andreagit97

/milestone 4.0.0+driver

FedeDP avatar Dec 02 '22 13:12 FedeDP