sinsp-example on ppc64le

Open sumitd2 opened this issue 3 years ago • 14 comments

Dear Support,

sinsp-example reports two events for a curl command on x86_64, but only one on ppc64le. The libs version is master. It's causing test failures in a package we are trying to port to ppc64le. Are we missing something obvious?

x86_64:

[root@x006vm55 build]# BPF_PROBE=driver/bpf/probe.o ./libsinsp/examples/sinsp-example -f "evt.category=process and evt.type=execve"
[2022-05-25T16:47:41.880673415+0000]:[HOST]:[CAT=PROCESS]:[PPID=126412]:[PID=202674]:[TYPE=execve]:[EXE=/usr/bin/bash]:[CMD=bash]
[2022-05-25T16:47:41.881385618+0000]:[HOST]:[CAT=PROCESS]:[PPID=126412]:[PID=202674]:[TYPE=execve]:[EXE=/usr/bin/curl]:[CMD=curl www.google.com]

ppc64le:

[root@acs-upstream-node-ghatwala build]# BPF_PROBE=driver/bpf/probe.o ./libsinsp/examples/sinsp-example -f "evt.category=process and evt.type=execve"
[2022-05-25T16:54:13.607406260+0000]:[HOST]:[CAT=PROCESS]:[PPID=1191499]:[PID=1296679]:[TYPE=execve]:[EXE=/usr/bin/bash]:[CMD=bash]

sumitd2 avatar May 25 '22 16:05 sumitd2

Hi! Thanks for opening this issue! Unfortunately, this is a well-known bug: on non-x86_64 architectures, the kernel does not call the execve exit tracepoint when the call succeeds, so we miss the exit event. The exit tracepoint is called, however, if the execve call fails. The same issue affects the clone exit event for the child, for the very same reason: it seems that on these architectures, when a new kernel task is created, it "forgets" that it is exiting from a syscall, and the tracepoint is not called.

Note that only x86_64 is officially supported as of now, but arm64 is on its way and we are looking for ways to fix this exact issue :) I will keep you updated! Thanks!

FedeDP avatar May 25 '22 19:05 FedeDP

Hi, thank you for the quick reply. We also noticed that none of these socket events are reported on ppc64le:

x86_64:

[2022-05-26T07:33:06.708142121+0000]:[HOST]:[CAT=NET]:[PPID=210861]:[PID=210984]:[TYPE=socket]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708172825+0000]:[HOST]:[CAT=NET]:[PPID=210861]:[PID=210984]:[TYPE=socket]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708187985+0000]:[HOST]:[CAT=NET]:[PPID=210861]:[PID=210984]:[TYPE=setsocktopt]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708192450+0000]:[HOST]:[CAT=NET]:[PPID=210861]:[PID=210984]:[TYPE=setsocktopt]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708209044+0000]:[HOST]:[CAT=NET]:[PPID=210861]:[PID=210984]:[TYPE=setsocktopt]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708212020+0000]:[HOST]:[CAT=NET]:[PPID=210861]:[PID=210984]:[TYPE=setsocktopt]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708214495+0000]:[HOST]:[CAT=NET]:[PPID=210861]:[PID=210984]:[TYPE=bind]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708358296+0000]:[HOST]:[CAT=NET]:[0x2e91110]:[PPID=210861]:[PID=210984]:[TYPE=bind]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708375997+0000]:[HOST]:[CAT=NET]:[0x2e91110]:[PPID=210861]:[PID=210984]:[TYPE=listen]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]
[2022-05-26T07:33:06.708385307+0000]:[HOST]:[CAT=NET]:[0x2e91110]:[PPID=210861]:[PID=210984]:[TYPE=listen]:[EXE=/usr/bin/iperf3]:[CMD=iperf3 -s]

sumitd2 avatar May 26 '22 07:05 sumitd2

Thank you @sumitd2, we will take a look at it. As @FedeDP said, only x86_64 is officially supported right now, but we will try to fix this kind of issue ASAP :)

Andreagit97 avatar May 26 '22 10:05 Andreagit97

Hi team,

  1. I debugged the missing socket events issue (as in my second comment above), and found the problem to be here: https://github.com/falcosecurity/libs/blob/39ae7d40496793cf3d3e7890c9bbdc202263836b/driver/syscall_table.c#L197 I changed this line to #if 1, and the socket events are now being reported correctly by sinsp-example. I'm not sure what the proper fix is, and I can't raise a PR yet, so I'm reporting it to you.

  2. The execve event: [2022-05-25T16:54:13.607406260+0000]:[HOST]:[CAT=PROCESS]:[PPID=1191499]:[PID=1296679]:[TYPE=execve]:[EXE=/usr/bin/bash]:[CMD=bash] is being reported correctly on ppc64le, even if it is a well known bug according to you guys. Can I ask you how?

Thanks.

sumitd2 avatar Jun 01 '22 15:06 sumitd2

Hi @sumitd2 !

The execve event: [2022-05-25T16:54:13.607406260+0000]:[HOST]:[CAT=PROCESS]:[PPID=1191499]:[PID=1296679]:[TYPE=execve]:[EXE=/usr/bin/bash]:[CMD=bash] is being reported correctly on ppc64le, even if it is a well known bug according to you guys. Can I ask you how?

Yep, but you are only getting notified about the enter event (the /usr/bin/bash one), not the exit event, i.e., the execve'd process, as in your x86_64 output:

[root@x006vm55 build]# BPF_PROBE=driver/bpf/probe.o ./libsinsp/examples/sinsp-example -f "evt.category=process and evt.type=execve"
[2022-05-25T16:47:41.880673415+0000]:[HOST]:[CAT=PROCESS]:[PPID=126412]:[PID=202674]:[TYPE=execve]:[EXE=/usr/bin/bash]:[CMD=bash]
[2022-05-25T16:47:41.881385618+0000]:[HOST]:[CAT=PROCESS]:[PPID=126412]:[PID=202674]:[TYPE=execve]:[EXE=/usr/bin/curl]:[CMD=curl www.google.com]
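To make the enter/exit distinction concrete, here is a minimal sketch of how one might spot enter events that never get a matching exit in a captured stream. The `demo_event` record and `demo_count_unpaired_enters` function are hypothetical names made up for this illustration; they are not libsinsp types or APIs.

```c
#include <stddef.h>

/* Hypothetical, simplified event record; not a libsinsp type. */
enum demo_dir { DEMO_ENTER, DEMO_EXIT };

struct demo_event {
	long tid;          /* thread that issued the syscall */
	int type;          /* hypothetical syscall type id */
	enum demo_dir dir; /* enter or exit side of the syscall */
};

/*
 * Count enter events that are not followed by an exit event of the
 * same type on the same thread.
 */
size_t demo_count_unpaired_enters(const struct demo_event *ev, size_t n)
{
	size_t unpaired = 0;

	for (size_t i = 0; i < n; i++) {
		int paired = 0;

		if (ev[i].dir != DEMO_ENTER)
			continue;
		for (size_t j = i + 1; j < n; j++) {
			if (ev[j].tid != ev[i].tid || ev[j].type != ev[i].type)
				continue;
			/* The next event with the same tid and type decides:
			 * an exit pairs up, another enter means the exit was lost. */
			paired = (ev[j].dir == DEMO_EXIT);
			break;
		}
		if (!paired)
			unpaired++;
	}
	return unpaired;
}
```

With the x86_64 trace above (enter followed by exit), the count would be 0; with the ppc64le trace (enter only), it would be 1.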

I debugged the missing socket events issue (as in my second comment above), and found the problem to be here:

I don't really know the origin of that #ifdef; we will look into it. Thank you for spotting this!
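For readers wondering how a preprocessor guard can make events disappear, here is a generic sketch. It is not the content of syscall_table.c, and it has not been checked against the linked line. It assumes, purely for illustration, that the target's headers define a multiplexed `socketcall` number in addition to direct socket syscalls. Every `DEMO_*` name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical syscall numbers, chosen only for this sketch. */
enum {
	DEMO_NR_SOCKETCALL = 0,
	DEMO_NR_SOCKET = 1,
	DEMO_NR_BIND = 2,
	DEMO_NR_LISTEN = 3,
	DEMO_NR_MAX = 4
};

/* Assumption: pretend the architecture headers define the multiplexed call. */
#define DEMO_HAVE_SOCKETCALL 1

/* Table of tracked syscalls; a NULL slot means "no event is generated". */
static const char *const demo_table[DEMO_NR_MAX] = {
#ifdef DEMO_HAVE_SOCKETCALL
	[DEMO_NR_SOCKETCALL] = "socketcall",
#endif
/* The direct entries are compiled in only when the multiplexed call is
 * absent, unless the guard is overridden at build time. */
#if !defined(DEMO_HAVE_SOCKETCALL) || defined(DEMO_FORCE_DIRECT_ENTRIES)
	[DEMO_NR_SOCKET] = "socket",
	[DEMO_NR_BIND] = "bind",
	[DEMO_NR_LISTEN] = "listen",
#endif
};

/* Return the tracked name for a syscall number, or NULL if it is untracked. */
const char *demo_lookup(int nr)
{
	if (nr < 0 || nr >= DEMO_NR_MAX)
		return NULL;
	return demo_table[nr];
}
```

Built as is, `demo_lookup(DEMO_NR_SOCKET)` returns NULL, so a program that issues the direct syscall would produce no event. Building with `-DDEMO_FORCE_DIRECT_ENTRIES` compiles the direct entries back in, which is roughly the effect of changing a guard to `#if 1`.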

FedeDP avatar Jun 01 '22 15:06 FedeDP

@FedeDP Got it. I was under the impression it's only reported once by sinsp-example, on the exit event. Thank you.

sumitd2 avatar Jun 01 '22 15:06 sumitd2

Hi @FedeDP @Andreagit97, @Molter73,

libs does not build on ppc64le anymore, and this has broken our application.

This commit (quirks.h) caused it: https://github.com/falcosecurity/libs/commit/0dde9981f776a8080a2e9d104d251b8a7b5edd5c

Can you please fix it?

sumitd2 avatar Jul 25 '22 09:07 sumitd2

Hi! The fact is, our drivers do not really support ppc64le, unfortunately. What's the build error?

Note however, that even if we allow them to build on ppc64le, most probably subtle things won't work right!

FedeDP avatar Jul 25 '22 09:07 FedeDP

Yeah, enabling the BPF raw tracepoint when available probably breaks compilation on unsupported architectures. It would be great to have the build error just to be sure :thinking: Some conditional compilation should probably be enough, but let's see :eyes:

Andreagit97 avatar Jul 25 '22 09:07 Andreagit97

@FedeDP @Andreagit97 All our tests have been passing, so I assumed that was enough. I do see your point. Mauro (@Molter73) should be able to shed more light on it, as he maintains the application.

sumitd2 avatar Jul 25 '22 09:07 sumitd2

Hi @sumitd2, can you share the build error here? Our x86 downstream application is compiling correctly.

Molter73 avatar Jul 25 '22 10:07 Molter73

In file included from /root/sumit/libs/driver/bpf/probe.c:20:
/root/sumit/libs/driver/bpf/plumbing_helpers.h:107:19: error: no member named 'orig_ax' in 'struct pt_regs'
        id = _READ(regs->orig_ax);
                   ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:19:28: note: expanded from macro '_READ'
#define _READ(P) ({ typeof(P) _val;                             \
                           ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:107:19: error: no member named 'orig_ax' in 'struct pt_regs'
        id = _READ(regs->orig_ax);
                   ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:21:44: note: expanded from macro '_READ'
                    bpf_probe_read(&_val, sizeof(_val), &P);    \
                                                         ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:107:5: error: assigning to 'long' from incompatible type 'void'
        id = _READ(regs->orig_ax);
           ^ ~~~~~~~~~~~~~~~~~~~~
/root/sumit/libs/driver/bpf/plumbing_helpers.h:160:21: error: no member named 'di' in 'struct pt_regs'
                arg = _READ(regs->di);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:19:28: note: expanded from macro '_READ'
#define _READ(P) ({ typeof(P) _val;                             \
                           ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:160:21: error: no member named 'di' in 'struct pt_regs'
                arg = _READ(regs->di);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:21:44: note: expanded from macro '_READ'
                    bpf_probe_read(&_val, sizeof(_val), &P);    \
                                                         ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:160:7: error: assigning to 'unsigned long' from incompatible type 'void'
                arg = _READ(regs->di);
                    ^ ~~~~~~~~~~~~~~~
/root/sumit/libs/driver/bpf/plumbing_helpers.h:163:21: error: no member named 'si' in 'struct pt_regs'
                arg = _READ(regs->si);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:19:28: note: expanded from macro '_READ'
#define _READ(P) ({ typeof(P) _val;                             \
                           ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:163:21: error: no member named 'si' in 'struct pt_regs'
                arg = _READ(regs->si);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:21:44: note: expanded from macro '_READ'
                    bpf_probe_read(&_val, sizeof(_val), &P);    \
                                                         ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:163:7: error: assigning to 'unsigned long' from incompatible type 'void'
                arg = _READ(regs->si);
                    ^ ~~~~~~~~~~~~~~~
/root/sumit/libs/driver/bpf/plumbing_helpers.h:166:21: error: no member named 'dx' in 'struct pt_regs'
                arg = _READ(regs->dx);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:19:28: note: expanded from macro '_READ'
#define _READ(P) ({ typeof(P) _val;                             \
                           ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:166:21: error: no member named 'dx' in 'struct pt_regs'
                arg = _READ(regs->dx);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:21:44: note: expanded from macro '_READ'
                    bpf_probe_read(&_val, sizeof(_val), &P);    \
                                                         ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:166:7: error: assigning to 'unsigned long' from incompatible type 'void'
                arg = _READ(regs->dx);
                    ^ ~~~~~~~~~~~~~~~
/root/sumit/libs/driver/bpf/plumbing_helpers.h:169:21: error: no member named 'r10' in 'struct pt_regs'
                arg = _READ(regs->r10);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:19:28: note: expanded from macro '_READ'
#define _READ(P) ({ typeof(P) _val;                             \
                           ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:169:21: error: no member named 'r10' in 'struct pt_regs'
                arg = _READ(regs->r10);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:21:44: note: expanded from macro '_READ'
                    bpf_probe_read(&_val, sizeof(_val), &P);    \
                                                         ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:169:7: error: assigning to 'unsigned long' from incompatible type 'void'
                arg = _READ(regs->r10);
                    ^ ~~~~~~~~~~~~~~~~
/root/sumit/libs/driver/bpf/plumbing_helpers.h:172:21: error: no member named 'r8' in 'struct pt_regs'
                arg = _READ(regs->r8);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:19:28: note: expanded from macro '_READ'
#define _READ(P) ({ typeof(P) _val;                             \
                           ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:172:21: error: no member named 'r8' in 'struct pt_regs'
                arg = _READ(regs->r8);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:21:44: note: expanded from macro '_READ'
                    bpf_probe_read(&_val, sizeof(_val), &P);    \
                                                         ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:172:7: error: assigning to 'unsigned long' from incompatible type 'void'
                arg = _READ(regs->r8);
                    ^ ~~~~~~~~~~~~~~~
/root/sumit/libs/driver/bpf/plumbing_helpers.h:175:21: error: no member named 'r9' in 'struct pt_regs'
                arg = _READ(regs->r9);
                            ~~~~  ^
/root/sumit/libs/driver/bpf/plumbing_helpers.h:19:28: note: expanded from macro '_READ'
#define _READ(P) ({ typeof(P) _val;                             \
                           ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]

sumitd2 avatar Jul 25 '22 10:07 sumitd2

@sumitd2 yes, raw_tracepoint causes these problems on unsupported architectures. Hopefully conditional compilation will be enough; I will try to fix it ASAP :)
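As a rough illustration of what conditional compilation could look like for the register accesses in the errors above, here is a minimal userspace sketch. It is not the actual fix and is not taken from the driver sources. `demo_pt_regs`, `demo_syscall_id` and `demo_syscall_arg` are hypothetical stand-ins, the real BPF code reads these fields through `bpf_probe_read`, and the ppc64le branch simply follows the powerpc syscall convention (number in r0, arguments in r3 to r8) without having been tested against a real kernel.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the kernel's struct pt_regs. The real
 * definitions come from the kernel's per-architecture headers; only
 * the fields this sketch needs are reproduced.
 */
#if defined(__x86_64__)
struct demo_pt_regs {
	unsigned long di, si, dx, r10, r8, r9; /* syscall arguments 0..5 */
	unsigned long orig_ax;                 /* syscall number */
};
#define DEMO_ARCH_SUPPORTED 1
#elif defined(__powerpc64__)
struct demo_pt_regs {
	unsigned long gpr[32]; /* r0 = syscall number, r3..r8 = arguments */
};
#define DEMO_ARCH_SUPPORTED 1
#else
struct demo_pt_regs {
	unsigned long unused;
};
#define DEMO_ARCH_SUPPORTED 0
#endif

/* Return the syscall number, or -1 where no mapping is defined. */
long demo_syscall_id(const struct demo_pt_regs *regs)
{
#if defined(__x86_64__)
	return (long)regs->orig_ax;
#elif defined(__powerpc64__)
	return (long)regs->gpr[0];
#else
	(void)regs;
	return -1;
#endif
}

/* Return syscall argument idx (0..5), or 0 where no mapping is defined. */
unsigned long demo_syscall_arg(const struct demo_pt_regs *regs, int idx)
{
	assert(idx >= 0 && idx < 6);
#if defined(__x86_64__)
	switch (idx) {
	case 0: return regs->di;
	case 1: return regs->si;
	case 2: return regs->dx;
	case 3: return regs->r10;
	case 4: return regs->r8;
	default: return regs->r9;
	}
#elif defined(__powerpc64__)
	return regs->gpr[3 + idx];
#else
	(void)regs;
	return 0;
#endif
}
```

The point of the fallback branch is that an architecture without a mapping still compiles and reports "unsupported" at run time, instead of failing the build on x86-only field names such as `orig_ax`.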

Andreagit97 avatar Jul 25 '22 10:07 Andreagit97

Hey @sumitd2, this one should solve your problem: https://github.com/falcosecurity/libs/pull/505

Andreagit97 avatar Jul 26 '22 16:07 Andreagit97

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Oct 24 '22 21:10 poiana

/remove-lifecycle stale

FedeDP avatar Oct 25 '22 06:10 FedeDP

/lifecycle stale

poiana avatar Jan 23 '23 09:01 poiana

/remove-lifecycle stale

jasondellaluce avatar Jan 23 '23 12:01 jasondellaluce

/lifecycle stale

poiana avatar Apr 23 '23 13:04 poiana

/remove-lifecycle stale

Andreagit97 avatar Apr 24 '23 09:04 Andreagit97

@sumitd2 can we close this?

Andreagit97 avatar Jun 07 '23 12:06 Andreagit97

Hi @Andreagit97, FYI this bug has been fixed in the upstream Linux ppc64le and RH kernels. Please close this.

sumitd2 avatar Jun 13 '23 05:06 sumitd2

/close

jasondellaluce avatar Jun 13 '23 07:06 jasondellaluce

@jasondellaluce: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

poiana avatar Jun 13 '23 07:06 poiana