libs [FEATURE] Options available for tapping into "linux_binprm" that holds args used when loading binaries

Motivation

Quote from https://github.com/falcosecurity/libs/pull/595

Another kernel-side signal that I would like to look into and possibly add to this PR would be:

Interpreter scripts, aka text files with execute permissions (see https://man7.org/linux/man-pages/man2/execve.2.html). For example, chmod +x a.sh && ./a.sh or chmod +x a.sh && exec ./a.sh is currently logged as "proc.exepath":"/tmp/a.sh","proc.name":"a.sh","proc.cmdline":"a.sh ./a.sh", even though the script's first line configures the interpreter as #! /bin/sh. We would not know which interpreter binary actually ran the script, or even that the file was not a binary, without inferring it from the file extension (if there is one), and we know how fragile that is.

Please note, I am not talking about the use case where you run the interpreter and pass the script as an argument: /bin/sh a.sh would give "proc.exepath":"/bin/sh","proc.name":"sh","proc.cmdline":"sh a.sh".
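To make the gap concrete, here is a minimal userspace sketch of how an interpreter path can be read from a script's first line. It is only an illustration of the "#!" convention described in the execve man page; `parse_shebang` is a hypothetical helper, not a function in falcosecurity/libs, and it deliberately ignores any optional argument after the interpreter path.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of falcosecurity/libs): copy the interpreter
 * path from the start of a script into `out`.
 * Returns 1 if `buf` starts with "#!" and a non-empty path was found,
 * 0 otherwise. Anything after the path (optional argument) is ignored.
 */
int parse_shebang(const char *buf, size_t len, char *out, size_t out_sz)
{
    size_t i = 2, n = 0;

    if (!buf || !out || out_sz == 0 || len < 2 || buf[0] != '#' || buf[1] != '!')
        return 0;

    /* Skip blanks between "#!" and the path, as in "#! /bin/sh". */
    while (i < len && (buf[i] == ' ' || buf[i] == '\t'))
        i++;

    /* Copy until whitespace, end of line, end of buffer, or `out` is full. */
    while (i < len && buf[i] != ' ' && buf[i] != '\t' && buf[i] != '\n' &&
           buf[i] != '\0' && n + 1 < out_sz)
        out[n++] = buf[i++];
    out[n] = '\0';

    return n > 0;
}
```

Called on the first bytes of /tmp/a.sh from the example above, this would yield /bin/sh, which is exactly the piece of information missing from the current event. Doing this in userspace after the fact is racy (the file can change or disappear), which is one reason to prefer a kernel-side source.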

Any thoughts on above? @LucaGuerra @loresuso @FedeDP @Andreagit97

struct linux_binprm is readily available in the sched/sched_process_exec tracepoint, see https://github.com/falcosecurity/libs/blob/master/driver/bpf/types.h#L142 that got introduced by @Andreagit97 for ARM64 https://github.com/falcosecurity/libs/pull/416. struct linux_binprm holds args used when loading binaries https://github.com/torvalds/linux/blob/master/include/linux/binfmts.h#L49-L60.
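As a rough sketch of why this struct is interesting: per the comments in binfmts.h, `filename` is the name of the binary as seen by procps and `interp` is the name of the binary really executed, which can differ for binfmt_script and binfmt_misc. The snippet below is a userspace stand-in, not kernel or driver code; `struct binprm_view` and `ran_via_interpreter` are hypothetical names, and it assumes both strings have already been copied out of the kernel struct.

```c
#include <string.h>

/*
 * Simplified userspace stand-in for two fields of the kernel's
 * struct linux_binprm. The real struct has many more members.
 */
struct binprm_view {
    const char *filename; /* file requested by execve, e.g. "/tmp/a.sh" */
    const char *interp;   /* binary really executed, e.g. "/bin/sh" */
};

/*
 * Hypothetical check: returns 1 when the executed binary differs from the
 * requested file, as expected for "#!" scripts and binfmt_misc handlers.
 */
int ran_via_interpreter(const struct binprm_view *b)
{
    if (!b || !b->filename || !b->interp)
        return 0;
    return strcmp(b->filename, b->interp) != 0;
}
```

For the a.sh example, `filename` would be the script path and `interp` the shell, so the check returns 1; for a regular ELF binary both fields name the same file and it returns 0.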

Would it even be possible to access struct linux_binprm through the raw tracepoint? If so, how? I see that mm_struct has struct linux_binfmt, but that's it. Hopefully I am just missing something and there is an easy solution.

If it is not possible to access it over the sys_exit raw tracepoint, could we have an open discussion about unifying PPME_SYSCALL_EXECVE_19_X and PPME_SYSCALL_EXECVEAT_X so that both use the sched/sched_process_exec tracepoint instead? In terms of security monitoring enhancement, I would rate this a 10 out of 10. While it would be a slight perf hit, there are noisier system calls by comparison, and we already have to do it that way for ARM64 anyway.

What other options would be available? Are there more alternatives?

Sep 20 '22 03:09 incertum

Related to thinking in https://github.com/falcosecurity/libs/issues/252 @LucaGuerra @loresuso.

Sep 20 '22 04:09 incertum

Hi @incertum, I think you raised a significant point here. I do not see any obvious way to retrieve struct linux_binprm from the sys_exit tracepoint, but I agree that the information contained in that struct could be relevant for security monitoring.

In the end, that struct is passed to all the LSMs so they can perform their checks upon execution, so maybe this deserves further investigation. We could also easily retrieve the full path of the executable without performing any path resolution (the kernel already did it for us). That alone could be really valuable, in addition to identifying the interpreter when we are executing a script. Let's see the opinion of the other folks too 🙂

Thank you for noticing this!

Sep 20 '22 07:09 loresuso

My 2 cents on this: even if we were able to recover struct linux_binprm from sys_exit, it would be really messy and really expensive, while with sched/sched_process_exec we can obtain it from the registers. So yes, I would use sched/sched_process_exec here...

I think that we have 2 possible directions to follow:

1. Disable the execve/execveat syscall flow and use sched/sched_process_exec to send execve/execveat exit events on all architectures (quick and dirty).
2. Create a new kind of security event that could be generated by tracepoints like sched/sched_process_exec, or by kprobes on security hooks, for example (clearer but more complex).

To be honest, I would vote for the second choice here because it would open a new world for Falco! We could trace almost anything we want, not only syscalls. The pain point is the design phase, as always, but I think we can find a way.

WDYT about that @FedeDP @gnosek @leogr?

Sep 20 '22 18:09 Andreagit97

Just thinking again about it... since we have the collision with this tracepoint already used on ARM, what about using a kprobe? OK, kernel functions could change over time, but we already have all the history from 4.14 to 6.0, so why not :thinking:?

Or maybe, since in this case we have a simple tracepoint for that, why don't we use a second BPF program attached to the same tracepoint :thinking:

Just putting some ideas on the table here :)

Sep 21 '22 09:09 Andreagit97

> To be honest, I would vote for the second choice here because it would open a new world for Falco! We could trace almost anything we want, not only syscalls. The pain point is the design phase, as always, but I think we can find a way.

This issue is also related in some way to https://github.com/falcosecurity/libs/issues/252; the second approach could also allow us to support kprobes on some security hooks.

Sep 21 '22 10:09 Andreagit97

I can't decide easily. :thinking: I really believe we have to experiment a bit

Sep 21 '22 13:09 leogr

I would favor staying open-minded and exploring all options. Furthermore, shall we follow a data-driven approach? Meaning we measure perf overhead on actual production servers instead of making decisions based on reputation?

Furthermore, it seems like kprobes are needed to bridge various security monitoring gaps. On the other hand, for the particular data field discussed here (the full path of the interpreter) we have that shortcut available, as you confirmed @Andreagit97, and @loresuso also pointed out that we can fetch the executable filename right there and save a few lookup cycles. I would be curious whether there is an actual noticeable CPU hit, given that execve* really doesn't happen that often compared to what happens while a process is running ...

How could we best start experimenting?

@leogr in general, it seems that now that we have done this great refactor of syscalls of interest and tracepoints of interest, we could more easily expand on this configurability to support basically all options, while also giving users the option to tailor the cost of running the tool to the budget available.

Sep 22 '22 05:09 incertum

> Would favor staying open minded and explore all options. Furthermore, shall we follow a data-driven approach? Meaning we measure perf overhead on actual production servers instead of making decisions based on reputation?

Super +1 on my side, testing it directly in real scenarios would be amazing!

> How could we best start experimenting?

What about a kprobe here https://github.com/torvalds/linux/blob/a63f2e7cb1107ab124f80407e5eb8579c04eb7a9/fs/exec.c#L1715? You can find more info about this hook point here https://github.com/torvalds/linux/blob/a63f2e7cb1107ab124f80407e5eb8579c04eb7a9/include/linux/lsm_hooks.h#L62. This should allow us to collect all the information we want, and it could easily become a new security event generated by a kprobe :thinking:

The only thing that worries me is this statement from the hook description. What about perf?

> This hook may be called multiple times during a single execve.
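One possible way to limit the cost of repeated hook calls is to coalesce them: remember only the latest values per thread and emit a single event when the exec completes. The sketch below simulates that idea in plain userspace C. Everything in it is hypothetical (the `exec_slot` table, `on_bprm_check`, `on_exec_done`), and it assumes the last hook invocation for a thread carries the final interpreter; a real implementation would live in a BPF map and would need to be measured, as suggested earlier in the thread.

```c
#include <stddef.h>

#define MAX_SLOTS 8

/* Hypothetical per-thread record of what the check hook last observed. */
struct exec_slot {
    int tid;            /* 0 marks a free slot */
    const char *interp; /* value seen by the most recent hook call */
    unsigned calls;     /* how many times the hook fired for this exec */
};

static struct exec_slot slots[MAX_SLOTS];

/* Find the slot for `tid`; optionally claim a free one if none exists. */
static struct exec_slot *find_slot(int tid, int create)
{
    struct exec_slot *free_slot = NULL;

    if (tid <= 0)
        return NULL;
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        if (slots[i].tid == tid)
            return &slots[i];
        if (slots[i].tid == 0 && !free_slot)
            free_slot = &slots[i];
    }
    if (!create || !free_slot)
        return NULL;
    free_slot->tid = tid;
    free_slot->interp = NULL;
    free_slot->calls = 0;
    return free_slot;
}

/* Simulated hook: overwrite the stored value instead of emitting an event. */
void on_bprm_check(int tid, const char *interp)
{
    struct exec_slot *s = find_slot(tid, 1);

    if (!s)
        return; /* table full: this sketch simply drops the observation */
    s->interp = interp;
    s->calls++;
}

/* Simulated exec completion: copy the record to `out` and free the slot. */
int on_exec_done(int tid, struct exec_slot *out)
{
    struct exec_slot *s = find_slot(tid, 0);

    if (!s)
        return 0;
    if (out)
        *out = *s;
    s->tid = 0;
    return 1;
}
```

With this shape, two hook calls for the same thread (for example one for the script and one for its interpreter) still produce only one record at completion, so the per-call work stays at a lookup and a store.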

Sep 24 '22 13:09 Andreagit97

This is going to be up next, early next year (LSM hooks experiments in modern_bpf) ...

Dec 20 '22 01:12 incertum

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Mar 20 '23 03:03 poiana

/remove-lifecycle stale

Mar 20 '23 07:03 incertum

/lifecycle stale

Dec 03 '23 09:12 poiana

/remove-lifecycle stale

Dec 04 '23 10:12 Andreagit97

/lifecycle stale

Mar 03 '24 15:03 poiana

/remove-lifecycle stale

Mar 04 '24 21:03 incertum

/lifecycle stale

Jun 02 '24 21:06 poiana

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

Jul 02 '24 21:07 poiana

/remove-lifecycle stale /remove-lifecycle rotten

Jul 18 '24 14:07 leogr
