[Tracking] Params inconsistencies in our drivers
PLEASE NOTE
This issue is mainly for tracking purposes; some points cannot be addressed until we solve the scap-file compatibility issue -> https://github.com/falcosecurity/libs/pull/1381#issuecomment-1746613905
Generic context
The aim of this issue is to track all the inconsistencies that arise when we send event params from our drivers (modern_bpf, bpf, kernel module) to userspace. Some widespread issues need a dedicated conversation:
- Today we send file descriptors (`fd`) to userspace as `int64_t`, while they are represented on `int32_t`. This wastes a lot of space in our ring buffers: `fd` params are very common in our events, and we waste `4` bytes every time we send a param of this type. A small/medium-size system could plausibly send `1` million `fd` params per second, which would mean wasting almost `4` MB of ring-buffer space per second!
- Today we send process identifiers (`pid`) to userspace as `int64_t`, while they are represented on `int32_t`. This wastes ring-buffer space exactly as explained in the previous `fd` case.
- In some syscalls we take the syscall flags as an `int` value and push it to userspace as `uint32_t` without converting it to our internal `PPM` representation.
- Similarly to the previous point, even when we send `flags`/`modes` with the same type (take `uint32_t` and send `uint32_t`), in some cases we don't convert these values into the scap `PPM` format, so we cannot use these `flags`/`modes` userspace-side even though we catch them driver-side.
- In some events we send empty params because they are still not implemented.
- Every syscall must have its `PPM_CODE` and its event pair.
- Different drivers manage max boundaries in different ways; we need to make them uniform in some way :point_down: https://github.com/falcosecurity/libs/pull/648#discussion_r996388593
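The ring-buffer arithmetic in the first two points can be checked with a few lines of C. This is only a back-of-the-envelope sketch: `ex_wasted_bytes_per_sec` is a name made up for this example, and the 1 million params per second figure is the assumed workload from the text, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes wasted per second when an fd/pid param is pushed as int64_t
 * although int32_t would be enough.
 * params_per_sec is an assumed workload figure, not a measurement. */
static uint64_t ex_wasted_bytes_per_sec(uint64_t params_per_sec)
{
    const uint64_t sent_size = sizeof(int64_t);   /* what the drivers push today */
    const uint64_t needed_size = sizeof(int32_t); /* what an fd or pid needs */
    const uint64_t wasted_per_param = sent_size - needed_size;

    /* Guard against overflow of the multiplication below. */
    assert(params_per_sec <= UINT64_MAX / wasted_per_param);
    return params_per_sec * wasted_per_param;
}
```

With `ex_wasted_bytes_per_sec(1000000)` the result is 4,000,000 bytes, i.e. the "almost 4 MB per second" quoted above.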
Syscall-specific issues
LEGEND
- [NOT ADDRESSABLE] -> the issue is not addressable at the moment, at least until we solve the scap-file issue, see `PLEASE NOTE` :point_up:
- [MODERN_BPF] -> the issue is only related to the `modern_bpf` probe
- [BPF] -> the issue is only related to the `bpf` probe
- [KMOD] -> the issue is only related to the `kernel module`
- :warning: -> possible problems, fix it!
- :arrow_left: -> only in the enter event
- :arrow_right: -> only in the exit event
open_by_handle_at :arrow_right:
- [x] [MODERN_BPF] we can get at most 8 `path_components`; we need to find a workaround to manage more components!
- [ ] the `open_flags_to_scap()` method should receive an `int` value and not a `uint32_t`.
dup3 :arrow_right:
- [x] the `dup3_flags_to_scap()` method should receive an `int` value and not a `uint32_t`.
open
- [ ] the `open_flags_to_scap()` method should receive an `int` value and not a `uint32_t`.
openat
- [ ] the `open_flags_to_scap()` method should receive an `int` value and not a `uint32_t`.
openat2
- [ ] the `open_flags_to_scap()` method should receive an `int` value and not a `uint32_t`.
eventfd :arrow_left:
- [ ] [NOT ADDRESSABLE] param `2` is not necessary: there is no flags argument in `eventfd`, we have it only in `eventfd2` (https://github.com/falcosecurity/libs/pull/516/files#r935326971). We need a new event.
eventfd2 :arrow_left:
- [x] param `2` (`flags`) is not implemented, we push `0` to userspace
inotify_init :arrow_left:
- [ ] [NOT ADDRESSABLE] `inotify_init` has no syscall arguments but we send one param
signalfd :arrow_left:
- [ ] [NOT ADDRESSABLE] param `2` (`mask`) is not implemented, we push `0` to userspace. We should remove it.
- [ ] [NOT ADDRESSABLE] param `3` (`flags`) is not implemented, we push `0` to userspace. We should remove it. Moreover, this syscall has no flags argument; for more details see https://elixir.bootlin.com/linux/v6.5.5/source/fs/signalfd.c#L314
signalfd4 :arrow_left:
- [ ] [NOT ADDRESSABLE] param `2` (`mask`) is not implemented, we push `0` to userspace. We should remove it.
timerfd_create :arrow_left:
- [ ] param `1` (`clockid`) is not implemented, we push `0` to userspace. We should implement it.
- [ ] param `2` (`flags`) is not implemented, we push `0` to userspace. We should implement it.
userfault_fd :arrow_right:
- [ ] param `2` (`flags`) is missing a helper like `userfaultfd_flags_to_scap` to convert flags to scap notation.
ptrace :arrow_right:
- [ ] [NOT ADDRESSABLE] param `2` (`addr`): not sure we really need a `PT_DYN` param, we always send the same len.
- [ ] [NOT ADDRESSABLE] param `3` (`data`): not sure about the utility of sending the `data_pointer` to userspace.
mkdirat :arrow_right:
- [ ] param `4` (`mode`): we need to convert the mode to the scap format.
pipe2
- [x] we need a new event for `pipe2`, otherwise we cannot catch the `flags`. Right now we use the same event as `pipe`.
renameat2 :arrow_right:
- [ ] param `6` (`flags`): we need to convert the flags to the scap format with a helper like `renameat2_flags_to_scap`.
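To make the "convert to scap format" items more concrete, here is a minimal sketch of what such a helper could look like. Everything prefixed with `ex_`/`EX_` is made up for this example: the kernel-side values mirror the `RENAME_*` flags from the Linux uapi headers, while the `EX_PPM_RENAME_*` values are hypothetical and are not the constants used by the drivers. The only point is that the value pushed to userspace is a portable mapping and not the raw kernel bitmask.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-side flag values, mirroring RENAME_* from the Linux uapi headers. */
#define EX_RENAME_NOREPLACE (1u << 0)
#define EX_RENAME_EXCHANGE  (1u << 1)
#define EX_RENAME_WHITEOUT  (1u << 2)

/* Hypothetical portable values, chosen only for this sketch.
 * They are deliberately different from the kernel ones. */
#define EX_PPM_RENAME_NOREPLACE (1u << 4)
#define EX_PPM_RENAME_EXCHANGE  (1u << 5)
#define EX_PPM_RENAME_WHITEOUT  (1u << 6)

/* Map each known kernel flag to its portable counterpart.
 * renameat2(2) declares its flags argument as unsigned int.
 * Unknown bits are dropped instead of being forwarded as-is. */
static uint32_t ex_renameat2_flags_to_scap(unsigned int flags)
{
    uint32_t res = 0;

    if (flags & EX_RENAME_NOREPLACE)
        res |= EX_PPM_RENAME_NOREPLACE;
    if (flags & EX_RENAME_EXCHANGE)
        res |= EX_PPM_RENAME_EXCHANGE;
    if (flags & EX_RENAME_WHITEOUT)
        res |= EX_PPM_RENAME_WHITEOUT;

    return res;
}
```

Dropping unknown bits is a design choice of this sketch; a real helper might prefer to report them in some way.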
execve :arrow_right:
- [ ] [NOT ADDRESSABLE] param `7` (`cwd`) is not implemented, we push `0` to userspace https://github.com/falcosecurity/libs/blob/a8561a7a117374e9c454bddc91f58f0f50b873ab/driver/bpf/fillers.h#L2417
- [x] param `17` (`tty`) is a `uint32_t` not an `int32_t` https://github.com/falcosecurity/libs/pull/1192
- [x] param `19` (`loginuid`) is a `uint32_t` not an `int32_t`, a PR is up https://github.com/falcosecurity/libs/pull/1192
- [x] [MODERN_BPF] param `20` (`flags`) still to implement.
execveat :arrow_right:
- [ ] [NOT ADDRESSABLE] param `7` (`cwd`) is not implemented, we push `0` to userspace https://github.com/falcosecurity/libs/blob/a8561a7a117374e9c454bddc91f58f0f50b873ab/driver/bpf/fillers.h#L2417
- [x] param `17` (`tty`) is a `uint32_t` not an `int32_t` https://github.com/falcosecurity/libs/pull/1192
- [x] param `19` (`loginuid`) is a `uint32_t` not an `int32_t` https://github.com/falcosecurity/libs/pull/1192
- [x] [MODERN_BPF] param `20` (`flags`) still to implement.
fork :arrow_right:
- [ ] [NOT ADDRESSABLE] param `7` (`cwd`) is not implemented, we push `0` to userspace https://github.com/falcosecurity/libs/blob/a8561a7a117374e9c454bddc91f58f0f50b873ab/driver/bpf/fillers.h#L2417
clone :arrow_right:
- [ ] [NOT ADDRESSABLE] param `7` (`cwd`) is not implemented, we push `0` to userspace https://github.com/falcosecurity/libs/blob/a8561a7a117374e9c454bddc91f58f0f50b873ab/driver/bpf/fillers.h#L2417
clone3 :arrow_right:
- [ ] [NOT ADDRESSABLE] param `7` (`cwd`) is not implemented, we push `0` to userspace https://github.com/falcosecurity/libs/blob/a8561a7a117374e9c454bddc91f58f0f50b873ab/driver/bpf/fillers.h#L2417
vfork :arrow_right:
- [ ] [NOT ADDRESSABLE] param `7` (`cwd`) is not implemented, we push `0` to userspace https://github.com/falcosecurity/libs/blob/a8561a7a117374e9c454bddc91f58f0f50b873ab/driver/bpf/fillers.h#L2417
socket :arrow_left:
- [ ] [NOT ADDRESSABLE] param `1` (`domain`): the `socket_family_to_scap` method should receive an int, not a `u8`, and we need to choose whether the param should be on `8` bits or `32` bits. We also need to update `socket_family_to_scap` with new socket families.
connect :arrow_right:
- [ ] [NOT ADDRESSABLE] param `2` (`tuple`): in case of UNIX sockets, not sure about the utility of sending kernel pointers to userspace
socketpair :arrow_left:
Same issues as the socket syscall
- [ ] [NOT ADDRESSABLE] param `1` (`domain`): the `socket_family_to_scap` method should receive an int, not a `u8`, and we need to choose whether the param should be on `8` bits or `32` bits. We also need to update `socket_family_to_scap` with new socket families.
socketpair :arrow_right:
- [ ] [NOT ADDRESSABLE] param `4` (`source`): not sure about the utility of sending kernel pointers to userspace
- [ ] [NOT ADDRESSABLE] param `5` (`peer`): not sure about the utility of sending kernel pointers to userspace
accept :arrow_right:
- [x] param `5` (`queuemax`): using Unix sockets, the max queue length seems not related to the value set by `listen`, more on this here: https://github.com/falcosecurity/libs/pull/544#discussion_r942246996
accept4 :arrow_left:
- [ ] [NOT ADDRESSABLE] param `1` (`flags`) still to implement, today we always send `0`. This bug is used in the socketcall workaround
listen :arrow_left:
- [x] param `2` (`backlog`) is an `int` not a `uint32_t`, https://github.com/falcosecurity/libs/pull/1256
bpf :arrow_left:
- [ ] [NOT ADDRESSABLE] param `1` (`cmd`) is an `int` not an `int64_t`
flock :arrow_left:
- [x] param `2` (`operation`): we need to read it as an int and then convert it to `uint32_t`, while today we read it as an `unsigned long`
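The "read it as an int and then convert it" items can be illustrated with a small sketch. The assumption here is a 64-bit architecture where the raw syscall argument arrives as a 64-bit register value whose upper 32 bits are not part of the `int` argument; `uint64_t` stands in for `unsigned long`, and both `ex_arg_as_int` and `ex_arg_to_param` are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Keep only the low 32 bits of a raw syscall argument and reinterpret
 * them as a signed int, which is how the kernel sees an int argument. */
static int32_t ex_arg_as_int(uint64_t raw)
{
    uint32_t low = (uint32_t)raw; /* well-defined truncation modulo 2^32 */
    int32_t val;

    /* memcpy avoids the implementation-defined unsigned-to-signed cast. */
    memcpy(&val, &low, sizeof(val));
    return val;
}

/* Value pushed to userspace: the int converted to the 32-bit unsigned
 * param type, so the upper register bits never reach the ring buffer. */
static uint32_t ex_arg_to_param(uint64_t raw)
{
    return (uint32_t)ex_arg_as_int(raw);
}
```

For example, a raw value of `0xdeadbeef00000002` yields `2` once it is read as an int, while reading it as a full `unsigned long` would keep the unrelated upper bits.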
quotactl :arrow_left:
- [ ] param `1` (`cmd`): we need to read it as an int and then convert it to `uint32_t`, while today we read it as an `unsigned long`
- [ ] param `3` (`id`) is an `int` not an `int32_t`
quotactl :arrow_right:
- [ ] param `13` (`dqi_flags`): add conversion to scap format
unshare :arrow_left:
- [ ] param `1` (`flags`): we need to read it as an int and then convert it to `uint32_t`, while today we read it as an `unsigned long`
mount :arrow_left:
- [ ] param `1` (`flags`): if we want to use this info in userspace we need to convert it into `scap` format.
umount2 :arrow_left:
- [x] param `1` (`flags`): if we want to use this info in userspace we need to convert it into `scap` format. This field should be an `int` not an `int32_t`, https://github.com/falcosecurity/libs/pull/1255
- [x] we need to define a new event pair (`PPME_SYSCALL_UMOUNT2_E`, `PPME_SYSCALL_UMOUNT2_X`)
linkat :arrow_right:
- [x] param `6` (`flags`): we need to read it as an int and then convert it to `uint32_t`, while today we read it as an `unsigned long`
unlinkat :arrow_right:
- [x] param `4` (`flags`): we need to read it as an int and then convert it to `uint32_t`, while today we read it as an `unsigned long`
setns :arrow_left:
- [x] param `2` (`nstype`): we need to read it as an int and then convert it to `uint32_t`, while today we read it as an `unsigned long`
setrlimit :arrow_left:
- [ ] param `1` (`resource`): we need to read it as an int and then convert it to `uint8_t`, while today we read it as an `unsigned long`
prlimit64 :arrow_left:
- [ ] param `2` (`resource`): we need to read it as an int and then convert it to `uint8_t`, while today we read it as an `unsigned long`
sendto :arrow_left:
- [ ] [NOT ADDRESSABLE] param `3` (`tuple`) should be caught in the exit event, when we know the outcome of the syscall, otherwise there is the risk of catching something wrong.
sendmsg :arrow_left:
- [ ] [NOT ADDRESSABLE] param `3` (`tuple`) should be caught in the exit event, when we know the outcome of the syscall, otherwise there is the risk of catching something wrong.
ppoll :arrow_left:
- [ ] [NOT ADDRESSABLE] param `3` (`sigmask`): we send only the first 32 bits
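A short sketch of why sending only the first 32 bits of `sigmask` loses information. It assumes a 64-bit signal mask in which signal `n` (1..64) is stored in bit `n-1`, as on Linux x86-64; the `ex_` helpers are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: signal n (1..64) is stored in bit n-1 of a 64-bit mask. */
static uint64_t ex_mask_add(uint64_t mask, int signo)
{
    assert(signo >= 1 && signo <= 64);
    return mask | (UINT64_C(1) << (signo - 1));
}

/* Returns 1 if the signal is present in the mask, 0 otherwise. */
static int ex_mask_has(uint64_t mask, int signo)
{
    assert(signo >= 1 && signo <= 64);
    return (int)((mask >> (signo - 1)) & 1);
}

/* Keep only the first 32 bits, as described in the item above. */
static uint32_t ex_truncate_sigmask(uint64_t mask)
{
    return (uint32_t)mask;
}
```

With a mask containing signals 2 and 40, the truncated value still reports signal 2 but no longer reports signal 40, so any signal above 32 is invisible to userspace.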
recvmmsg: [NOT ADDRESSABLE] Empty instrumentation
sendmmsg: [NOT ADDRESSABLE] Empty instrumentation
Hi Andrea, like always, good catch! :D
I agree with you; i'd phrase it in a different way actually: "if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events" This hits even harder :)
To retain backward compatibility, i think we could:
- add a new `PT_FD32` and `PT_PID32`
- new drivers will send `PT_PID32` and `PT_FD32`
- we are still compatible with old 64-bit types, while reading from scap files
- of course, bump maj schema version :/
This fixes the first 2 points; the others are different cases instead.
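A rough sketch of how a reader could stay compatible with both encodings under this proposal. `EX_PT_FD` and `EX_PT_FD32` are hypothetical identifiers for the existing 64-bit type and the proposed 32-bit one, and `ex_read_fd` is not an existing libs function; it assumes the param bytes are in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type ids: the existing 64-bit fd and the proposed 32-bit one. */
enum ex_param_type {
    EX_PT_FD = 0,   /* old drivers and old scap files: int64_t */
    EX_PT_FD32 = 1, /* new drivers: int32_t */
};

/* Decode an fd param of either width into a common 64-bit value,
 * so the rest of userspace does not care which driver produced it. */
static int64_t ex_read_fd(enum ex_param_type type, const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    if (type == EX_PT_FD32) {
        int32_t v32;
        assert(len == sizeof(v32));
        memcpy(&v32, buf, sizeof(v32));
        return v32; /* sign-extended, so negative values like -100 survive */
    }

    int64_t v64;
    assert(len == sizeof(v64));
    memcpy(&v64, buf, sizeof(v64));
    return v64;
}
```

Sign extension matters here because negative values such as `AT_FDCWD` (`-100`) must decode to the same number in both encodings.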
> Hi Andrea, like always, good catch! :D
> I agree with you; i'd phrase it in a different way actually: "if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events" This hits even harder :)

:exploding_head: :exploding_head:

> To retain backward compatibility, i think we could:
> * add a new `PT_FD32` and `PT_PID32`
> * new drivers will send `PT_PID32` and `PT_FD32`
> * we are still compatible with old 64-bit types, while reading from scap files
> * of course, bump maj schema version :/
>
> This fixes the first 2 points; the last is a different case instead.

Completely agree with this solution!
Love it, big +1, great catch!
First 2 points will be addressed by #526
I have slightly changed the issue format with `Generic event issues` and `Specific event issues`; in this way it should be more maintainable. Thank you to @hbrueckner @Molter73 @FedeDP for all the help in finding new issues
Tracking down all of this is of incredible value. Thank you a lot!
> I have slightly changed the issue format with `Generic event issues` and `Specific event issues`; in this way it should be more maintainable. Thank you to @hbrueckner @Molter73 @FedeDP for all the help in finding new issues

You are welcome! Many thanks @Andreagit97 for the excellent summary and tracker of all the review follow-ups!
> "if we today are able to push to userspace 1mln fd-only events, we could instead push up to 2mln fd-only events" This hits even harder :)

Optimism is awesome but let me cool it down a little bit :P Every event has this header, which is somewhat larger than the zero bytes it would need for that claim to be true ;)
```c
struct ppm_evt_hdr {
#ifdef PPM_ENABLE_SENTINEL
	uint32_t sentinel_begin;
#endif
	uint64_t ts;      /* timestamp, in nanoseconds from epoch */
	uint64_t tid;     /* the tid of the thread that generated this event */
	uint32_t len;     /* the event len, including the header */
	uint16_t type;    /* the event type */
	uint32_t nparams; /* the number of parameters of the event */
};
```
BTW, we could probably easily change `nparams` to 16 bits. If we do expect >64k parameters, we can hack something for the (hopefully rare) events that exceed this number.
We could also trim the tid to 32 bits, but then we use this struct all over userspace too (#sadpanda), and other environments may want large tids (e.g. gvisor), so we would have to decouple these two structs and copy data field by field between them.
Don't let me distract you from tracking down the inconsistencies though, that's an awesome job!
These two changes would cut down 6 bytes from every event, equivalent to one and a half fds with no schema changes, just a major api version bump.
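The 6-byte figure can be double-checked with `sizeof`. This is just a sketch under two assumptions: the header is laid out without padding (hence the `#pragma pack`), and `PPM_ENABLE_SENTINEL` is not defined. The `ex_hdr_*` structs are copies made for this example, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1) /* assumption: the header has no padding */

/* Same fields as the header quoted above, without the sentinel. */
struct ex_hdr_current {
    uint64_t ts;
    uint64_t tid;
    uint32_t len;
    uint16_t type;
    uint32_t nparams;
};

/* Hypothetical trimmed layout: 32-bit tid and 16-bit nparams. */
struct ex_hdr_trimmed {
    uint64_t ts;
    uint32_t tid;
    uint32_t len;
    uint16_t type;
    uint16_t nparams;
};

#pragma pack(pop)

/* Bytes saved per event by the two changes discussed above. */
static size_t ex_hdr_saving(void)
{
    return sizeof(struct ex_hdr_current) - sizeof(struct ex_hdr_trimmed);
}
```

Under these assumptions the current header is 26 bytes and the trimmed one is 20, so `ex_hdr_saving()` returns 6: 4 bytes from `tid` plus 2 bytes from `nparams`.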
You are right, of course! Nonetheless, we are wasting lots of bytes that, all together, would:
- speed up pushing events (because we need to write, and later read, a little bit less on the ring buffer)
- clear up space for new events
I was a bit over-reacting though, agree ahah
> BTW, we could probably easily change `nparams` to 16 bits. If we do expect >64k parameters, we can hack something for the (hopefully rare) events that exceed this number. We could also trim the tid to 32 bits, but then we use this struct all over userspace too (#sadpanda), and other environments may want large tids (e.g. gvisor), so we would have to decouple these two structs and copy data field by field between them.

Yep!
> Don't let me distract you from tracking down the inconsistencies though, that's an awesome job!

You gave thorough ideas! And data :D
I think @Andreagit97 is working on porting current `modern-bpf` tests to work with kmod and old bpf; this should help us spot many more inconsistencies!
@oheifetz @therealbobo @FedeDP @Andreagit97 we all touched at least some inconsistencies and we just confirmed that for minor type confusions s32/u32 we will not create new events.
Could we get organized and address all these cases, but assign to folks beforehand so that we do not duplicate work?
Fixing all these would make the project look much better! I am happy to be reviewer, but also happy to help more if we don't find enough volunteers :)
Yeah, agree with that. We could write below this issue when someone is going to address some points of the list :)
BTW I would avoid all changes that require a new event pair until we finally resolve the scap-file issue, unless we find something to fix ASAP
I've added the `[NOT ADDRESSABLE]` marker to all issues that we cannot address now due to the scap-file management https://github.com/falcosecurity/libs/pull/1381#issuecomment-1746613905
Thank you Andrea, I think this makes it much clearer :pray:
@Andreagit97 Please mark dup3 as completed with strikethrough.
@incertum @Andreagit97 Can you guys also mark setns, flock, and unshare as complete?