libs
libs copied to clipboard
new: introduce `is_exe_upper_layer`
What type of PR is this?
Uncomment one (or more)
/kind <>lines:
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
Any specific area of the project related to this PR?
Uncomment one (or more)
/area <>lines:
/area build
/area driver-kmod
/area driver-ebpf
/area libscap
/area libsinsp
/area tests
/area proposals
What this PR does / why we need it:
This PR tries to deal with overlayfs, a union mount filesystem often used by container runtimes, to understand if a process is executing something that is part of the base image of a container or not. Monitoring this could be relevant from a security standpoint since best practices state that containers should be immutable. For this reason, in many use cases, we don't expect processes in the container to execute something that is not part of the base image, whether that file was created in the container or an existing one was modified.
To do so, the PR aims at identifying the layer of the file being executed by a process: if the file belongs to the lower layer, it means that it was part of the base image of the container. Otherwise, the process executed something that is part of the upper layer, and a flag is set upon the execve event.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
To better understand the code, I'd like to give reviewers some useful pointers. First, we must check if the file being executed is part of an overlayfs. We can do that by retrieving the superblock magic of the file and comparing it with the OVERLAYFS_SUPER_MAGIC. You can see here that this constant is used when an overlayfs is mounted.
Then, to check the layer of the executable file, we can do some pointer arithmetics, having in mind that the pointer to the inode that we can retrieve form struct file is pointing to the vfs_inode member of the ovl_inode struct, as you can see here.
The last thing that I wanted to discuss is that for now I didn't find a way to retrieve the upper_layer information when scanning the proc filesystem. One possible way to do it could be by inspecting the /proc/<pid>/mounts file and checking if there is an overlayfs entry, which in case we can use to retrieve the upperdir directory and look for the exe file there. However, this would easily work when Falco is installed on the host, but could be a problem when it is running as a container. This is because most of the time the upperdir directory is placed in /var/lib/docker, /var/lib/containerd directories. This means we would need to give the Falco container access to those directories. I would like to know what do you think, and if you like it, I can implement it. Also, I am open to any other possible solution on this!
EDIT: I have implemented this approach
Does this PR introduce a user-facing change?:
new: introduce is_exe_upper_layer
Hi @loresuso. Thanks for your PR.
I'm waiting for a falcosecurity member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is really cool @loresuso ! Thanks for this effort!
/ok-to-test
Hey @FedeDP, thank you!
I agree with you, maybe we can better document the fact those new fields (is_exe_writable and is_exe_upper_layer) can be trusted only on specific kernel versions? I thought about that too, didn't know how to solve it honestly
I agree with you, maybe we can better document the fact those new fields (is_exe_writable and is_exe_upper_layer) can be trusted only on specific kernel versions? I thought about that too, didn't know how to solve it honestly
Yeah that seems a reasonable thing to do.
Another way of solving would be to use a string instead of a boolean and returning <NA> where the info is not available, but i don't quite like the idea :)
I agree with you, maybe we can better document the fact those new fields (is_exe_writable and is_exe_upper_layer) can be trusted only on specific kernel versions? I thought about that too, didn't know how to solve it honestly
Yeah that seems a reasonable thing to do. Another way of solving would be to use a string instead of a boolean and returning where the info is not available, but i don't quite like the idea :)
Yeah, I don't like that too, execve gets called pretty much of course, let's not waste space on the ring buffer just for this :)
Yeah, I don't like that too, execve gets called pretty much of course, let's not waste space on the ring buffer just for this :)
I meant that we can push onto the ring buffer eg: [ -1, 0, 1] where -1 means "NA", 0 means "false" and 1 means "true". Ring buffer would be untouched. What i don't really like is having the filterchecks on a string instead of a boolean, it is much less elegant.
Ah I got it now, sorry! Yeah you are right
Hey @FedeDP, I think I have addressed all your comments:
- Stated that
is_exe_upper_layeris meaningful if underlying kernel version is above 3.18 - Added a small function
get_exe_inodeto retrieve the inode of the executable. It is now used byexe_writableandexe_upper_layer. - decided to add a function to properly set the
upper_layerflag when scanningprocfs. I tested it with docker and containerd and it seems perfectly ok, let me know what do you think on this - Noted that the overlays magic constant was defined in a c file before kernel 4.5, and moved to magic.h only after, so to make it work, I defined PPM_OVERLAYFS_SUPER_MAGIC as you can see
Thank you! Your feedback is always much appreciated!!
/retest
Hey maintainers! I have rebased this PR again, and I still think is a good feature to add. What do you think? 🙂
/cc @FedeDP @Andreagit97 @LucaGuerra
@loresuso: GitHub didn't allow me to request PR reviews from the following users: LucaGuerra.
Note that only falcosecurity members and repo collaborators can review this PR, and authors cannot review their own PRs.
In response to this:
/cc @FedeDP @Andreagit97 @LucaGuerra
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: loresuso
Once this PR has been reviewed and has the lgtm label, please ask for approval from fededp by writing /assign @fededp in a comment. For more information see:The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/milestone 0.10.0
One more thought/question: You mentioned that this does not work for all runtimes, would it make sense to mention the runtimes that are compatible in the definitions?
@loresuso as modern_bpf is on the near-term roadmap for initial experimental release, would it be possible to already add this new signal to modern_bpf as well? Would be much appreciated. CC @Andreagit97
CC @LucaGuerra would also appreciate adding the exe writable signal to modern_bpf (if possible) to have feature parity across all 3 drivers.
Hello everybody, I have reworked this PR addressing the comments and implemented the related version in the modern eBPF probe, so that we have feature parity in all three instrumentation modes.
I also tested the old eBPF probe and the kmod on older kernels, like version 4.18 and unfortunately we are not able to retrieve the upper layer information. It seems that the kernel uses a sb_magic that is different in newer kernel and associates it to ext3 in my experiments.
If everyone like the approach of this PR, I can work on finding the minimum version from which it starts to be working and add a comment about it inside the code, but first I would like to hear everyone's opinion since this is a really time consuming task.
Also, I am thinking about ways to introduce tests for this feature. A possible way would be to mount an overlayfs and try to execute something from there and verify that the flag gets set.
This PR LGTM! Thank you for all the effort you put into it! :tada:
If everyone like the approach of this PR, I can work on finding the minimum version from which it starts to be working and add a comment about it inside the code, but first I would like to hear everyone's opinion since this is a really time consuming task.
Yeah, it would be amazing to find the first working version!
Also, I am thinking about ways to introduce tests for this feature. A possible way would be to mount an overlayfs and try to execute something from there and verify that the flag gets set.
Absolutely, I think we need a test, in this way we can easily detect when this access vfs_inode + sizeof(struct inode)); is no more valid
LGTM label has been added.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Andreagit97, FedeDP, loresuso
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [Andreagit97,FedeDP]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment