falco
Document falco syscall buffer size adjustment described in blog
What would you like to be added:
I have a Falco-related question that I think the documentation should answer. The post at https://sysdig.com/blog/cve-2019-8339-falco-vulnerability/ contains a short paragraph that states:
"One way to reduce the risk of dropped system calls is by increasing the size of the shared buffer between user/kernel space. For example, increasing the size of the default kernel buffer from 8mb to 128mb (per cpu) resulted in no dropped system calls, even under the extreme workload used by the proof-of-concept program"
But I couldn't find anything in the blog post or the Falco docs saying which kernel buffer is meant here. Does anyone know the details? I assumed we could patch the kernel module that's built via dkms to increase the buffer, but a glance over the code didn't make it obvious which buffer needed to be increased.
Why is this needed:
To reduce the number of dropped syscalls on busy/large nodes
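Because the quoted sizes are per CPU, the total memory set aside for the buffers grows with the core count, which matters on exactly these busy/large nodes. A back-of-the-envelope sketch, assuming one buffer per online CPU as the blog's "(per cpu)" implies; `ring_buf_total_mib` is a hypothetical helper, and `getconf _NPROCESSORS_ONLN` is only used to pick up the local CPU count.

```sh
# Sketch: total memory taken by the per-CPU ring buffers.
# ASSUMPTION: one buffer per online CPU, as the blog's "(per cpu)" implies.

ring_buf_total_mib() {
	# $1 = per-CPU size in MiB, $2 = number of CPUs
	echo $(( $1 * $2 ))
}

# Fall back to 1 CPU if getconf is unavailable.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
for mib in 8 128; do
	echo "$mib MiB per CPU x $cpus CPUs = $(ring_buf_total_mib "$mib" "$cpus") MiB"
done
```

For example, on a 64-CPU node 128 MiB per CPU works out to 8192 MiB (8 GiB), versus 512 MiB at the default 8 MiB.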
@fntlnz RUNBPF
Thanks for this issue @dnwe !!
I think you are referring to this configuration, right?
https://github.com/falcosecurity/falco/blob/6e11e75c1522e99bbbad1967f7031538e1a9c0bf/falco.yaml#L84-L89
I see that we already have some documentation for that here https://falco.org/docs/event-sources/dropped-events/
Do you see anything we could improve?
/remove-kind feature
/kind documentation
RUNBPF contest
Send me an email `lo at linux.com` with your full name and address for the sticker! (I also accept encrypted emails if you have privacy concerns. You can get my public key here https://fntlnz.wtf/downloads/pubkey-0xD624DE73B2400EE4.asc)
@fntlnz that page documents how dropped syscalls are logged, but, as in my quote from the linked blog post, the author mentioned that a user could also increase the size of the shared buffer so that no syscalls are dropped if it is sized large enough.
I think this is the droid you are looking for: https://github.com/draios/sysdig/blob/dev/driver/ppm_ringbuffer.h#L17
Let me know if you want to pair in case you get snagged compiling the driver; it took me a few tries to get everything dialed in correctly.
@kris-nova perfect thanks, I'll give it a spin tomorrow and let you know
Yes @kris-nova it is!
@kris-nova and @dnwe, can I propose a demo of compiling the driver during our office hours in two weeks? It will be recorded and could be handy for our community! If you agree, please open an issue (with kind/debugging-hours) in the office-hours repository so we can schedule it!
@leodido / @kris-nova I haven't quite got around to testing it yet, but I was hoping I'd just be able to get away with patching the kernel module via the stable Dockerfile after the .deb has unpacked the source to /usr/src:
https://github.com/falcosecurity/falco/blob/6e11e75c1522e99bbbad1967f7031538e1a9c0bf/docker/stable/Dockerfile#L77-L87
With something like this:
```diff
diff --git a/Dockerfile b/Dockerfile
index d55ed24..1210e8d 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -44,6 +44,9 @@ RUN curl -s https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public |
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*
+# Patch the ringbuffer in the falco kernel module to reduce dropped syscall events
+RUN sed -e '/RING_BUF_SIZE/s/8/64/' -i /usr/src/falco-*/ppm_ringbuffer.h
+
 # Change the falco config within the container to enable ISO 8601
 # output.
 RUN sed -e 's/time_format_iso_8601: false/time_format_iso_8601: true/' < /etc/falco/falco.yaml > /etc/falco/falco.yaml.new \
```
And let the falco-probe-loader / dkms build handle getting the compilation right for me 😎
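For reference, this is what that sed expression does on its own: on any line mentioning `RING_BUF_SIZE`, it replaces the first literal `8` with `64`, so it only gives the intended result while the current value is exactly 8. The sketch runs it against a throwaway file, assuming the header declares the size on one line in the form `static const __u32 RING_BUF_SIZE = 8 * 1024 * 1024;`, and adds a hypothetical `set_ring_buf_mb` helper that overwrites whatever number follows the name instead.

```sh
# Sketch: what the Dockerfile's sed patch does, plus a more explicit variant.
# ASSUMPTION: the header declares the size on one line, e.g.
#   static const __u32 RING_BUF_SIZE = 8 * 1024 * 1024;
# set_ring_buf_mb is hypothetical, not part of Falco or sysdig.

hdr=$(mktemp)
printf 'static const __u32 RING_BUF_SIZE = 8 * 1024 * 1024;\n' > "$hdr"

# Original expression: on lines mentioning RING_BUF_SIZE, replace the FIRST
# literal "8" with "64". Only correct while the value is exactly 8
# (a value of 128 would become 1264).
sed -e '/RING_BUF_SIZE/s/8/64/' "$hdr"
# -> static const __u32 RING_BUF_SIZE = 64 * 1024 * 1024;

# Variant: replace whatever number follows the name with an explicit value.
# The "* 1024 * 1024" factor is kept, so the new value is in MiB.
set_ring_buf_mb() {
	sed -e "s/\(RING_BUF_SIZE[ =]*\)[0-9][0-9]*/\1$2/" "$1"
}
set_ring_buf_mb "$hdr" 96
# -> static const __u32 RING_BUF_SIZE = 96 * 1024 * 1024;

rm -f "$hdr"
```

Both commands only print the result; in a Dockerfile they would be run with `-i` against the unpacked module source, as in the diff. The variant assumes the `* 1024 * 1024` factor is present; a header that declares a plain byte count would need a different expression.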
Ah, so that gives me:
`Wed Sep 4 20:41:56 2019: Runtime error: error mapping the ring buffer for device /host/dev/falco0. Exiting.`
Presumably because the `/usr/bin/falco` userspace process needs to be (re-)built with a matching ring buffer size:
https://github.com/draios/sysdig/blob/ce8281b2d506114ef1ea89b904cda2baa6c1fa27/userspace/libscap/scap.c#L307-L310
and
https://github.com/draios/sysdig/blob/ce8281b2d506114ef1ea89b904cda2baa6c1fa27/userspace/libscap/scap.c#L351-L370
Yep, I think that's what we need.
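Since the kernel module and the userspace binary have to agree on the size, a build that patches the module source separately from the userspace source can check the two before going further. A minimal sketch, assuming the userspace build (libscap) takes the size from its own copy of `ppm_ringbuffer.h`, as the linked scap.c lines suggest, and that each copy declares `RING_BUF_SIZE` on a single line; the function names and the header paths you pass in are hypothetical.

```sh
# Sketch: stop when the kernel-module source and the source used to build
# the userspace binary declare different ring buffer sizes.
# ASSUMPTION: each tree has its own ppm_ringbuffer.h with a one-line
# RING_BUF_SIZE declaration; both paths are supplied by the caller.

ring_buf_bytes() {
	# Keep the text after the name, then only digits and '*', e.g. 64*1024*1024.
	e_=$(grep 'RING_BUF_SIZE' "$1" 2>/dev/null | head -n 1 |
		sed -e 's/.*RING_BUF_SIZE//' -e 's/[^0-9*]//g')
	[ -n "$e_" ] || return 1
	echo $(( $e_ ))
}

check_ring_buf_match() {
	kmod_=$(ring_buf_bytes "$1") || { echo "cannot read $1" >&2; return 1; }
	user_=$(ring_buf_bytes "$2") || { echo "cannot read $2" >&2; return 1; }
	if [ "$kmod_" != "$user_" ]; then
		echo "RING_BUF_SIZE mismatch: module=$kmod_ userspace=$user_" >&2
		return 1
	fi
	echo "RING_BUF_SIZE matches: $kmod_ bytes"
}

# Demo: a patched module header next to an unpatched userspace header.
mod_h=$(mktemp); usr_h=$(mktemp)
printf 'static const __u32 RING_BUF_SIZE = 64 * 1024 * 1024;\n' > "$mod_h"
printf 'static const __u32 RING_BUF_SIZE = 8 * 1024 * 1024;\n' > "$usr_h"
check_ring_buf_match "$mod_h" "$usr_h" || echo "would fail the build here"
rm -f "$mod_h" "$usr_h"
```

Comparing evaluated byte counts rather than the raw text means `96 * 1024 * 1024` and `100663296` are treated as the same size.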
Just curious: what is the use case for expanding the ring buffer size? Is this in response to the kernel-level components dropping syscall events?
Yes. As per the first post, the linked article calls it out as a way to reduce the number of dropped syscalls. We had already enabled drop logging and just wanted to test whether it would reduce the occurrences.
I believe I have a working build, but haven't had a chance to deploy it yet. Will let you know.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
FYI, we have been running for a while now with `sed -e '/RING_BUF_SIZE/s/8/96/' -i /usr/src/sysdig/driver/ppm_ringbuffer.h`, which has reduced the number of dropped events. We still see drops of 1 or 2 syscalls, but quite infrequently:
```json
{
  "output": "Falco internal: syscall event drop. 1 system calls dropped in last second.",
  "output_fields": {
    "ebpf_enabled": "0",
    "n_drops": "1",
    "n_drops_buffer": "1",
    "n_drops_bug": "0",
    "n_drops_pf": "0",
    "n_evts": "41188"
  },
  "priority": "Critical",
  "rule": "Falco internal: syscall event drop",
  "time": "2019-11-13T17:14:13.034211346Z"
}
{
  "output": "Falco internal: syscall event drop. 1 system calls dropped in last second.",
  "output_fields": {
    "ebpf_enabled": "0",
    "n_drops": "1",
    "n_drops_buffer": "1",
    "n_drops_bug": "0",
    "n_drops_pf": "0",
    "n_evts": "31529"
  },
  "priority": "Critical",
  "rule": "Falco internal: syscall event drop",
  "time": "2019-11-13T17:51:17.684344697Z"
}
{
  "output": "Falco internal: syscall event drop. 1 system calls dropped in last second.",
  "output_fields": {
    "ebpf_enabled": "0",
    "n_drops": "1",
    "n_drops_buffer": "1",
    "n_drops_bug": "0",
    "n_drops_pf": "0",
    "n_evts": "23159"
  },
  "priority": "Critical",
  "rule": "Falco internal: syscall event drop",
  "time": "2019-11-13T20:59:24.474584970Z"
}
```
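To get an overview across many alerts like these, the counters can be summed with standard tools. A rough sketch, assuming the alerts are saved one after another (pretty-printed as above or one object per line) and that no string value contains a comma or brace; `summarise_drops` and the file name in the usage comment are hypothetical, and a real JSON tool such as `jq` would be more robust.

```sh
# Sketch: total up Falco "syscall event drop" alerts stored as JSON.
# ASSUMPTION: string values contain no ',', '{' or '}', and the counters are
# quoted numbers ("n_drops": "1"), as in the alerts above.

summarise_drops() {
	# Put every key/value pair on its own line, then split on double quotes:
	# for "n_drops": "1" the key is field 2 and the value is field 4.
	tr ',{}' '\n\n\n' | awk -F'"' '
		$2 == "rule" && $4 == "Falco internal: syscall event drop" { alerts++ }
		$2 == "n_drops"        { drops  += $4 }
		$2 == "n_drops_buffer" { buffer += $4 }
		$2 == "n_evts"         { evts   += $4 }
		END {
			printf "alerts=%d n_drops=%d n_drops_buffer=%d n_evts=%d\n",
				alerts, drops, buffer, evts
		}'
}

# Usage (hypothetical file name): summarise_drops < falco_drop_alerts.json
```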
From repository planning (low-hanging fruit): start by running in different environments, with different kinds of CPUs and workloads, and document suggested sizes for the ring buffer.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
keeping fresh
What do folks here think about making the ring buffer size configurable via a flag?
Also, for everyone reading here: the ring buffer is not used by the eBPF implementation. This is useful to know when comparing performance and when digging into the root cause of drops.
We've never tried the eBPF implementation, as the docs seemed to suggest that the kernel module was the recommended choice. Do we still get notified of drops with eBPF in the same way? I could give it a spin.
@dnwe yes! drops go through the same process.
Just yesterday I made a PR documenting the eBPF installation process for multiple kinds of installations https://github.com/falcosecurity/falco-website/pull/134
Please be aware that you might still have `SYSDIG_BPF_PROBE` if you are on < 0.18.0; for 0.20.0, everything should be consistently set to `FALCO_BPF_PROBE`.
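For setups that still export the old variable, an entrypoint could carry it over before starting Falco. A minimal sketch, assuming, as the Falco eBPF installation docs describe, that setting `FALCO_BPF_PROBE` (even to an empty string) is what selects the eBPF probe, so "set but empty" has to be preserved rather than treated as unset. The function name is hypothetical.

```sh
# Sketch: carry a legacy SYSDIG_BPF_PROBE setting over to FALCO_BPF_PROBE.
# ASSUMPTION: the presence of FALCO_BPF_PROBE (even with an empty value)
# is what enables the eBPF probe, so "set but empty" must survive the copy.

migrate_bpf_probe_env() {
	# ${VAR+set} expands to "set" whenever VAR is defined, even if empty.
	if [ -n "${SYSDIG_BPF_PROBE+set}" ] && [ -z "${FALCO_BPF_PROBE+set}" ]; then
		FALCO_BPF_PROBE=$SYSDIG_BPF_PROBE
		export FALCO_BPF_PROBE
		echo "using SYSDIG_BPF_PROBE value for FALCO_BPF_PROBE" >&2
	fi
}

# Hypothetical entrypoint usage, before launching falco:
# migrate_bpf_probe_env
```

An explicitly set `FALCO_BPF_PROBE` always wins; the old variable is only consulted when the new one is absent.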
Is there a way to debug the dropped events? On a small VPS, we see e.g. this:
```
Apr 5 15:20:02 host falco: Falco internal: syscall event drop. 1 system calls dropped in last second.
Apr 5 15:20:02 host falco: 15:20:02.335890806: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 n_drops_pf=0 n_evts=12297)
Apr 5 15:25:02 host falco: Falco internal: syscall event drop. 2 system calls dropped in last second.
Apr 5 15:25:02 host falco: 15:25:02.429912553: Critical Falco internal: syscall event drop. 2 system calls dropped in last second. (ebpf_enabled=0 n_drops=2 n_drops_buffer=2 n_drops_bug=0 n_drops_pf=0 n_evts=12532)
Apr 5 15:45:02 host falco: Falco internal: syscall event drop. 1 system calls dropped in last second.
Apr 5 15:45:02 host falco: 15:45:02.858715115: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 n_drops_pf=0 n_evts=15544)
```
The n_evts counter looks like a small event count between the log messages. Any idea what happens here?
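One way to put alerts like these in proportion is the drop ratio per alert. A small sketch, assuming (as the "in last second" wording suggests) that each alert's `n_drops` and `n_evts` cover the same roughly one-second window; `drop_ratio` is a hypothetical helper that reads syslog lines like the ones above on stdin and ignores the short lines without counters.

```sh
# Sketch: per-alert drop ratio from Falco's syslog output.
# ASSUMPTION: n_drops and n_evts in one alert cover the same window
# (roughly the last second, per the alert text).

drop_ratio() {
	# Keep only the detailed lines, reduced to "<n_drops> <n_evts>".
	sed -n 's/.*n_drops=\([0-9][0-9]*\) .*n_evts=\([0-9][0-9]*\).*/\1 \2/p' |
		awk '$2 > 0 {
			printf "n_drops=%d n_evts=%d ratio=%.4f%%\n", $1, $2, 100 * $1 / $2
		}'
}

# Usage (hypothetical log path): grep 'syscall event drop' /var/log/syslog | drop_ratio
```

For the first alert above (`n_drops=1`, `n_evts=12297`) this reports a ratio of 0.0081%.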
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
> What do folks here think about making the ring buffer size configurable via a flag?
@fntlnz did anything happen with this in the end? Currently we're still maintaining our patch and building Falco from source, but it'd be great if we could use the official Docker image and provide the ring buffer size as a parameter for the userspace app and as a module parameter for the kernel module.
@dnwe Is it an option to read the buffer more frequently, so that it doesn't fill up in the first place? Just wondering, because we are having this issue as well and would really like to avoid recompiling everything and spinning our own version of the container!
Is it expected behavior to have dropped syscalls in the log? We consistently see 1 to 4 syscalls dropped every now and then. Should we ignore these messages if only a few syscalls are dropped? I wonder what kind of log someone exploiting CVE-2019-8339 would generate. If it would generate hundreds of syscall drops, maybe it's OK to just ignore messages about a low number of dropped syscalls.
Are there any plans to port this workaround into the Helm chart so that we can pass it as a parameter during install/upgrade?
@leodido @fntlnz @kris-nova Hey guys, if you are interested in the patch, I could contribute it back to Falco in the CMake patch folder (falco/cmake/modules/sysdig-repo/patch/).
Let me know what's up.
@nvanheuverzwijn +1
@emcay With our patch, you can pass this environment variable: `FALCO_DRIVER_LOADER_ARGS: "--compile --module-arg ring_buf_size=134217728"`
You can use our Docker image `ghcr.io/kronostechnologies/falco:0.24.1-18` (18 commits on top of 0.24.1). It is not up to date with the latest release yet, but it definitely works.
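The `ring_buf_size` value above is in bytes (134217728 is 128 MiB). A tiny sketch for deriving it from a MiB figure; `ring_buf_loader_args` is hypothetical, and both the `ring_buf_size` module argument and the `FALCO_DRIVER_LOADER_ARGS` handling come from the patch described above, so this only applies to that image.

```sh
# Sketch: build the loader arguments for a per-CPU ring buffer size in MiB.
# ASSUMPTION: only meaningful with the patched image above, which accepts a
# ring_buf_size module argument in bytes via FALCO_DRIVER_LOADER_ARGS.

ring_buf_loader_args() {
	# $1 = size in MiB; the module argument takes bytes.
	printf '%s\n' "--compile --module-arg ring_buf_size=$(( $1 * 1024 * 1024 ))"
}

# 128 MiB reproduces the value quoted above (134217728).
FALCO_DRIVER_LOADER_ARGS=$(ring_buf_loader_args 128)
export FALCO_DRIVER_LOADER_ARGS
echo "FALCO_DRIVER_LOADER_ARGS=$FALCO_DRIVER_LOADER_ARGS"
```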
@nvanheuverzwijn This should be sufficient for testing, thank you! Is there any way I could take a look at the Dockerfile? Due to the nature of our project, I cannot blindly run software on our infra. :)
@jannis-a The fork is here https://github.com/kronostechnologies/falco