Document falco syscall buffer size adjustment described in blog

What would you like to be added:

I have a Falco-related question that I hope the documentation could answer. The post at https://sysdig.com/blog/cve-2019-8339-falco-vulnerability/ contains a short paragraph that states:

“One way to reduce the risk of dropped system calls is by increasing the size of the shared buffer between user/kernel space. For example, increasing the size of the default kernel buffer from 8mb to 128mb (per cpu) resulted in no dropped system calls, even under the extreme workload used by the proof-of-concept program”

But I couldn't find a mention in the blog article or the Falco docs as to which kernel buffer is being described here. Does anyone know the details? I was assuming we can patch the kernel module that's built via dkms to increase the buffer, but a glance over the code didn't immediately make it obvious which buffer needed to be increased.

Why is this needed:

To reduce the number of dropped syscalls on busy/large nodes

Sep 03 '19 20:09 dnwe

@fntlnz RUNBPF

Sep 03 '19 20:09 dnwe

Thanks for this issue @dnwe !!

I think you are referring to this configuration, right?

https://github.com/falcosecurity/falco/blob/6e11e75c1522e99bbbad1967f7031538e1a9c0bf/falco.yaml#L84-L89

I see that we already have some documentation for that here https://falco.org/docs/event-sources/dropped-events/

Do you see anything we could improve?

/remove-kind feature
/kind documentation

RUNBPF contest

Send me an email `lo at linux.com` with your full name and address for the sticker! (I also accept encrypted emails if you have privacy concerns. You can get my public key here https://fntlnz.wtf/downloads/pubkey-0xD624DE73B2400EE4.asc)

Sep 03 '19 20:09 fntlnz

@fntlnz that page documents how Falco logs dropped syscalls. But, as per my quote from the linked blog article, the original author mentioned that a user could also increase the size of the shared buffer, and that a large enough buffer prevented any syscalls from being dropped.

Sep 03 '19 21:09 dnwe

I think this is the droid you are looking for: https://github.com/draios/sysdig/blob/dev/driver/ppm_ringbuffer.h#L17

Sep 03 '19 21:09 krisnova

Let me know if you want to pair if you get snagged compiling the driver - it took me a few tries to get everything dialed in correctly

Sep 03 '19 21:09 krisnova

@kris-nova perfect thanks, I'll give it a spin tomorrow and let you know

Sep 03 '19 21:09 dnwe

Yes @kris-nova it is!

@kris-nova and @dnwe can I propose to demo compiling the driver during our office hours in 2 weeks? It will be recorded, which could be handy for our community! If you agree, please open an issue (with kind/debugging-hours) in the office-hours repository so we can schedule it!

Sep 04 '19 19:09 leodido

@leodido / @kris-nova I haven't quite got around to testing it yet, but I was hoping I'd just be able to get away with patching the kernel module via the stable Dockerfile after the .deb has unpacked the source to /usr/src:

https://github.com/falcosecurity/falco/blob/6e11e75c1522e99bbbad1967f7031538e1a9c0bf/docker/stable/Dockerfile#L77-L87

With something like this:

diff --git a/Dockerfile b/Dockerfile
index d55ed24..1210e8d 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -44,6 +44,9 @@ RUN curl -s https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public |
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*

+# Patch the ringbuffer in the falco kernel module to reduce dropped syscall events
+RUN sed -e '/RING_BUF_SIZE/s/8/64/' -i /usr/src/falco-*/ppm_ringbuffer.h
+
 # Change the falco config within the container to enable ISO 8601
 # output.
 RUN sed -e 's/time_format_iso_8601: false/time_format_iso_8601: true/' < /etc/falco/falco.yaml > /etc/falco/falco.yaml.new \

And let the falco-probe-loader / dkms build handle getting the compilation right for me 😎
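
For anyone unsure what that sed expression does, here is a rough Python sketch of the same substitution: on every line that mentions RING_BUF_SIZE, only the first "8" is replaced. The sample line is an assumption about how the constant is defined in ppm_ringbuffer.h, not a copy from the source tree, and `patch_ring_buf_size` is a hypothetical helper name.

```python
import re


def patch_ring_buf_size(header_text: str, new_value: str = "64") -> str:
    """Mimic: sed -e '/RING_BUF_SIZE/s/8/<new_value>/'.

    On each line containing RING_BUF_SIZE, replace only the first "8".
    Lines that do not mention RING_BUF_SIZE are left untouched.
    """
    patched = []
    for line in header_text.splitlines():
        if "RING_BUF_SIZE" in line:
            line = re.sub("8", new_value, line, count=1)
        patched.append(line)
    return "\n".join(patched)


# Assumed shape of the definition (hypothetical sample, check the real header).
sample = "static const __u32 RING_BUF_SIZE = 8 * 1024 * 1024;"
print(patch_ring_buf_size(sample))
```

Because the pattern is a bare "8", it matches the first "8" anywhere on the line, including inside a larger number. Running the patch twice with a new value that itself contains an 8 (such as 128) would rewrite the already-patched value, so it is probably worth checking the resulting header before building.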

Sep 04 '19 20:09 dnwe

Ah so that gives me:

Wed Sep 4 20:41:56 2019: Runtime error: error mapping the ring buffer for device /host/dev/falco0. Exiting.

Presumably because the /usr/bin/falco userspace process needs to be (re-)built with a matching ringbuffer size

https://github.com/draios/sysdig/blob/ce8281b2d506114ef1ea89b904cda2baa6c1fa27/userspace/libscap/scap.c#L307-L310

and

https://github.com/draios/sysdig/blob/ce8281b2d506114ef1ea89b904cda2baa6c1fa27/userspace/libscap/scap.c#L351-L370

Sep 04 '19 20:09 dnwe

Yep, I think that's what we need.

Just curious: what is the use case for expanding the ring buffer size? Is this in response to the kernel-level components dropping syscall events?

Sep 09 '19 20:09 krisnova

Yes, as per the first post, the linked article called it out as an option to reduce the number of dropped syscalls. We had already enabled logging and just wanted to test it out to see if we could reduce the occurrences.

I believe I got a build working, but haven't had a chance to deploy it yet. Will let you know.

Sep 09 '19 22:09 dnwe

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Nov 08 '19 22:11 stale[bot]

FYI we have been running for a while now with `sed -e '/RING_BUF_SIZE/s/8/96/' -i /usr/src/sysdig/driver/ppm_ringbuffer.h`, which has reduced the number of dropped events. We still see 1 or 2, but quite infrequently:

{
  "output": "Falco internal: syscall event drop. 1 system calls dropped in last second.",
  "output_fields": {
    "ebpf_enabled": "0",
    "n_drops": "1",
    "n_drops_buffer": "1",
    "n_drops_bug": "0",
    "n_drops_pf": "0",
    "n_evts": "41188"
  },
  "priority": "Critical",
  "rule": "Falco internal: syscall event drop",
  "time": "2019-11-13T17:14:13.034211346Z"
}
{
  "output": "Falco internal: syscall event drop. 1 system calls dropped in last second.",
  "output_fields": {
    "ebpf_enabled": "0",
    "n_drops": "1",
    "n_drops_buffer": "1",
    "n_drops_bug": "0",
    "n_drops_pf": "0",
    "n_evts": "31529"
  },
  "priority": "Critical",
  "rule": "Falco internal: syscall event drop",
  "time": "2019-11-13T17:51:17.684344697Z"
}
{
  "output": "Falco internal: syscall event drop. 1 system calls dropped in last second.",
  "output_fields": {
    "ebpf_enabled": "0",
    "n_drops": "1",
    "n_drops_buffer": "1",
    "n_drops_bug": "0",
    "n_drops_pf": "0",
    "n_evts": "23159"
  },
  "priority": "Critical",
  "rule": "Falco internal: syscall event drop",
  "time": "2019-11-13T20:59:24.474584970Z"
}
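
A minimal sketch of how alerts like these could be summarised, assuming Falco's JSON output is collected one object per line. The field names come from the events above. The ratio of n_drops to n_evts is only a rough indicator, since whether n_evts already includes the dropped events is an assumption here, and `summarize_drops` is a hypothetical helper.

```python
import json

DROP_RULE = "Falco internal: syscall event drop"


def summarize_drops(json_lines):
    """Return (time, n_drops, n_evts, ratio) for each drop alert."""
    rows = []
    for raw in json_lines:
        event = json.loads(raw)
        if event.get("rule") != DROP_RULE:
            continue  # skip ordinary rule alerts
        fields = event["output_fields"]
        n_drops = int(fields["n_drops"])  # counters arrive as strings
        n_evts = int(fields["n_evts"])
        ratio = n_drops / n_evts if n_evts else 0.0
        rows.append((event["time"], n_drops, n_evts, ratio))
    return rows


# One of the events above, flattened onto a single line.
sample = [
    '{"output_fields": {"n_drops": "1", "n_drops_buffer": "1", "n_evts": "41188"},'
    ' "rule": "Falco internal: syscall event drop",'
    ' "time": "2019-11-13T17:14:13.034211346Z"}'
]

for time, n_drops, n_evts, ratio in summarize_drops(sample):
    print(f"{time}: {n_drops} dropped of {n_evts} events ({ratio:.6%})")
```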

Nov 13 '19 22:11 dnwe

From Repository Planning - Low hanging fruit: start by running in different environments with different kinds of CPUs and workloads, and document suggested sizes for the ring buffer.

Dec 04 '19 16:12 fntlnz

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Feb 02 '20 16:02 stale[bot]

keeping fresh

Feb 02 '20 16:02 dnwe

What does everyone here think about making the ring buffer size configurable via a flag?

Also, for everyone reading here: the ring buffer is not used by the eBPF implementation. This might be useful to know when comparing performance and when tracking down the root cause of drops.

Feb 02 '20 23:02 fntlnz

We've never tried out the eBPF implementation, as the docs seemed to suggest that the kernel module was the recommended choice. Do we still get notified of drops with eBPF in the same way? I could give it a spin.

Feb 03 '20 00:02 dnwe

@dnwe yes! drops go through the same process.

Just yesterday I made a PR documenting the eBPF installation process for multiple kinds of installations https://github.com/falcosecurity/falco-website/pull/134

Please be aware you might still have SYSDIG_BPF_PROBE if you are on < 0.18.0 - everything should be consistently set to FALCO_BPF_PROBE for 0.20.0

Feb 19 '20 10:02 fntlnz

Is there a way to debug the dropped events? On a small VPS, we see e.g. this:

Apr  5 15:20:02 host falco: Falco internal: syscall event drop. 1 system calls dropped in last second.
Apr  5 15:20:02 host falco: 15:20:02.335890806: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 n_drops_pf=0 n_evts=12297)
Apr  5 15:25:02 host falco: Falco internal: syscall event drop. 2 system calls dropped in last second.
Apr  5 15:25:02 host falco: 15:25:02.429912553: Critical Falco internal: syscall event drop. 2 system calls dropped in last second. (ebpf_enabled=0 n_drops=2 n_drops_buffer=2 n_drops_bug=0 n_drops_pf=0 n_evts=12532)
Apr  5 15:45:02 host falco: Falco internal: syscall event drop. 1 system calls dropped in last second.
Apr  5 15:45:02 host falco: 15:45:02.858715115: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 n_drops_pf=0 n_evts=15544)

The n_evts counter shows only a small number of events between the log messages. Any idea what is happening here?
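
One way to inspect lines like these is to pull the counters out of the trailing parenthesised group and compare them. This is only a sketch: `parse_drop_fields` is a hypothetical helper, and reading n_drops_buffer as "drops attributed to a full buffer" is an assumption based on the field name.

```python
import re

FIELD_RE = re.compile(r"(\w+)=(\d+)")


def parse_drop_fields(line):
    """Extract key=value counters from the last parenthesised group of a log line."""
    start = line.rfind("(")
    if start == -1:
        return {}  # the short form of the message carries no counters
    return {key: int(value) for key, value in FIELD_RE.findall(line[start:])}


line = (
    "Apr  5 15:20:02 host falco: 15:20:02.335890806: Critical Falco internal: "
    "syscall event drop. 1 system calls dropped in last second. "
    "(ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 n_drops_pf=0 n_evts=12297)"
)

fields = parse_drop_fields(line)
print(fields)
# If every drop is counted under n_drops_buffer, the buffer is the likely cause.
print("buffer drops only:", fields["n_drops"] == fields["n_drops_buffer"])
```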

Apr 05 '20 14:04 derSascha

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Jun 05 '20 00:06 stale[bot]

> What does everyone here think about making the ring buffer size configurable via a flag?

@fntlnz did anything happen with this in the end? Currently we're still just maintaining our patch and building falco from source, but it'd be cool if we could just use the official docker image and provide the ringbuffer size as a parameter for the userspace app and as a module parameter for the kernel module

Jun 05 '20 13:06 dnwe

@dnwe Is it an option to increase the frequency at which the buffer is read, so that it doesn't fill up in the first place? Just wondering because we are having this issue as well, and we would really like to avoid having to recompile everything and build our own version of the container!

Jul 08 '20 20:07 nvanheuverzwijn

Is it expected behavior to see dropped syscalls in the log? We consistently have 1 to 4 syscalls dropped every now and then. Should we ignore these messages if the number of dropped syscalls is low? I wonder what kind of log someone exploiting CVE-2019-8339 would generate. If it would generate hundreds of syscall drops, maybe it's an OK solution to just ignore log messages that report a low number of drops.

Jul 08 '20 20:07 nvanheuverzwijn

Are there any plans to port this workaround into the helm chart so that we can pass it as a parameter during install/upgrade?

Aug 03 '20 22:08 emcay

@leodido @fntlnz @kris-nova Hey guys, if you are interested in the patch, I could bring it back to falco into the cmake patching folder (falco/cmake/modules/sysdig-repo/patch/).

Let me know what's up.

Aug 07 '20 16:08 nvanheuverzwijn

@nvanheuverzwijn +1

Sep 14 '20 22:09 jannis-a

@emcay With our patch, you can pass this environment variable: FALCO_DRIVER_LOADER_ARGS: "--compile --module-arg ring_buf_size=134217728"

You can use our docker image ghcr.io/kronostechnologies/falco:0.24.1-18 (18 commits since 0.24.1). It is not yet up to date with the latest release, but it works for sure.
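
As a quick sanity check on that number, 134217728 is 128 MiB expressed in bytes. The sketch below also estimates the total memory cost, assuming one buffer per CPU as the blog quote at the top of this issue suggests. The 16-CPU node is a made-up example and both helper names are hypothetical.

```python
MIB = 1024 * 1024


def ring_buf_bytes(size_mib: int) -> int:
    """Convert a ring buffer size in MiB to the byte value passed to the module."""
    return size_mib * MIB


def total_buffer_bytes(size_mib: int, cpus: int) -> int:
    """Estimate total buffer memory, assuming one buffer per CPU."""
    return ring_buf_bytes(size_mib) * cpus


print(ring_buf_bytes(8))    # default size mentioned in the blog: 8388608
print(ring_buf_bytes(128))  # value used in ring_buf_size above: 134217728
print(total_buffer_bytes(128, cpus=16) // MIB, "MiB on a hypothetical 16-CPU node")
```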

Sep 14 '20 22:09 nvanheuverzwijn

@nvanheuverzwijn This should be sufficient for testing, thank you! Any way I could take a look at the Dockerfile? Due to the nature of our project I cannot blindly run software on our infra. :)

Sep 14 '20 23:09 jannis-a

@jannis-a The fork is here https://github.com/kronostechnologies/falco

Sep 14 '20 23:09 nvanheuverzwijn
