
Falco Webhook getting an error - "http: request body too large"

Open antikilahdjs opened this issue 2 years ago • 9 comments

Describe the bug

How to reproduce it

  • Install Falco with k8s-audit support in the usual way. I used the official Helm charts.

Expected behaviour

In my lab everything works perfectly because the environment is small, but in production I am hitting the "request body too large" error, so I increased these two parameters to make it work:

maxEventSize: 134217728
webhookMaxBatchSize: 268435456

After that, the pod's memory usage rose to 38 GB or more, and CPU usage increased as well, so I would like to know whether this is a bug.

My environment is very large, but this is still odd, because I tested other applications and they run at around 12 GB.

I would like to fix the error, or, if I did something wrong, please help me with it.
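For scale, here is a quick conversion of the two values set above. It is only a sketch and assumes the k8saudit plugin reads `maxEventSize` and `webhookMaxBatchSize` as byte counts, which this thread does not state explicitly; the helper name `to_mib` is made up for illustration.

```python
# Assumption: both settings are expressed in bytes.
def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)

settings = {
    "maxEventSize": 134217728,
    "webhookMaxBatchSize": 268435456,
}

for name, value in settings.items():
    print(f"{name}: {value} bytes = {to_mib(value):.0f} MiB")
```

Under that assumption the limits come out in the hundreds of megabytes per event and per request body, so the configured limits alone may not account for tens of gigabytes of resident memory.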

Screenshots

image

image

Environment

  • Falco version:

Thu Sep 14 15:17:25 2023: Falco version: 0.35.1 (x86_64)
Thu Sep 14 15:17:25 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
{"default_driver_version":"5.0.1+driver","driver_api_version":"4.0.0","driver_schema_version":"2.0.0","engine_version":"17","falco_version":"0.35.1","libs_version":"0.11.3","plugin_api_version":"3.0.0"}

  • System info:

{ "machine": "x86_64", "nodename": "falco-auditing-56bdb4c9b6-5wbjr", "release": "4.18.0-348.el8.0.2.x86_64", "sysname": "Linux", "version": "#1 SMP Sun Nov 14 00:51:12 UTC 2021" }

  • Cloud provider or hardware configuration:
  • OS: Redhat 8.5
  • Kernel:

4.18.0-348.el8.0.2.x86_64

  • Installation method:

Official Helm charts from https://github.com/falcosecurity/charts

Additional context

  2023/09/14 15:10:45 [k8saudit] bad request: http: request body too large

 2023/09/14 15:10:57 [k8saudit] bad request: http: request body too large

 2023/09/14 15:10:59 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:00 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:04 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:05 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:14 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:16 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:21 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:23 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:26 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:35 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:35 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:40 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:43 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:44 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:51 [k8saudit] bad request: http: request body too large

 2023/09/14 15:11:56 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:00 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:01 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:03 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:12 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:14 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:16 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:17 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:22 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:31 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:32 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:36 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:39 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:43 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:49 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:54 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:55 [k8saudit] bad request: http: request body too large

 2023/09/14 15:12:58 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:07 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:12 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:14 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:14 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:17 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:29 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:34 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:35 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:38 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:40 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:51 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:52 [k8saudit] bad request: http: request body too large

 2023/09/14 15:13:58 [k8saudit] bad request: http: request body too large

 2023/09/14 15:14:01 [k8saudit] bad request: http: request body too large

 2023/09/14 15:14:04 [k8saudit] bad request: http: request body too large

 2023/09/14 15:14:13 [k8saudit] bad request: http: request body too large

 2023/09/14 15:14:14 [k8saudit] bad request: http: request body too large

antikilahdjs avatar Sep 14 '23 15:09 antikilahdjs

Hey, thank you for reporting!

Then the POD memory increased to 38gb or more and the cores either, so I would like to know it is a bug or not.

Hmm, it seems like a bug; we need to investigate this further!

Andreagit97 avatar Sep 15 '23 13:09 Andreagit97

Thank you so much @Andreagit97. Below is a screenshot of a real query in Prometheus. I set a resource limit of 42 GB, but if I remove that limit, memory usage climbs to more than 120 GB.

image

After auditing starts, memory usage reaches 22 GB within 3 minutes.

image

antikilahdjs avatar Sep 15 '23 14:09 antikilahdjs

Thank you for the additional data. Right now we are a little busy, but we will come back to this after the Falco release!

Andreagit97 avatar Sep 15 '23 15:09 Andreagit97

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar Dec 24 '23 15:12 poiana

Not fixed

antikilahdjs avatar Dec 29 '23 01:12 antikilahdjs

/remove-lifecycle stale

Andreagit97 avatar Jan 03 '24 13:01 Andreagit97

You increased max eventsize to 134Gb and max webhook batch size to 268Gb? In which case the memory usage is sort of expected I guess, as up to 268GB of json has to be processed at once...

A few things you might experiment with:

  • limit the number of events in a single batch by setting the --audit-webhook-batch-max-size flag on your API server. You might need multiple Falco instances to keep up with your audit event stream, as you mention having a large cluster
  • use the Falco-tailored audit-policy.yaml (docs) if you are not already doing so, as the API server can generate massive amounts of audit events, not all of which are relevant to Falco
  • as some events include the requestObject (e.g. a ConfigMap), you might be able to find the event that includes some massive k8s object and consider dropping it using the audit-policy.yaml
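One way to approach the last suggestion is to rank audit events by how much they contribute to the stream. The sketch below is only an illustration: it assumes you can get a sample of audit events as JSON lines (for example from an API server audit log file), and the function name `rank_by_size` and the sample events are invented for the example. The `verb` and `objectRef.resource` fields come from the Kubernetes audit event format.

```python
import json
from collections import defaultdict

def rank_by_size(lines, top=5):
    """Group audit events by (verb, resource) and total their serialized size."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "largest": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        key = (event.get("verb", "?"), ref.get("resource", "?"))
        size = len(line.encode("utf-8"))  # size of the whole serialized event
        entry = totals[key]
        entry["count"] += 1
        entry["bytes"] += size
        entry["largest"] = max(entry["largest"], size)
    # Heaviest contributors first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    return ranked[:top]

# Hypothetical sample: one large ConfigMap update and two small pod reads.
sample = [
    json.dumps({"verb": "update",
                "objectRef": {"resource": "configmaps"},
                "requestObject": {"data": {"blob": "x" * 5000}}}),
    json.dumps({"verb": "get", "objectRef": {"resource": "pods"}}),
    json.dumps({"verb": "get", "objectRef": {"resource": "pods"}}),
]

for (verb, resource), stats in rank_by_size(sample):
    print(verb, resource, stats)
```

Verb and resource pairs that dominate the byte total are candidates for a lower audit level or a drop rule in the audit policy, provided Falco's rules do not need them.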

sboschman avatar Feb 14 '24 15:02 sboschman

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana avatar May 14 '24 15:05 poiana

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

poiana avatar Jun 29 '24 15:06 poiana

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community. /close

poiana avatar Jul 29 '24 16:07 poiana

@poiana: Closing this issue.


poiana avatar Jul 29 '24 16:07 poiana