Falco makes journald crash on Flatcar

Describe the bug

When installing falco using helm with no custom settings at all on flatcar linux 3227.2.2, journald crashes.

How to reproduce it

We (Giant Swarm) can reproduce it by deploying one of our Kubernetes clusters and installing the chart. We have not yet been able to reproduce the problem on a vanilla Flatcar VM (no Kubernetes) running Falco with Docker.

Expected behaviour

journald should not crash.

Environment

Falco version: 0.31.0
Driver version: 319368f1ad778691164d33d59945e00c5752cd27
System info:

{
  "machine": "x86_64",
  "nodename": "worker-00003M",
  "release": "5.15.63-flatcar",
  "sysname": "Linux",
  "version": "#1 SMP Mon Aug 29 18:27:27 -00 2022"
}

Cloud provider or hardware configuration: Azure
OS: flatcar 3227.2.2
Kernel: Linux worker-00003M 5.15.63-flatcar #1 SMP Mon Aug 29 18:27:27 -00 2022 x86_64 GNU/Linux
Installation method: helm chart

Additional context

Dmesg output when the crash happens

[   44.689183] systemd[1]: Starting Journal Service...
[   44.715709] systemd-journald[1588]: File /var/log/journal/e64de388fdf943e58b260fc9c4fd4b6e/system.journal corrupted or uncleanly shut down, renaming and replacing.
[   44.761049] systemd-journald[1588]: Assertion 'n + N_IOVEC_META_FIELDS + (pid_is_valid(object_pid) ? N_IOVEC_OBJECT_FIELDS : 0) + client_context_extra_fields_n_iovec(c) <= m' failed at src/journal/journald-server.c:923, function dispatch_message_real(). Aborting.
[   44.830550] systemd-coredump[1649]: elfutils disabled, parsing ELF objects not supported
[   44.841361] systemd-coredump[1649]: Process 1588 (systemd-journal) of user 0 dumped core.
[   44.852691] systemd-coredump[1649]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.9281eeb07ce84c18894fe3e81ab04fce.1588.1662564835000000.zst
[   44.873681] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
[   44.884875] systemd[1]: systemd-journald.service: Failed with result 'core-dump'.
[   44.896113] systemd[1]: Failed to start Journal Service.

Flatcar issue: https://github.com/flatcar-linux/Flatcar/issues/848

Sep 08 '22 15:09 whites11

Hey @whites11

Thank you for reporting. However, it does not look like a Falco issue :thinking:

It's weird, though. I don't know what could cause this problem. The only thing that comes to mind is: have you tried disabling Syslog output and logging from Falco?

If you're using helm, try: --set falco.log_syslog=false --set falco.syslog_output.enabled=false
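For anyone who wants to try the same thing, here is a rough sketch of what the full command might look like. The two `--set` values are the ones from the suggestion above; the release name `falco`, the namespace `falco`, and the repo alias `falcosecurity` are assumptions, so adjust them to your setup. The script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only. Assumes the chart repo was added beforehand with:
#   helm repo add falcosecurity https://falcosecurity.github.io/charts
# Release name "falco" and namespace "falco" are assumptions.
falco_no_syslog_cmd() {
  echo "helm upgrade --install falco falcosecurity/falco" \
       "--namespace falco --create-namespace" \
       "--set falco.log_syslog=false" \
       "--set falco.syslog_output.enabled=false"
}

# Print the command for review instead of running it.
falco_no_syslog_cmd
```

Once the printed command looks right for your cluster, it can be pasted into a shell that points at the intended kube context.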

Sep 08 '22 21:09 leogr

I did, but unfortunately that didn't help.

Sep 09 '22 05:09 whites11

Thank you for reporting. Let us know if you find a way to reproduce the problem outside your environment. I tried quickly, but I was not able to do that.

PS feel free to contact me in the Slack channel or via DM

Sep 09 '22 10:09 leogr

I started from a vanilla Flatcar, installed k8s on top of it using this, then ran Falco through the Helm chart. Unfortunately, I still can't replicate the issue. In any case, as @leogr mentioned, I doubt this is a Falco issue after all, so I guess we can close this ticket.

Sep 12 '22 07:09 whites11

:+1:

We can keep this open for a while since Falco users may be interested (I'm also curious about what caused the issue :smile_cat: ).

Thank you!

Sep 12 '22 09:09 leogr

I think I understand a bit more now. The issue happens when Falco is running and our custom auditd rules are in place:

$ cat /etc/audit/rules.d/99-default.rules 
# Overridden by Giant Swarm.
-a exit,always -F arch=b64 -S execve -k auditing
-a exit,always -F arch=b32 -S execve -k auditing

I know nothing about auditd, so I don't know why that's the case. Also, I haven't replicated this outside our setup (in a vanilla Flatcar + Falco environment), so there might be even more elements involved.
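If someone wants to check whether their own nodes carry similar rules, a minimal sketch could look like the following. The helper name `count_execve_rules` is made up for illustration; it only counts non-comment lines containing `-S execve` in a rules file and says nothing about whether those rules are actually loaded in the kernel.

```shell
#!/bin/sh
# Hypothetical helper: count audit rules in a file that log the execve syscall.
count_execve_rules() {
  # Drop comment lines, then count lines that contain "-S execve".
  grep -v '^[[:space:]]*#' "$1" | grep -c -e '-S[[:space:]]*execve'
}

# Path taken from the comment above; skipped if the file is not readable.
rules_file=/etc/audit/rules.d/99-default.rules
if [ -r "$rules_file" ]; then
  count_execve_rules "$rules_file"
fi
```

To see what is actually loaded, `auditctl -l` (run as root) lists the active rules.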

Sep 12 '22 09:09 whites11

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Dec 11 '22 09:12 poiana

Hey @whites11

Any news on this?

Dec 16 '22 10:12 leogr

We never found the root cause here, but we suspect it's a bug in journald. We applied a workaround and didn't look back.

I guess we can close this ticket, thanks.

Dec 16 '22 10:12 whites11

Thanks @whites11.

/close

Dec 16 '22 10:12 jasondellaluce

@jasondellaluce: Closing this issue.

Dec 16 '22 10:12 poiana
