falco
Falco makes journald crash on Flatcar
Describe the bug
When installing falco using helm with no custom settings at all on flatcar linux 3227.2.2, journald crashes.
How to reproduce it
We (giant swarm) can reproduce it by deploying one of our kubernetes clusters and installing the chart. We still can't reproduce the problem on a vanilla flatcar VM (no kubernetes) running falco with docker.
Expected behaviour
journald not to crash
Screenshots
Environment
- Falco version: 0.31.0, Driver version: 319368f1ad778691164d33d59945e00c5752cd27
- System info:
{
"machine": "x86_64",
"nodename": "worker-00003M",
"release": "5.15.63-flatcar",
"sysname": "Linux",
"version": "#1 SMP Mon Aug 29 18:27:27 -00 2022"
}
- Cloud provider or hardware configuration: Azure
- OS: flatcar 3227.2.2
- Kernel:
Linux worker-00003M 5.15.63-flatcar #1 SMP Mon Aug 29 18:27:27 -00 2022 x86_64 GNU/Linux
- Installation method: helm chart
Additional context
Dmesg output when the crash happens
[ 44.689183] systemd[1]: Starting Journal Service...
[ 44.715709] systemd-journald[1588]: File /var/log/journal/e64de388fdf943e58b260fc9c4fd4b6e/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 44.761049] systemd-journald[1588]: Assertion 'n + N_IOVEC_META_FIELDS + (pid_is_valid(object_pid) ? N_IOVEC_OBJECT_FIELDS : 0) + client_context_extra_fields_n_iovec(c) <= m' failed at src/journal/journald-server.c:923, function dispatch_message_real(). Aborting.
[ 44.830550] systemd-coredump[1649]: elfutils disabled, parsing ELF objects not supported
[ 44.841361] systemd-coredump[1649]: Process 1588 (systemd-journal) of user 0 dumped core.
[ 44.852691] systemd-coredump[1649]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.9281eeb07ce84c18894fe3e81ab04fce.1588.1662564835000000.zst
[ 44.873681] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
[ 44.884875] systemd[1]: systemd-journald.service: Failed with result 'core-dump'.
[ 44.896113] systemd[1]: Failed to start Journal Service.
Flatcar issue: https://github.com/flatcar-linux/Flatcar/issues/848
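For anyone checking whether a node hit the same abort, the assertion line in the dmesg output above can be matched mechanically. This is only a minimal sketch: the function name `journald_crashed` is hypothetical, and the pattern assumes the message keeps the wording shown in this report (other systemd versions may phrase it differently).

```shell
# Hypothetical helper: reads kernel-log text on stdin and succeeds if it
# contains the journald assertion failure quoted in this report.
journald_crashed() {
  # Pattern is an assumption based on the dmesg excerpt above.
  grep -Eq 'systemd-journald\[[0-9]+\]: Assertion .* failed at .*dispatch_message_real'
}
```

On an affected node it could be used as `dmesg | journald_crashed && echo "journald assertion seen"`.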
Hey @whites11
Thank you for reporting, however, it does not look like a Falco issue :thinking:
It's weird, though. I don't know what could cause this problem. The only thing that comes to mind is: have you tried disabling Syslog output and logging from Falco?
If you're using helm, try: --set falco.log_syslog=false --set falco.syslog_output.enabled=false
I did, but unfortunately that didn't help.
Thank you for reporting. Let us know if you find a way to reproduce the problem outside your environment. I tried quickly, but I was not able to do that.
PS feel free to contact me in the Slack channel or via DM
I started from a vanilla flatcar, installed k8s on top of it using this, then ran falco through the helm chart. Unfortunately I still can't replicate the issue. In any case, as @leogr mentioned, I doubt this is a falco issue after all, so I guess we can close this ticket.
:+1:
We can keep this open for a while since Falco users may be interested (I'm also curious about what caused the issue :smile_cat: ).
Thank you!
I think I understood something more. The issue happens when falco is running and our custom auditd rules are in place:
$ cat /etc/audit/rules.d/99-default.rules
# Overridden by Giant Swarm.
-a exit,always -F arch=b64 -S execve -k auditing
-a exit,always -F arch=b32 -S execve -k auditing
I know zero about auditd, so I don't know why that's the case. Also, I didn't replicate this outside our setup (in a vanilla flatcar+falco environment), so there might be even more elements involved.
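To see whether another node carries the same kind of rule, a rules file can be scanned for "always" rules on the `execve` syscall. This is a rough sketch only: the function name `has_execve_audit_rule` is hypothetical, and it simply pattern-matches the rule syntax shown above rather than querying the audit subsystem.

```shell
# Hypothetical helper: reads audit rules on stdin and succeeds if any
# uncommented rule always audits the execve syscall at exit.
has_execve_audit_rule() {
  # Accepts both "exit,always" (as in the file above) and "always,exit".
  grep -Eq '^-a[[:space:]]+(exit,always|always,exit)[[:space:]].*-S[[:space:]]+execve'
}
```

For example: `has_execve_audit_rule < /etc/audit/rules.d/99-default.rules && echo "execve auditing enabled"`.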
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
Hey @whites11
Any news on this?
We never really got to find the real problem here, but we suspect it's a bug in journald. We applied a workaround and we didn't look back.
I guess we can close this ticket, thanks.
Thanks @whites11.
/close
@jasondellaluce: Closing this issue.