Replace `github.com/coreos/go-systemd/v22/sdjournal` by `journalctl`
Proposed commit message
`github.com/coreos/go-systemd/v22/sdjournal` is removed and Filebeat now calls `journalctl` directly to read journald entries.
`sdjournal` relies on libsystemd to read journal files and the active system journal. However, due to a bug in systemd (https://github.com/systemd/systemd/pull/29456), it crashes during journal rotation. If the host has an affected libsystemd, Filebeat will crash with a SIGBUS during a journal rotation (usually only under high load). There is no way to prevent or recover from this crash: it happens outside of our codebase, and the Go runtime turns the SIGBUS into a panic that we cannot recover from.
The bug has been fixed in systemd v255, which is not widely used yet, so on most systems Filebeat might crash when reading journal logs.
Because there is no way for Filebeat to avoid the crash, we decided to replace `github.com/coreos/go-systemd/v22/sdjournal` by calling `journalctl` directly and reading its stdout.
On hosts where Filebeat crashes when reading from journald, `journalctl` can successfully read all journal files. The OpenTelemetry collector also calls `journalctl` and has no issues reading the journal during rotation.
Because the reading backend has changed, some configuration options have been removed and behaviours adapted to match `journalctl`.
Breaking changes:
Changes that will prevent the journald input from starting:
- `include_matches.match` does not accept the `and` and `or` keys any more.
Changes in the journald input behaviour:
- `backoff`, `max_backoff`, `cursor_seek_fallback` have been removed
- `seek` now has only 3 modes: `since`, `head` and `tail`.
- If there is a cursor in the registry, it will always be used and the `seek` option will be ignored.
Checklist
- [x] My code follows the style guidelines of this project
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] I have made corresponding change to the default configuration files
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
Disruptive User Impact
Even though the journald input is not GA yet, which makes breaking changes acceptable, this PR introduces breaking changes that will make certain configurations behave differently than before or stop working entirely.
Changes that will prevent the journald input from starting:
- `include_matches.match` does not accept the `and` and `or` keys any more.
Changes in the journald input behaviour:
- `backoff`, `max_backoff`, `cursor_seek_fallback` have been removed
- `seek` now has only 3 modes: `since`, `head` and `tail`.
- If there is a cursor in the registry, it will always be used and the `seek` option will be ignored.
Author's Checklist
- [ ] Stress test the new input
- [ ] Manual test to ensure all related issues are actually closed by this PR
How to test this PR locally
Using the following input configuration:
filebeat.inputs:
  - type: journald
    id: PR-testing
Start Filebeat and assert the journald messages are sent to the configured output.
To manually see the journald messages and compare with what you see in Filebeat's output, you can use:
journalctl --follow -o json | jq -c --sort-keys
This will print out all fields Filebeat can read.
Related issues
- Closes #34077
- Closes #32782
- Closes #30398
- Closes #39352
- Closes https://github.com/elastic/elastic-agent/issues/4250
- Closes https://github.com/elastic/beats/issues/39820