fluent-package-builder

fluent-plugin-systemd fails with SIGABRT on Ubuntu 21.04

Open scrwr opened this issue 3 years ago • 12 comments

When using fluent-plugin-systemd the worker crashes hard with a SIGABRT. Initially we assumed it to be a problem with the plugin, but it turned out to be related to libjemalloc. After removing

Environment=LD_PRELOAD=/opt/td-agent/lib/libjemalloc.so

from the service unit, the crashes are gone.

See https://github.com/ledbettj/systemd-journal/issues/93 for more details.

We were using td-agent 3 in the above example, but the issue is the same with td-agent 4.

Some more info:

  • Ubuntu 21.10
  • td-agent 4.3.0-1 from http://packages.treasuredata.com/4/ubuntu/focal/

Related config part:

<source>
  @type systemd
  tag systemd
  path /var/log/journal
  <storage>
    @type local
    persistent true
    path /var/tmp/fluentd_systemd
  </storage>
  <entry>
    fields_strip_underscores true
    fields_lowercase true
  </entry>
</source>

Let me know if I can help with further details.

scrwr avatar Feb 09 '22 18:02 scrwr

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

github-actions[bot] avatar May 11 '22 10:05 github-actions[bot]

This issue was automatically closed because of stale in 30 days

github-actions[bot] avatar Jun 10 '22 10:06 github-actions[bot]

Seems like this problem still exists.

Environment

  • Ubuntu 22.04
  • td-agent 4.4.1 fluentd 1.15.2 (c32842297ed2c306f1b841a8f6e55bdd0f1cb27f)
    • Installed by $ curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-jammy-td-agent4.sh | sh

How to Reproduce

  • Install fluent-plugin-systemd plugin: $ td-agent-gem install fluent-plugin-systemd
  • Add the following setting
<source>
  @type systemd
  tag debug
  path /var/log/journal
  read_from_head true
</source>
  • $ (sudo) adduser td-agent systemd-journal
  • $ (sudo) systemctl restart td-agent

Result

  • After reading one record, the worker dies with SIGABRT.
2022-10-12 05:58:38 +0000 [info]: #0 fluentd worker is now running worker=0
2022-10-12 04:50:35.083947000 +0000 debug: {"SYSLOG_FACILITY":"3","SYSLOG_IDENTIFIER":"systemd-journald","_TRANSPORT":"driver","PRIORITY":"6","MESSAGE_ID":"f77379a8490b408bbe5f6940505a777b","MESSAGE":"Journal started","_PID":"57","_UID":"0","_GID":"0","_COMM":"systemd-journal","_EXE":"/usr/lib/systemd/systemd-journald","_CMDLINE":"/lib/systemd/systemd-journald","_CAP_EFFECTIVE":"25402800cf","_SELINUX_CONTEXT":"unconfined\n","_SYSTEMD_CGROUP":"/system.slice/systemd-journald.service","_SYSTEMD_UNIT":"systemd-journald.service","_SYSTEMD_SLICE":"system.slice","_SYSTEMD_INVOCATION_ID":"ad56283776054be3859ad9b4e1f962d5","_BOOT_ID":"eb783cbf1e3c47c0a680a80b99e356d9","_MACHINE_ID":"6981994dead5402094f9195aec951d36","_HOSTNAME":"jammy-td-agetn"}
2022-10-12 05:58:39 +0000 [error]: Worker 0 finished unexpectedly with signal SIGABRT

ETC

This doesn't reproduce on Ubuntu 20.04.

daipom avatar Oct 12 '22 06:10 daipom

As @scrwr says, we can avoid this issue by commenting out the following line in /lib/systemd/system/td-agent.service

Environment=LD_PRELOAD=/opt/td-agent/lib/libjemalloc.so

Then apply the change:

$ (sudo) systemctl daemon-reload
$ (sudo) systemctl restart td-agent

However, is it correct to comment out this?

daipom avatar Oct 12 '22 06:10 daipom

However, is it correct to comment out this?

Editing /lib/systemd/system/td-agent.service directly is not recommended. The correct way to change environment variables on Ubuntu is as follows:

  • Edit /etc/default/td-agent and add the following line:
LD_PRELOAD=
  • Then restart td-agent: $ (sudo) systemctl restart td-agent
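
The edit to /etc/default/td-agent described above can also be scripted so that repeated runs do not add duplicate lines. This is only a minimal sketch: the function name disable_preload is hypothetical (not part of td-agent), and it assumes the file does not already assign some other value to LD_PRELOAD.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with td-agent.
# Appends an empty LD_PRELOAD assignment to the given defaults file,
# unless an identical empty assignment is already present.
disable_preload() {
    file="$1"
    # An exact "LD_PRELOAD=" line means the workaround is already applied.
    if grep -q '^LD_PRELOAD=$' "$file" 2>/dev/null; then
        return 0
    fi
    # Otherwise append the override (this creates the file if it is missing).
    printf 'LD_PRELOAD=\n' >> "$file"
}
```

Run as root, it would be called as disable_preload /etc/default/td-agent, followed by the restart shown above.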

My concern here was the effect of omitting this environment variable, but it seems that if memory usage is not a problem, this environment variable can be omitted.

Thus, for now, this seems to be a workaround.
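
To check whether the workaround took effect, one option is to look for libjemalloc in the memory map of the running process. The following is a rough sketch, not an official procedure: the function name jemalloc_loaded is hypothetical, and it takes the path of a /proc/<pid>/maps file as its only argument.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given
# /proc/<pid>/maps file lists a mapping whose path contains "libjemalloc".
jemalloc_loaded() {
    maps_file="$1"
    grep -q 'libjemalloc' "$maps_file"
}
```

For example, assuming the service's main PID has been looked up with systemctl show -p MainPID --value td-agent, running jemalloc_loaded /proc/<that PID>/maps as root should fail (non-zero exit status) once LD_PRELOAD is no longer set for the service.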

daipom avatar Oct 12 '22 07:10 daipom

I'm having the same issue. The workaround helped, but I wonder if there is any progress on a permanent fix?

mszabo avatar Mar 30 '23 12:03 mszabo

I think there is no progress. We still need this workaround for fluent-plugin-systemd in some environments.

@mszabo Could you share your environment information? Are you using Ubuntu?

daipom avatar Mar 31 '23 01:03 daipom

@daipom Yes, the issue surfaced when we started migrating to the latest Ubuntu LTS (22.04).

mszabo avatar Apr 11 '23 16:04 mszabo

Thanks!

daipom avatar Apr 12 '23 01:04 daipom

Hello,

I migrated my app from RHEL 8.8 to RHEL 9.2 and started experiencing the same issue:

2023-10-05 11:06:29 +0200 [error]: Worker 0 exited unexpectedly with signal SIGABRT

The workaround of unsetting the LD_PRELOAD variable helped. I'm posting my environment info in case it helps with a permanent fix.

  • OS: RHEL 9.2
  • kernel version 5.14.0-284.30.1
  • td-agent package version 4.5.1-1
  • ruby version 3.1.4p223 (bundled with td-agent)
  • fluent-plugin-systemd gem version 1.0.5
  • systemd-journal gem version 1.4.2

Thanks, Andrii

fesia avatar Oct 05 '23 10:10 fesia