syslog-ng icon indicating copy to clipboard operation
syslog-ng copied to clipboard

SIGBUS after journal rotation

Open amdrsantos opened this issue 2 years ago • 6 comments

syslog-ng

Version of syslog-ng

syslog-ng 4 (4.1.0)
Config version: 4.0
Installer-Version: 4.1.0
Revision: 4.1.0-1
Compile-Date: Feb 28 2023 15:33:51
Module-Directory: /usr/lib/syslog-ng/4.1
Module-Path: /usr/lib/syslog-ng/4.1
Include-Path: /usr/share/syslog-ng/include
Available-Modules: disk-buffer,regexp-parser,affile,correlation,rate-limit-filter,afsql,tags-parser,afmongodb,syslogformat,afuser,pseudofile,csvparser,linux-kmsg-format,afprog,metrics-probe,cryptofuncs,afsocket,system-source,confgen,kvformat,basicfuncs,pacctformat,cef,timestamp,sdjournal,hook-commands,appmodel,json-plugin
Enable-Debug: off
Enable-GProf: off
Enable-Memtrace: off
Enable-IPv6: on
Enable-Spoof-Source: on
Enable-TCP-Wrapper: on
Enable-Linux-Caps: on
Enable-Systemd: on

Service file

[Unit]
Description=System Logger Daemon
Documentation=man:syslog-ng(8)
After=network.target

[Service]
Type=notify
ExecStart=/usr/sbin/syslog-ng -F $SYSLOGNG_OPTS --caps "cap_net_bind_service,cap_net_broadcast,cap_net_raw,cap_dac_read_search,cap_chown,cap_fowner=p cap_dac_override,cap_syslog=ep"
ExecReload=/bin/kill -HUP $MAINPID
EnvironmentFile=-/etc/default/syslog-ng
EnvironmentFile=-/etc/sysconfig/syslog-ng
StandardOutput=journal
StandardError=journal
Restart=on-failure

[Install]
WantedBy=multi-user.target

Platform

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)" NAME="Debian GNU/Linux" VERSION_ID="12" VERSION="12 (bookworm)" VERSION_CODENAME=bookworm ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/"

Debug bundle

Not Aviable

Issue

Failure

2023-09-20T21:24:59.455535+0000 machine systemd[1]: Started [email protected] - Process Core Dump (PID 18522/UID 0).
2023-09-20T21:25:00.259365+0000 machine systemd-coredump[18523]: [🡕] Process 2984 (syslog-ng) of user 0 dumped core.

                                                                  Module libsystemd.so.0 from deb systemd-252.6-1.amd64
                                                                  Stack trace of thread 17642:
                                                                  #0  0x00007f5d61dd4f9e n/a (libsystemd.so.0 + 0x5ff9e)
                                                                  #1  0x00007f5d6147671e n/a (libsdjournal.so + 0x771e)
                                                                  #2  0x00007f5d61476ac9 n/a (libsdjournal.so + 0x7ac9)
                                                                  #3  0x00007f5d62345061 n/a (libsyslog-ng-4.1.so.0 + 0x58061)
                                                                  #4  0x00007f5d61e49692 n/a (libivykis.so.0 + 0x5692)
                                                                  #5  0x00007f5d61e48043 n/a (libivykis.so.0 + 0x4043)
                                                                  #6  0x00007f5d61e4dfe5 n/a (libivykis.so.0 + 0x9fe5)
                                                                  #7  0x00007f5d61e4a259 n/a (libivykis.so.0 + 0x6259)
                                                                  #8  0x00007f5d61e4b841 iv_main (libivykis.so.0 + 0x7841)
                                                                  #9  0x00007f5d61e49463 n/a (libivykis.so.0 + 0x5463)
                                                                  #10 0x00007f5d61e4cb75 n/a (libivykis.so.0 + 0x8b75)
                                                                  #11 0x00007f5d6204ffd4 n/a (libc.so.6 + 0x88fd4)
                                                                  #12 0x00007f5d620cf8d0 __clone (libc.so.6 + 0x1088d0)

                                                                  Stack trace of thread 2984:
                                                                  #0  0x00007f5d620cfcb6 epoll_wait (libc.so.6 + 0x108cb6)
                                                                  #1  0x00007f5d61e4ddc5 n/a (libivykis.so.0 + 0x9dc5)
                                                                  #2  0x00007f5d61e4a381 n/a (libivykis.so.0 + 0x6381)
                                                                  #3  0x00007f5d61e4b841 iv_main (libivykis.so.0 + 0x7841)
                                                                  #4  0x00007f5d62343e69 main_loop_run (libsyslog-ng-4.1.so.0 + 0x56e69)
                                                                  #5  0x000056528d39d62b main (syslog-ng + 0x262b)
                                                                  #6  0x00007f5d61fee18a n/a (libc.so.6 + 0x2718a)
                                                                  #7  0x00007f5d61fee245 __libc_start_main (libc.so.6 + 0x27245)
                                                                  #8  0x000056528d39d701 _start (syslog-ng + 0x2701)

                                                                  Stack trace of thread 17640:
                                                                  #0  0x00007f5d620cfcb6 epoll_wait (libc.so.6 + 0x108cb6)
                                                                  #1  0x00007f5d61e4ddc5 n/a (libivykis.so.0 + 0x9dc5)
                                                                  #2  0x00007f5d61e4a259 n/a (libivykis.so.0 + 0x6259)
                                                                  #3  0x00007f5d61e4b841 iv_main (libivykis.so.0 + 0x7841)
                                                                  #4  0x00007f5d61e49463 n/a (libivykis.so.0 + 0x5463)
                                                                  #5  0x00007f5d61e4cb75 n/a (libivykis.so.0 + 0x8b75)
                                                                  #6  0x00007f5d6204ffd4 n/a (libc.so.6 + 0x88fd4)
                                                                  #7  0x00007f5d620cf8d0 __clone (libc.so.6 + 0x1088d0)

                                                                  Stack trace of thread 17639:
                                                                  #0  0x00007f5d620cfcb6 epoll_wait (libc.so.6 + 0x108cb6)
                                                                  #1  0x00007f5d61e4ddc5 n/a (libivykis.so.0 + 0x9dc5)
                                                                  #2  0x00007f5d61e4a259 n/a (libivykis.so.0 + 0x6259)
                                                                  #3  0x00007f5d61e4b841 iv_main (libivykis.so.0 + 0x7841)
                                                                  #4  0x00007f5d61e49463 n/a (libivykis.so.0 + 0x5463)
                                                                  #5  0x00007f5d61e4cb75 n/a (libivykis.so.0 + 0x8b75)
                                                                  #6  0x00007f5d6204ffd4 n/a (libc.so.6 + 0x88fd4)
                                                                  #7  0x00007f5d620cf8d0 __clone (libc.so.6 + 0x1088d0)

                                                                  Stack trace of thread 18311:
                                                                  #0  0x00007f5d620cfcb6 epoll_wait (libc.so.6 + 0x108cb6)
                                                                  #1  0x00007f5d61e4ddc5 n/a (libivykis.so.0 + 0x9dc5)
                                                                  #2  0x00007f5d61e4a259 n/a (libivykis.so.0 + 0x6259)
                                                                  #3  0x00007f5d61e4b841 iv_main (libivykis.so.0 + 0x7841)
                                                                  #4  0x00007f5d62329ede n/a (libsyslog-ng-4.1.so.0 + 0x3cede)
                                                                  #5  0x00007f5d62344f01 n/a (libsyslog-ng-4.1.so.0 + 0x57f01)
                                                                  #6  0x00007f5d62226cfd n/a (libglib-2.0.so.0 + 0x7ecfd)
                                                                  #7  0x00007f5d6204ffd4 n/a (libc.so.6 + 0x88fd4)
                                                                  #8  0x00007f5d620cf8d0 __clone (libc.so.6 + 0x1088d0)

                                                                  Stack trace of thread 17641:
                                                                  #0  0x00007f5d620cfcb6 epoll_wait (libc.so.6 + 0x108cb6)
                                                                  #1  0x00007f5d61e4ddc5 n/a (libivykis.so.0 + 0x9dc5)
                                                                  #2  0x00007f5d61e4a259 n/a (libivykis.so.0 + 0x6259)
                                                                  #3  0x00007f5d61e4b841 iv_main (libivykis.so.0 + 0x7841)
                                                                  #4  0x00007f5d61e49463 n/a (libivykis.so.0 + 0x5463)
                                                                  #5  0x00007f5d61e4cb75 n/a (libivykis.so.0 + 0x8b75)
                                                                  #6  0x00007f5d6204ffd4 n/a (libc.so.6 + 0x88fd4)
                                                                  #7  0x00007f5d620cf8d0 __clone (libc.so.6 + 0x1088d0)
                                                                  ELF object binary architecture: AMD x86-64
2023-09-20T21:25:00.287037+0000 machine kernel: syslog-ng (2984) used greatest stack depth: 11544 bytes left
2023-09-20T21:25:00.303109+0000 machine systemd[1]: syslog-ng.service: Main process exited, code=killed, status=7/BUS
2023-09-20T21:25:00.303327+0000 machine systemd[1]: syslog-ng.service: Failed with result 'signal'.
2023-09-20T21:25:00.303642+0000 machine systemd[1]: syslog-ng.service: Consumed 3min 24.547s CPU time.

Steps to reproduce

I think it happening after the journal rotation:

2023-09-20T21:24:59.412233+0000 machine systemd-journald[1554]: Data hash table of /var/log/journal/397d4151bc50a273e5879e5064b7d611/system.journal has a fill level at 75.0 (51202 of 68266 items, 25165824 file size, 491 bytes per hash table item), suggesting rotation.
2023-09-20T21:24:59.412800+0000 machine systemd-journald[1554]: /var/log/journal/397d4151bc50a273e5879e5064b7d611/system.journal: Journal header limits reached or header out-of-date, rotating.

Configuration

In attachment GX-101569.syslog-ng.zip

Input and output logs (if possible)

<30>1 2023-09-20T21:24:47.412+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2941818"] Reloading syslog-ng.service - System Logger Daemon...
<30>1 2023-09-20T21:24:47.416+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2941821"] Reloaded syslog-ng.service - System Logger Daemon.

<46>1 2023-09-20T21:24:59.412+00:00 GX-208-xmm4-1-1 systemd-journald 1554 - [meta sequenceId="2943204"] Data hash table of /var/log/journal/397d4151bc50a273e5879e5064b7d611/system.journal has a fill level at 75.0 (51202 of 68266 items, 25165824 file size, 491 bytes per hash table item), suggesting rotation.
<46>1 2023-09-20T21:24:59.412+00:00 GX-208-xmm4-1-1 systemd-journald 1554 - [meta sequenceId="2943205"] /var/log/journal/397d4151bc50a273e5879e5064b7d611/system.journal: Journal header limits reached or header out-of-date, rotating.
<30>1 2023-09-20T21:24:59.452+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943348"] Created slice system-systemd\x2dcoredump.slice - Slice /system/systemd-coredump.
<30>1 2023-09-20T21:24:59.455+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943349"] Started [email protected] - Process Core Dump (PID 18522/UID 0).
<28>1 2023-09-20T21:25:00.303+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943352"] syslog-ng.service: Main process exited, code=killed, status=7/BUS
<28>1 2023-09-20T21:25:00.304+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943353"] syslog-ng.service: Failed with result 'signal'.
<30>1 2023-09-20T21:25:00.304+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943354"] syslog-ng.service: Consumed 3min 24.547s CPU time.
<30>1 2023-09-20T21:25:00.311+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943355"] [email protected]: Deactivated successfully.
<30>1 2023-09-20T21:25:00.523+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943356"] syslog-ng.service: Scheduled restart job, restart counter is at 1.
<30>1 2023-09-20T21:25:00.524+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943357"] Stopped syslog-ng.service - System Logger Daemon.
<30>1 2023-09-20T21:25:00.524+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943358"] syslog-ng.service: Consumed 3min 24.547s CPU time.
<30>1 2023-09-20T21:25:00.525+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943359"] Starting syslog-ng.service - System Logger Daemon...
<45>1 2023-09-20T21:25:00.662+00:00 GX-208-xmm4-1-1 syslog-ng 18560 - [meta sequenceId="2906730"] syslog-ng starting up; version='4.1.0'
<30>1 2023-09-20T21:25:03.956+00:00 GX-208-xmm4-1-1 systemd 1 - [meta sequenceId="2943744"] Started syslog-ng.service - System Logger Daemon.

amdrsantos avatar Sep 21 '23 21:09 amdrsantos

Still visible in version 4.4:

root@machine:/home# coredumpctl dump 24897
           PID: 24897 (syslog-ng)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 7 (BUS)
     Timestamp: Thu 2023-10-19 15:52:25 UTC (9min ago)
  Command Line: /usr/sbin/syslog-ng -F
    Executable: /usr/sbin/syslog-ng
 Control Group: /system.slice/syslog-ng.service
          Unit: syslog-ng.service
         Slice: system.slice
       Boot ID: ac13e43f9b6e41c89d2f4568c4337440
    Machine ID: 397d4151bc50a273e5879e5064b7d611
      Hostname: xmm4-1-1
       Storage: /var/lib/systemd/coredump/core.syslog-ng.0.ac13e43f9b6e41c89d2f4568c4337440.24897.1697730745000000.zst (present)
  Size on Disk: 1.2M
       Message: Process 24897 (syslog-ng) of user 0 dumped core.

                Module libsystemd.so.0 from deb systemd-252.6-1.amd64
                Stack trace of thread 27185:
                #0  0x00007f5ba0370f9e n/a (libsystemd.so.0 + 0x5ff9e)
                #1  0x00007f5b9fb3d86e n/a (libsdjournal.so + 0x786e)
                #2  0x00007f5b9fb3dc19 n/a (libsdjournal.so + 0x7c19)
                #3  0x00007f5ba0907db2 n/a (libsyslog-ng-4.4.so.0 + 0x5cdb2)
                #4  0x00007f5ba03e5692 n/a (libivykis.so.0 + 0x5692)
                #5  0x00007f5ba03e4043 n/a (libivykis.so.0 + 0x4043)
                #6  0x00007f5ba03e9fe5 n/a (libivykis.so.0 + 0x9fe5)
                #7  0x00007f5ba03e6259 n/a (libivykis.so.0 + 0x6259)
                #8  0x00007f5ba03e7841 iv_main (libivykis.so.0 + 0x7841)
                #9  0x00007f5ba03e5463 n/a (libivykis.so.0 + 0x5463)
                #10 0x00007f5ba03e8b75 n/a (libivykis.so.0 + 0x8b75)
                #11 0x00007f5ba060dfd4 start_thread (libc.so.6 + 0x88fd4)
                #12 0x00007f5ba068d820 __clone (libc.so.6 + 0x108820)

                Stack trace of thread 27183:
                #0  0x00007f5ba068dc06 epoll_wait (libc.so.6 + 0x108c06)
                #1  0x00007f5ba03e9dc5 n/a (libivykis.so.0 + 0x9dc5)
                #2  0x00007f5ba03e6259 n/a (libivykis.so.0 + 0x6259)
                #3  0x00007f5ba03e7841 iv_main (libivykis.so.0 + 0x7841)
                #4  0x00007f5ba08eb3ce n/a (libsyslog-ng-4.4.so.0 + 0x403ce)
                #5  0x00007f5ba0907c31 n/a (libsyslog-ng-4.4.so.0 + 0x5cc31)
                #6  0x00007f5ba07e4cfd n/a (libglib-2.0.so.0 + 0x7ecfd)
                #7  0x00007f5ba060dfd4 start_thread (libc.so.6 + 0x88fd4)
                #8  0x00007f5ba068d820 __clone (libc.so.6 + 0x108820)

                Stack trace of thread 24897:
                #0  0x00007f5ba068dc06 epoll_wait (libc.so.6 + 0x108c06)
                #1  0x00007f5ba03e9dc5 n/a (libivykis.so.0 + 0x9dc5)
                #2  0x00007f5ba03e6381 n/a (libivykis.so.0 + 0x6381)
                #3  0x00007f5ba03e7841 iv_main (libivykis.so.0 + 0x7841)
                #4  0x00007f5ba0906a89 main_loop_run (libsyslog-ng-4.4.so.0 + 0x5ba89)
                #5  0x0000560a3daad58a main (syslog-ng + 0x258a)
                #6  0x00007f5ba05ac18a __libc_start_call_main (libc.so.6 + 0x2718a)
                #7  0x00007f5ba05ac245 __libc_start_main_impl (libc.so.6 + 0x27245)
                #8  0x0000560a3daad6f1 _start (syslog-ng + 0x26f1)

                Stack trace of thread 27186:
                #0  0x00007f5ba068dc06 epoll_wait (libc.so.6 + 0x108c06)
                #1  0x00007f5ba03e9dc5 n/a (libivykis.so.0 + 0x9dc5)
                #2  0x00007f5ba03e6259 n/a (libivykis.so.0 + 0x6259)
                #3  0x00007f5ba03e7841 iv_main (libivykis.so.0 + 0x7841)
                #4  0x00007f5ba03e5463 n/a (libivykis.so.0 + 0x5463)
                #5  0x00007f5ba03e8b75 n/a (libivykis.so.0 + 0x8b75)
                #6  0x00007f5ba060dfd4 start_thread (libc.so.6 + 0x88fd4)
                #7  0x00007f5ba068d820 __clone (libc.so.6 + 0x108820)
                ELF object binary architecture: AMD x86-64

amdrsantos avatar Oct 20 '23 08:10 amdrsantos

SIGBUS indicates that the backing pages of a mmap()-ed region is gone. Normally, this should not happen with a normal file system, as if the file itself is removed, as long as there's a reference to it (e.g. mmap), it should not be removed in reality, the backing storage would remain to be allocated while the filename itself is removed from the containing directory.

What kind of filesytem hosts the systemd journal file? Is it a normal file system or a ramdisk, perhaps something else? We really can't do too much in this situation, at least if my assumptions are correct.

But the question remains, what do you have as backing storage for the journal files?

bazsi avatar Dec 10 '23 11:12 bazsi

This can be somewhat related to: https://github.com/systemd/systemd/issues/12042

As you can see, sometimes the journald will truncate a file, which might cause a SIGBUS in the reading side (e.g. syslog-ng)

bazsi avatar Dec 10 '23 12:12 bazsi

And here's the possible fix: https://github.com/systemd/systemd/issues/24320

It is fixed in systemd v255, I am not sure which one you are running. But the same bug affects rsyslog and filebeat

bazsi avatar Dec 10 '23 12:12 bazsi

The journal is in a type ext4 partition of an SSD. The machine is still using: systemd 252 (252.17-1~deb12u1).

Probably the issue that you mentioned can be the cause. I will let you know when we decide to upgrade.

amdrsantos avatar Dec 12 '23 11:12 amdrsantos

Thanks Alexeandra. Sorry it took this long for me to answer.

On Tue, Dec 12, 2023 at 12:39 PM Alexandre Santos @.***> wrote:

The journal is in a type ext4 partition of an SSD. The machine is still using: systemd 252 (252.17-1~deb12u1).

Probably the issue that you mentioned can be the cause. I will let you know when we decide to upgrade.

— Reply to this email directly, view it on GitHub https://github.com/syslog-ng/syslog-ng/issues/4645#issuecomment-1851870802, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFOK5SHDKJMELIHBIYWYCLYJA67JAVCNFSM6AAAAAA5CFBJZSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJRHA3TAOBQGI . You are receiving this because you commented.Message ID: @.***>

bazsi avatar Dec 12 '23 12:12 bazsi