Fluent-bit crashes with a coredump when running on RHEL10
Bug Report
Describe the bug
Fluent-bit 4.0.x and 4.1.x crash with a coredump when running on RHEL10.
The bug seems to be related to the systemd input plugin. When started, the agent works fine for a while before crashing with a coredump. When this happens, any subsequent attempt to start the agent results in an immediate crash with another coredump.
If we delete all contents from storage.path (systemd.0/ and systemd.db), the agent starts without problems and runs for a while before crashing again with a coredump.
It seems to me that the systemd chunk file gets corrupted for some reason, and when this happens, the agent crashes.
This happens with the packages (4.0.13, 4.1.0, 4.1.1) from the AlmaLinux repo at packages.fluentbit.io on multiple servers. I have also compiled 4.1.0 and 4.1.1 from source to activate FLB_DEBUG, and I get the same problem.
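For reference, the cleanup workaround mentioned above looks roughly like this (a sketch; the service name is taken from the journald logs below and the paths from the storage.path and db settings in the configuration):
systemctl stop fluent-bit.service
rm -rf /var/lib/fluent-bit/storage/systemd.0/
rm -f /var/lib/fluent-bit/storage/systemd.db
systemctl start fluent-bit.service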
To Reproduce
- Journald logs related to the crash:
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: [2025/10/24 15:11:22] [engine] caught signal (SIGSEGV)
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #0 0x7f2e9f071608 in ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #1 0x7f2ea0057b08 in ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #2 0x7f2e9ffb4d32 in ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #3 0x5c899e in ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #4 0x55488e in ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #5 0x577e2b in ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #6 0x7f2e9f2bbb67 in ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #7 0x7f2e9f32c6bb in ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #8 0xffffffffffffffff in ???() at ???:0
Oct 24 15:11:22 hostname.domain systemd-coredump[946015]: Process 939013 (fluent-bit) of user 0 terminated abnormally with signal 6/ABRT, processing...
Oct 24 15:11:22 hostname.domain systemd[1]: Started [email protected] - Process Core Dump (PID 946015/UID 0).
Oct 24 15:11:22 hostname.domain systemd-coredump[946016]: Removed old coredump core.fluent-bit.0.dfd7beb07d594c77bef0090bd555891f.834839.1761112301000000.zst.
Oct 24 15:11:22 hostname.domain systemd-coredump[946016]: [🡕] Process 939013 (fluent-bit) of user 0 dumped core.
Module libzstd.so.1 from rpm zstd-1.5.5-9.el10.x86_64
Module libpcre2-8.so.0 from rpm pcre2-10.44-1.el10.3.x86_64
Module libcrypt.so.2 from rpm libxcrypt-4.4.36-10.el10.x86_64
Module libselinux.so.1 from rpm libselinux-3.8-2.el10_0.x86_64
Module libsasl2.so.3 from rpm cyrus-sasl-2.1.28-27.el10.x86_64
Module libevent-2.1.so.7 from rpm libevent-2.1.12-16.el10.x86_64
Module libkeyutils.so.1 from rpm keyutils-1.6.3-5.el10.x86_64
Module libkrb5support.so.0 from rpm krb5-1.21.3-8.el10_0.x86_64
Module libcom_err.so.2 from rpm e2fsprogs-1.47.1-3.el10.x86_64
Module libk5crypto.so.3 from rpm krb5-1.21.3-8.el10_0.x86_64
Module libkrb5.so.3 from rpm krb5-1.21.3-8.el10_0.x86_64
Module libgssapi_krb5.so.2 from rpm krb5-1.21.3-8.el10_0.x86_64
Module libz.so.1 from rpm zlib-ng-2.2.3-1.el10.x86_64
Module libcap.so.2 from rpm libcap-2.69-7.el10.x86_64
Module libcrypto.so.3 from rpm openssl-3.2.2-16.el10_0.4.x86_64
Module libssl.so.3 from rpm openssl-3.2.2-16.el10_0.4.x86_64
Module libsystemd.so.0 from rpm systemd-257-9.el10_0.1.x86_64
Module libyaml-0.so.2 from rpm libyaml-0.2.5-16.el10.x86_64
Stack trace of thread 939016:
#0 0x00007f2e9f2bd9dc __pthread_kill_implementation (libc.so.6 + 0x969dc)
#1 0x00007f2e9f267a96 raise (libc.so.6 + 0x40a96)
#2 0x00007f2e9f24f8fa abort (libc.so.6 + 0x288fa)
#3 0x00000000004b1498 n/a (n/a + 0x0)
#4 0x313a35312034322f n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Oct 24 15:11:22 hostname.domain systemd[1]: [email protected]: Deactivated successfully.
Oct 24 15:11:22 hostname.domain systemd[1]: [email protected]: Consumed 268ms CPU time, 106.8M memory peak.
Oct 24 15:11:22 hostname.domain systemd[1]: fluent-bit.service: Main process exited, code=dumped, status=6/ABRT
Oct 24 15:11:22 hostname.domain systemd[1]: fluent-bit.service: Failed with result 'core-dump'.
Oct 24 15:11:22 hostname.domain systemd[1]: fluent-bit.service: Consumed 22.833s CPU time, 48.4M memory peak.
Oct 24 15:11:22 hostname.domain systemd[1]: fluent-bit.service: Scheduled restart job, restart counter is at 1.
- Trace logs from fluent-bit before, during, and after the crash: fluent-bit.log.txt
- I have been able to generate this backtrace from the coredump generated by the subsequent attempts to start the agent:
(gdb) backtrace
#0 0x00007fd5df4bd9dc in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007fd5df467a96 in raise () from /lib64/libc.so.6
#2 0x00007fd5df44f8fa in abort () from /lib64/libc.so.6
#3 0x00000000004ae47d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
#4 <signal handler called>
#5 0x00000000011d0e84 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
#6 0x00007fd5df9ad609 in ZSTD_freeDCtx () from /lib64/libzstd.so.1
#7 0x00007fd5e01e5b09 in journal_file_data_payload.isra () from /lib64/libsystemd.so.0
#8 0x00007fd5e0142d33 in sd_journal_enumerate_data () from /lib64/libsystemd.so.0
#9 0x00000000007466ea in in_systemd_collect (ins=0x390d8d60, config=0x390a7490, in_context=0x7fd5d0001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:387
#10 0x0000000000746b07 in in_systemd_collect_archive (ins=0x390d8d60, config=0x390a7490, in_context=0x7fd5d0001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:512
#11 0x0000000000504fcd in input_collector_fd (fd=39, ins=0x390d8d60) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
#12 0x0000000000505b0a in engine_handle_event (fd=39, mask=1, ins=0x390d8d60, config=0x390a7490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
#13 input_thread (data=0x7fd5d801f4e0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
#14 0x000000000057ca77 in step_callback (data=0x7fd5d8024b70) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
#15 0x00007fd5df4bbb68 in start_thread () from /lib64/libc.so.6
#16 0x00007fd5df52c6bc in clone3 () from /lib64/libc.so.6
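The backtrace above was produced roughly like this (a sketch; it assumes systemd-coredump is keeping the dumps, as in the logs above, and that the crashing binary is the self-compiled one with debug symbols):
coredumpctl list fluent-bit
coredumpctl debug fluent-bit    # opens gdb on the most recent coredump
(gdb) backtrace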
- Steps to reproduce the problem:
The first crash, after the agent has been running without problems for a while, is random; I have not been able to identify the trigger. After the first crash (when the chunk file presumably gets corrupted), the crash is reproducible if you use the attached chunk/db files under storage.path/systemd.0/ and storage.path/systemd.db.
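Concretely, the reproduction with the attached files looks like this (a sketch; it assumes the attached zip unpacks to systemd.0/ and systemd.db under the storage path):
systemctl stop fluent-bit.service
unzip fluent-bit-storage_path_files.zip -d /var/lib/fluent-bit/storage/
systemctl start fluent-bit.service    # crashes immediately with a coredump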
Expected behavior
The agent should not crash with a coredump.
Your Environment
- Version used:
fluent-bit 4.0.13, 4.1.0 and 4.1.1
- Configuration:
[SERVICE]
# Flush
# =====
# set an interval of seconds before to flush records to a destination
flush 10
# Daemon
# ======
# instruct Fluent Bit to run in foreground or background mode.
daemon Off
# Log_file
# ========
# Absolute path for an optional log file. By default all logs are
# redirected to the standard error interface (stderr).
log_file /var/log/fluent-bit/fluent-bit.log
# Log_Level
# =========
# Set the verbosity level of the service, values can be:
#
# - error
# - warning
# - info
# - debug
# - trace
#
# by default 'info' is set; that means it includes 'error' and 'warning'.
log_level trace
# Parsers File
# ============
# specify an optional 'Parsers' configuration file
parsers_file parsers.conf
# Plugins File
# ============
# specify an optional 'Plugins' configuration file to load external plugins.
plugins_file plugins.conf
# HTTP Server
# ===========
# Enable/Disable the built-in HTTP Server for metrics
http_server On
http_listen 127.0.0.1
http_port 2020
# Storage
# =======
# Fluent Bit can use memory and filesystem buffering based mechanisms
#
# - https://docs.fluentbit.io/manual/administration/buffering-and-storage
#
# storage metrics
# ---------------
# publish storage pipeline metrics in '/api/v1/storage'. The metrics are
# exported only if the 'http_server' option is enabled.
#
storage.metrics on
# storage.path
# ------------
# absolute file system path to store filesystem data buffers (chunks).
#
storage.path /var/lib/fluent-bit/storage
# storage.sync
# ------------
# configure the synchronization mode used to store the data into the
# filesystem. It can take the values normal or full.
#
storage.sync normal
# storage.checksum
# ----------------
# enable the data integrity check when writing and reading data from the
# filesystem. The storage layer uses the CRC32 algorithm.
#
storage.checksum off
# storage.backlog.mem_limit
# -------------------------
# if storage.path is set, Fluent Bit will look for data chunks that were
# not delivered and are still in the storage layer; these are called
# backlog data. This option configures a hint of the maximum amount of
# memory to use when processing these records.
#
storage.backlog.mem_limit 100M
# storage.max_chunks_up
# ---------------------
# If the input plugin has enabled filesystem storage type, this
# property sets the maximum number of chunks that can be up in
# memory. Use this setting to control memory usage when you enable
# storage.type filesystem.
#
storage.max_chunks_up 128
# storage.delete_irrecoverable_chunks
# -----------------------------------
# When enabled, irrecoverable chunks will be deleted during
# runtime, and any other irrecoverable chunk located in the
# configured storage path directory will be deleted when
# Fluent-Bit starts. Accepted values: 'Off', 'On'.
#
storage.delete_irrecoverable_chunks on
# scheduler.base
# ---------------
# Set a base of exponential backoff in seconds.
scheduler.base 5
# scheduler.cap
# -------------
# Set a maximum retry time in seconds.
scheduler.cap 900
[INPUT]
Name systemd
Tag logs_5000_systemd
db /var/lib/fluent-bit/storage/systemd.db
db.Sync Normal
Mem_Buf_Limit 100MB
storage.type filesystem
storage.pause_on_chunks_overlimit on
Read_From_Tail On
Lowercase On
Threaded true
[FILTER]
Name modify
Match logs_5000_systemd
Add dataops.data_processor dataops-logs-systemd
Add event.module systemd
Add event.provider systemd
Add event.dataset systemd.journald
Add data_stream.namespace prod
Add data_stream.dataset systemd.journald
Add service.name linux-systemd
[FILTER]
Name nest
Match *
Operation nest
Wildcard dataops.*
Nest_under dataops
Remove_prefix dataops.
[FILTER]
Name nest
Match *
Operation nest
Wildcard event.*
Nest_under event
Remove_prefix event.
[FILTER]
Name nest
Match *
Operation nest
Wildcard data_stream.*
Nest_under data_stream
Remove_prefix data_stream.
[FILTER]
Name nest
Match *
Operation nest
Wildcard service.*
Nest_under service
Remove_prefix service.
[FILTER]
Name modify
Match *
Add agent.type fluent-bit
[FILTER]
Name sysinfo
Match *
Fluentbit_version_key agent.version
Os_name_key os.name
Os_version_key os.version
Kernel_version_key os.kernel
Hostname_key host.name
[FILTER]
Name nest
Match *
Operation nest
Wildcard agent.*
Nest_under agent
Remove_prefix agent.
[FILTER]
Name nest
Match *
Operation nest
Wildcard os.*
Nest_under os
Remove_prefix os.
[FILTER]
Name nest
Match *
Operation nest
Wildcard host.*
Wildcard os*
Nest_under host
Remove_prefix host.
[OUTPUT]
Name http
Match logs_5000_*
Host server-receiver.example.org
Port 5000
Format json
Workers 1
storage.total_limit_size 100M
Retry_Limit no_limits
tls On
tls.verify On
tls.ca_file /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
tls.crt_file /path/my.crt
tls.key_file /path/my.key
fluent-bit.conf.txt / fluent-bit-systemd.conf.txt
- Server type and version:
Linux 6.12.0-55.38.1.el10_0.x86_64 x86_64 GNU/Linux
- Operating System and version:
Red Hat Enterprise Linux release 10.0 (Coughlan)
- Filters and plugins:
systemd (input), modify, nest, sysinfo, http (output)
Additional context
- Fluent-bit systemd chunk file and systemd db after the crash: fluent-bit-storage_path_files.zip
@rafaelma thanks for reporting the bug with useful info.
I have pushed a "potential fix" for this issue here: https://github.com/fluent/fluent-bit/pull/11073, would you please give it a try?
Hello, thank you very much for your reply.
I have patched the 4.1.1 source code with your commit https://github.com/fluent/fluent-bit/pull/11073/commits/0bd94ddbbc01c88c8752bc1dc135517e2e1bed5a, compiled the source, and started the agent.
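The patch and build step looks roughly like this (a sketch; the GitHub .patch download and the cmake invocation are assumptions, though the source path matches the build tree in the backtraces and FLB_DEBUG is the option mentioned above):
cd /root/rhel10-test/fluent-bit-4.1.1
curl -L https://github.com/fluent/fluent-bit/commit/0bd94ddbbc01c88c8752bc1dc135517e2e1bed5a.patch | patch -p1
cd build && cmake -DFLB_DEBUG=On ../ && make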
The agent worked without problems for 30 minutes and then crashed again.
We wonder if the systemd input plugin has problems parsing multiline logs on RHEL10. These multiline logs are processed without problems on RHEL 7, 8, and 9.
[2025/10/27 10:11:23] [engine] caught signal (SIGSEGV)
#0 0x11d0e84 in ZSTD_freeDDict() at lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
#1 0x7fb8582f3608 in ???() at ???:0
#2 0x7fb858acbb08 in ???() at ???:0
#3 0x7fb858a28d32 in ???() at ???:0
#4 0x7466f9 in in_systemd_collect() at plugins/in_systemd/systemd.c:397
#5 0x504fcc in input_collector_fd() at src/flb_input_thread.c:166
#6 0x505b09 in engine_handle_event() at src/flb_input_thread.c:181
#7 0x505b09 in input_thread() at src/flb_input_thread.c:420
#8 0x57ca76 in step_callback() at src/flb_worker.c:43
#9 0x7fb857ebbb67 in ???() at ???:0
#10 0x7fb857f2c6bb in ???() at ???:0
#11 0xffffffffffffffff in ???() at ???:0
Aborted (core dumped)
The last logs in journald before the crash are:
Oct 27 10:11:10 hostname.domain sshd[1136]: srclimit_penalise: ipv4: new 209.38.98.72/32 deferred penalty of 1 seconds for penalty: connections without attempting authentication
Oct 27 10:11:21 hostname.domain systemd[1]: Starting setroubleshootd.service - SETroubleshoot daemon for processing new SELinux denial logs...
Oct 27 10:11:21 hostname.domain systemd[1]: Started setroubleshootd.service - SETroubleshoot daemon for processing new SELinux denial logs.
Oct 27 10:11:21 hostname.domain systemd[1]: Started dbus-:[email protected].
Oct 27 10:11:22 hostname.domain setroubleshoot[1147531]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736
Oct 27 10:11:22 hostname.domain setroubleshoot[1147531]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
***** Plugin catchall_boolean (89.3 confidence) suggests ******************
If you want to allow nis to enabled
Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
Do
setsebool -P nis_enabled 1
***** Plugin catchall (11.6 confidence) suggests **************************
If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
# semodule -X 300 -i my-rhsmpackagepr.pp
Oct 27 10:11:22 hostname.domain setroubleshoot[1147531]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736
Oct 27 10:11:22 hostname.domain setroubleshoot[1147531]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
***** Plugin catchall_boolean (89.3 confidence) suggests ******************
If you want to allow nis to enabled
Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
Do
setsebool -P nis_enabled 1
***** Plugin catchall (11.6 confidence) suggests **************************
If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
# semodule -X 300 -i my-rhsmpackagepr.pp
And the last log sent via the output plugin is:
Oct 27 10:11:10 hostname.domain sshd[1136]: srclimit_penalise: ipv4: new 209.38.98.72/32 deferred penalty of 1 seconds for penalty: connections without attempting authentication
I have checked the other crashes, and many of them have these multiline logs from SELinux right before the crash.
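To see how those entries are actually stored in the journal, they can be dumped in verbose mode (a sketch; the setroubleshoot syslog identifier comes from the logs above, the time window is an assumption):
journalctl -t setroubleshoot --since "2025-10-27 10:11:00" --until "2025-10-27 10:12:00" -o verbose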
And these are the trace logs from fluent-bit right before the crash:
[2025/10/27 10:11:15.620741844] [debug] [upstream] KA connection #108 to receiver-server.example.org:5000 is connected
[2025/10/27 10:11:15.620753465] [debug] [http_client] not using http_proxy for header
[2025/10/27 10:11:15.620772172] [trace] [io coro=0x7fb8240303c0] [net_write] trying 161 bytes
[2025/10/27 10:11:15.620816831] [trace] [io coro=0x7fb8240303c0] [net_write] ret=161 total=161/161
[2025/10/27 10:11:15.620835143] [trace] [io coro=0x7fb8240303c0] [net_write] trying 5252 bytes
[2025/10/27 10:11:15.620869714] [trace] [io coro=0x7fb8240303c0] [net_write] ret=5252 total=5252/5252
[2025/10/27 10:11:15.620877740] [trace] [io coro=0x7fb8240303c0] [net_read] try up to 4095 bytes
[2025/10/27 10:11:15.621853366] [trace] [engine] resuming coroutine=0x7fb8240303c0
[2025/10/27 10:11:15.622054062] [trace] [engine] resuming coroutine=0x7fb8240303c0
[2025/10/27 10:11:15.624272333] [trace] [engine] resuming coroutine=0x7fb8240303c0
[2025/10/27 10:11:15.624377117] [trace] [io coro=0x7fb8240303c0] [net_read] ret=66
[2025/10/27 10:11:15.624393096] [ info] [output:http:http.0] receiver-server.example.org:5000, HTTP status=200
ok
[2025/10/27 10:11:15.624414336] [debug] [upstream] KA connection #108 to receiver-server.example.org:5000 is now available
[2025/10/27 10:11:15.624432065] [debug] [out flush] cb_destroy coro_id=60
[2025/10/27 10:11:15.624438564] [trace] [coro] destroy coroutine=0x7fb8240303c0 data=0x7fb8240303e0
[2025/10/27 10:11:15.624496917] [trace] [engine] [task event] task_id=0 out_id=0 return=OK
[2025/10/27 10:11:15.624519077] [debug] [task] destroy task=0x7fb850199c90 (task_id=0)
[2025/10/27 10:11:15.624528524] [trace] [1097] http.0 -> fs_chunks_size = 36864 mod=-36864 chunk=1146617-1761556270.352518021.flb
[2025/10/27 10:11:15.624532962] [debug] [input chunk] remove chunk 1146617-1761556270.352518021.flb with 36864 bytes from plugin http.0, the updated fs_chunks_size is 0 bytes
[2025/10/27 10:11:21.852650890] [trace] [565] http.0 -> fs_chunks_size = 0
[2025/10/27 10:11:21.852677361] [trace] [input chunk] chunk 1146617-1761556281.852406233.flb required 916 bytes and 100000000 bytes left in plugin http.0
[2025/10/27 10:11:21.852779117] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 30 elements, output map size 37 elements
[2025/10/27 10:11:21.852818662] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 37, will be 37, nested map size will be 1
[2025/10/27 10:11:21.852848705] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 37, will be 35, nested map size will be 3
[2025/10/27 10:11:21.852882663] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 34, nested map size will be 2
[2025/10/27 10:11:21.852916944] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 34, will be 34, nested map size will be 1
[2025/10/27 10:11:21.852956203] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 34 elements, output map size 35 elements
[2025/10/27 10:11:21.853015772] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 39, nested map size will be 2
[2025/10/27 10:11:21.853053461] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 39, will be 37, nested map size will be 3
[2025/10/27 10:11:21.853088929] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 37, will be 36, nested map size will be 2
[2025/10/27 10:11:21.853125410] [trace] [input chunk] update output instances with new chunk size diff=4096, records=1, input=systemd.0
[2025/10/27 10:11:21.853135909] [trace] [2226] http.0 -> fs_chunks_size = 0 mod=4096 chunk=1146617-1761556281.852406233.flb
[2025/10/27 10:11:21.853144527] [trace] [input chunk] chunk 1146617-1761556281.852406233.flb update plugin http.0 fs_chunks_size by 4096 bytes, the current fs_chunks_size is 4096 bytes
[2025/10/27 10:11:21.853156141] [trace] [565] http.0 -> fs_chunks_size = 4096
[2025/10/27 10:11:21.853161330] [trace] [input chunk] chunk 1146617-1761556281.852406233.flb required 928 bytes and 99995904 bytes left in plugin http.0
[2025/10/27 10:11:21.853215372] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 31 elements, output map size 38 elements
[2025/10/27 10:11:21.853687959] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 38, nested map size will be 1
[2025/10/27 10:11:21.853741919] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 36, nested map size will be 3
[2025/10/27 10:11:21.853804211] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 36, will be 35, nested map size will be 2
[2025/10/27 10:11:21.853858033] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 35, nested map size will be 1
[2025/10/27 10:11:21.853918417] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 35 elements, output map size 36 elements
[2025/10/27 10:11:21.853990560] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 41, will be 40, nested map size will be 2
[2025/10/27 10:11:21.854037867] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 38, nested map size will be 3
[2025/10/27 10:11:21.854124235] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 37, nested map size will be 2
[2025/10/27 10:11:22.352419099] [trace] [565] http.0 -> fs_chunks_size = 4096
[2025/10/27 10:11:22.352443855] [trace] [input chunk] chunk 1146617-1761556281.852406233.flb required 946 bytes and 99995904 bytes left in plugin http.0
[2025/10/27 10:11:22.352514618] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 31 elements, output map size 38 elements
[2025/10/27 10:11:22.352543033] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 38, nested map size will be 1
[2025/10/27 10:11:22.352570398] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 36, nested map size will be 3
[2025/10/27 10:11:22.352594301] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 36, will be 35, nested map size will be 2
[2025/10/27 10:11:22.352617713] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 35, nested map size will be 1
[2025/10/27 10:11:22.352646867] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 35 elements, output map size 36 elements
[2025/10/27 10:11:22.352688181] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 41, will be 40, nested map size will be 2
[2025/10/27 10:11:22.352711794] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 38, nested map size will be 3
[2025/10/27 10:11:22.352738135] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 37, nested map size will be 2
@rafaelma I did another check and found that I needed another restart of the cursor in the collector function. I have updated the branch, replacing the old fix with a new one. Would you please retry it? Thanks.
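If it is easier than applying the commit by hand, the PR head can be fetched directly (a sketch; pr-11073 is just a local branch name):
git clone https://github.com/fluent/fluent-bit.git
cd fluent-bit
git fetch origin pull/11073/head:pr-11073
git checkout pr-11073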
Hello Eduardo, thanks for the patch.
Same procedure as last time, and another crash after one hour of running without problems, right after the same type of logs:
Oct 28 10:11:22 hostname.domain systemd[1]: Starting setroubleshootd.service - SETroubleshoot daemon for processing new SELinux denial logs...
Oct 28 10:11:22 hostname.domain systemd[1]: Started setroubleshootd.service - SETroubleshoot daemon for processing new SELinux denial logs.
Oct 28 10:11:22 hostname.domain systemd[1]: Started dbus-:[email protected].
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
***** Plugin catchall_boolean (89.3 confidence) suggests ******************
If you want to allow nis to enabled
Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
Do
setsebool -P nis_enabled 1
***** Plugin catchall (11.6 confidence) suggests **************************
If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
# semodule -X 300 -i my-rhsmpackagepr.pp
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
***** Plugin catchall_boolean (89.3 confidence) suggests ******************
If you want to allow nis to enabled
Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
Do
setsebool -P nis_enabled 1
***** Plugin catchall (11.6 confidence) suggests **************************
If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
# semodule -X 300 -i my-rhsmpackagepr.pp
Oct 28 10:11:23 hostname.domain systemd-coredump[1221235]: [🡕] Process 1218463 (fluent-bit) of user 0 dumped core.
Module libzstd.so.1 from rpm zstd-1.5.5-9.el10.x86_64
Module libz.so.1 from rpm zlib-ng-2.2.3-1.el10.x86_64
Module libcap.so.2 from rpm libcap-2.69-7.el10.x86_64
Module libcrypto.so.3 from rpm openssl-3.2.2-16.el10_0.4.x86_64
Module libssl.so.3 from rpm openssl-3.2.2-16.el10_0.4.x86_64
Module libsystemd.so.0 from rpm systemd-257-9.el10_0.1.x86_64
Module libyaml-0.so.2 from rpm libyaml-0.2.5-16.el10.x86_64
Stack trace of thread 1218466:
#0 0x00007fd7c0abd9dc __pthread_kill_implementation (libc.so.6 + 0x969dc)
#1 0x00007fd7c0a67a96 raise (libc.so.6 + 0x40a96)
#2 0x00007fd7c0a4f8fa abort (libc.so.6 + 0x288fa)
#3 0x00000000004ae47d n/a (n/a + 0x0)
#4 0x00007fd7c0a67b40 __restore_rt (libc.so.6 + 0x40b40)
#5 0x00000000011d0ec4 n/a (n/a + 0x0)
#6 0x00007fd7c0fad609 ZSTD_freeDCtx (libzstd.so.1 + 0x5e609)
#7 0x00007fd7c1789b09 journal_file_data_payload.isra.0 (libsystemd.so.0 + 0xd1b09)
#8 0x00007fd7c16e6d33 sd_journal_enumerate_data (libsystemd.so.0 + 0x2ed33)
#9 0x000000000074670a n/a (n/a + 0x0)
#10 0x0000000000504fcd n/a (n/a + 0x0)
#11 0x0000000000505b0a n/a (n/a + 0x0)
#12 0x000000000057ca77 n/a (n/a + 0x0)
#13 0x00007fd7c0abbb68 start_thread (libc.so.6 + 0x94b68)
#14 0x00007fd7c0b2c6bc __clone3 (libc.so.6 + 0x1056bc)
Stack trace of thread 1218465:
#0 0x00007fd7c0b2caf6 epoll_wait (libc.so.6 + 0x105af6)
#1 0x00000000018757ea n/a (n/a + 0x0)
#2 0x0000000001875c14 n/a (n/a + 0x0)
#3 0x00000000004d01a4 n/a (n/a + 0x0)
#4 0x000000000057ca77 n/a (n/a + 0x0)
#5 0x00007fd7c0abbb68 start_thread (libc.so.6 + 0x94b68)
#6 0x00007fd7c0b2c6bc __clone3 (libc.so.6 + 0x1056bc)
Stack trace of thread 1218463:
#0 0x00007fd7c0af6945 clock_nanosleep@GLIBC_2.2.5 (libc.so.6 + 0xcf945)
#1 0x00007fd7c0b022e7 __nanosleep (libc.so.6 + 0xdb2e7)
#2 0x00007fd7c0b1451c sleep (libc.so.6 + 0xed51c)
#3 0x00000000004afb40 n/a (n/a + 0x0)
#4 0x00000000005b9acf n/a (n/a + 0x0)
#5 0x00000000004afdcb n/a (n/a + 0x0)
#6 0x00000000004afded n/a (n/a + 0x0)
#7 0x00007fd7c0a5130e __libc_start_call_main (libc.so.6 + 0x2a30e)
#8 0x00007fd7c0a513c9 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a3c9)
#9 0x00000000004a85f5 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
I have installed some debug packages and I think the backtrace output from the coredump is more complete now. I hope you will get more out of it.
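The debug packages were installed roughly like this (a sketch; the package names are my guesses from the module list above, and the RHEL debuginfo repositories must be enabled):
dnf debuginfo-install systemd-libs libzstd glibc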
Maybe it is not important, but in thread 3 there are two errors of this type: error: Cannot access memory at address
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7fd7bb7fe6c0 (LWP 1218466) __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
2 Thread 0x7fd7bbfff6c0 (LWP 1218465) 0x00007fd7c0b2caf6 in epoll_wait (epfd=11, events=0x7fd7bc0061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
3 Thread 0x7fd7c14cc7c0 (LWP 1218463) 0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffeda8a9060, rem=rem@entry=0x7ffeda8a9060) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
4 Thread 0x7fd79dbc36c0 (LWP 1218467) 0x00007fd7c0b2caf6 in epoll_wait (epfd=75, events=0x7fd7bc176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
5 Thread 0x7fd79d3c26c0 (LWP 1218468) 0x00007fd7c0b2caf6 in epoll_wait (epfd=92, events=0x7fd7bc17cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
6 Thread 0x7fd79323d6c0 (LWP 1218470) 0x00007fd7c0b2caf6 in epoll_wait (epfd=97, events=0x7fd780000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
7 Thread 0x7fd7c0a266c0 (LWP 1218464) 0x00007fd7c0b2caf6 in epoll_wait (epfd=8, events=0x7fd7bc000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
8 Thread 0x7fd793a3e6c0 (LWP 1218469) 0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fd793a34de0, rem=rem@entry=0x7fd793a34de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
(gdb) backtrace full
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 [0/1804]
tid = <optimized out>
ret = 0
pd = <optimized out>
old_mask = {__val = {0}}
ret = <optimized out>
#1 0x00007fd7c0abda43 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
No locals.
#2 0x00007fd7c0a67a96 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
ret = <optimized out>
#3 0x00007fd7c0a4f8fa in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0, 0, 0, 0, 0, 773, 0, 140564415133024, 140564540369648, 140564540365632, 4933032, 0, 140564641116160, 4899469, 4899394, 32175824}}, sa_flags = 0, sa_restorer = 0x7fd7b4083560}
#4 0x00000000004ae47d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
cf_opts = 0x0
#5 <signal handler called>
No locals.
#6 0x00000000011d0ec4 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
cMem = {customAlloc = 0x0, customFree = 0x1, opaque = 0x7fd7bb7f2af0}
#7 0x00007fd7c0fad609 in ZSTD_clearDict (dctx=0x7fd7b4083560) at .//decompress/zstd_decompress.c:315
No locals.
#8 ZSTD_freeDCtx (dctx=0x7fd7b4083560) at .//decompress/zstd_decompress.c:326
cMem = {customAlloc = <optimized out>, customFree = 0x0, opaque = 0x0}
#9 0x00007fd7c1789b09 in sym_ZSTD_freeDCtxp (p=<optimized out>) at ../src/basic/compress.c:74
__func__ = <optimized out>
#10 decompress_blob_zstd (dst_max=0, src=0x7fd7a5f06908, src_size=420, dst=<optimized out>, dst_size=0x7fd7bb7f2978) at ../src/basic/compress.c:451
k = 0
size = 773
r = <optimized out>
dctx = 0x7fd7b4083560
input = {src = 0x7fd7a5f06908, size = 420, pos = 420}
output = {dst = 0x7fd7b4043550, size = 262152, pos = 773}
__func__ = <optimized out>
size = <optimized out>
r = <optimized out>
dctx = <optimized out>
input = <optimized out>
output = <optimized out>
k = <optimized out>
_found = <optimized out>
__assert_in_set = <optimized out>
__unique_prefix_A18 = <optimized out>
__unique_prefix_B19 = <optimized out>
_level = <optimized out>
_e = <optimized out>
#11 decompress_blob (dst_max=<optimized out>, compression=<optimized out>, src=<optimized out>, src_size=<optimized out>, dst=<optimized out>, dst_size=<optimized out>) at ../src/basic/compress.c:495
No locals.
#12 maybe_decompress_payload (data_threshold=<optimized out>, f=0x7fd7b4016d90, payload=0x7fd7a5f06908 "(\265/\375`\005\002\325\f", size=420, compression=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7fd7bb7f2af0, ret_size=0x7fd7bb7f2ae8) at ../src/libsystemd/sd-journal/journal-file.c:1947
rsize = 773
r = <optimized out>
__func__ = <optimized out>
#13 journal_file_data_payload.isra.0 (f=0x7fd7b4016d90, o=<optimized out>, offset=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7fd7bb7f2af0, ret_size=0x7fd7bb7f2ae8, data_threshold=<optimized out>) at ../src/libsystemd/sd-journal/journal-file.c:2009
size = 420
c = <optimized out>
r = <optimized out>
__func__ = <optimized out>
#14 0x00007fd7c16e6d33 in sd_journal_enumerate_data (j=0x7fd7b4001140, data=0x7fd7bb7f2b70, size=0x7fd7bb7f4b98) at ../src/libsystemd/sd-journal/sd-journal.c:2886
_e = <optimized out>
p = <optimized out>
d = 0x0
l = 0
_error = <optimized out>
_level = <optimized out>
n = <optimized out>
f = 0x7fd7b4016d90
o = 0x7fd7a6160ec0
r = <optimized out>
__func__ = "sd_journal_enumerate_data"
#15 0x000000000074670a in in_systemd_collect (ins=0x25d30210, config=0x25d14490, in_context=0x7fd7b4001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:408
ret = 0
ret_j = 1
entries = 16
skip_entries = 0
rows = 1
sec = 1761642683
nsec = 256042000
usec = 1761642683256042
length = 37
key = 0x7fd7a5f06700 "_SYSTEMD_UNIT=setroubleshootd.service"
cursor = 0x0
tag = 0x25d2fff0 "uio_logs_5000_systemd"
new_tag = "\260L\177\273\327\177\000\000\257\026M\000\000\000\000\000\320;\177\273\327\177\000\000\210}\210\001\000\000\000\000\220z\210\001\000\000\000\000\227\003\000\000\005\000\000\000\307\343\207\001\000\000\000\000(\000\000\0000\000\000\000\300L\177\273\327\177\000\000\000L\177\273\327\177\000\000f\000\000\000\000\000\000\000\033[1m[\033[0m2025/10/28 10:11:23.102410123\033[1m]\033[0m [\033[94mtrace\033[0m] [sched] 0 timer coroutines destroyed\n", '\000' <repeats 3905 times>
last_tag = "uio_logs_5000_systemd", '\000' <repeats 4074 times>
tag_len = 21
last_tag_len = 21
data = 0x7fd7a5f06700
ctx = 0x7fd7b4001090
tm = {tm = {tv_sec = 1761642683, tv_nsec = 256042000}}
kvlist = 0x7fd7b4004830
#16 0x0000000000504fcd in input_collector_fd (fd=91, ins=0x25d30210) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
head = 0x7fd7b4007d48
collector = 0x7fd7b4007cc0
input_coro = 0x7fd7bb7f4d00
config = 0x25d14490
#17 0x0000000000505b0a in engine_handle_event (fd=91, mask=1, ins=0x25d30210, config=0x25d14490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
ret = 0
#18 input_thread (data=0x7fd7bc01f360) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
__flb_event_priority_live_foreach_iter = 0
__flb_event_priority_live_foreach_n_events = 1
ret = 0
thread_id = 0
tmp = "flb-in-systemd.0-w0\000\000\000\000\000\242\227\022\000\000\000\000\000\300\346\177\273\327\177\000\000Gɡ\300\327\177", '\000' <repeats 17 times>
instance_exit = 0
event = 0x7fd7b4007cc0
ins = 0x25d30210
evl_bktq = 0x7fd7b4007bf0
thi = 0x7fd7bc01f360
p = 0x25d21560
sched = 0x7fd7b4000b70
dns_ctx = {lookups = {prev = 0x7fd7bb7f4d30, next = 0x7fd7bb7f4d30}, lookups_drop = {prev = 0x7fd7bb7f4d40, next = 0x7fd7bb7f4d40}}
notification = 0x7fd7c0a87e5e <__GI___snprintf+158>
#19 0x000000000057ca77 in step_callback (data=0x7fd7bc0249f0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
worker = 0x7fd7bc0249f0
#20 0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 2, 140564626524224, 140564626524487, 2287494562463486489, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#21 0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fd7bbfff6c0 (LWP 1218465))]
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=11, events=0x7fd7bc0061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=11, events=0x7fd7bc0061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd7bc006330, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7fd7bc006180
ret = 0
#2 0x0000000001875c14 in mk_event_wait (loop=0x7fd7bc006330) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x00000000004d01a4 in log_worker_collector (data=0x7fd7bc006090) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_log.c:166
__i = 1
__ctx = 0x7fd7bc006180
run = 1
event = 0x0
log = 0x7fd7bc006090
signal_value = 2
#4 0x000000000057ca77 in step_callback (data=0x7fd7bc009fa0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
worker = 0x7fd7bc009fa0
#5 0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 0, 140564626524656, 140564626524919, 2287495662511985177, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fd7c14cc7c0 (LWP 1218463))]
#0 0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffeda8a9060, rem=rem@entry=0x7ffeda8a9060) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48 r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) backtrace full
#0 0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffeda8a9060, rem=rem@entry=0x7ffeda8a9060) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
sc_cancel_oldtype = 0
sc_ret = <optimized out>
r = <optimized out>
#1 0x00007fd7c0b022e7 in __GI___nanosleep (req=req@entry=0x7ffeda8a9060, rem=rem@entry=0x7ffeda8a9060) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
ret = <optimized out>
#2 0x00007fd7c0b1451c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
save_errno = 22
max = 4294967295
ts = {tv_sec = 0, tv_nsec = 102016488}
#3 0x00000000004afb40 in flb_main_run (argc=3, argv=0x7ffeda8a92e8) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1469
opt = -1
ret = 0
json = 0x21000 <error: Cannot access memory at address 0x21000>
last_plugin = -1
cfg_file = 0x25d19c60 "\031]\002"
cf = 0x25d14df0
tmp = 0x25d14df0
service = 0x25d143b0
s = 0x5b9a46 <flb_supervisor_requested+213>
section = 0x25d143b0
cf_opts = 0x25d142b0
group = 0x862da3ad06103d00
supervisor_reload_notified = 0
trace_input = 0x0
trace_output = 0x25d14440 "stdout"
trace_props = 0x0
long_opts = {{name = 0x187c3b8 "storage_path", has_arg = 1, flag = 0x0, val = 98}, {name = 0x187c3c5 "config", has_arg = 1, flag = 0x0, val = 99}, {name = 0x187c132 "daemon", has_arg = 0, flag = 0x0, val = 100}, {name = 0x187c3cc "dry-run", has_arg = 0, flag = 0x0, val = 68}, {name = 0x187c139 "flush", has_arg = 1, flag = 0x0, val = 102}, {name = 0x187c3d4 "http",
has_arg = 0, flag = 0x0, val = 72}, {name = 0x187c3d9 "supervisor", has_arg = 0, flag = 0x0, val = 1029}, {name = 0x187c16a "log_file", has_arg = 1, flag = 0x0, val = 108}, {name = 0x187c3e4 "port", has_arg = 1, flag = 0x0, val = 80}, {name = 0x187c13f "custom", has_arg = 1, flag = 0x0, val = 67}, {name = 0x187c0fe "input", has_arg = 1, flag = 0x0, val = 105}, {
name = 0x187c159 "processor", has_arg = 1, flag = 0x0, val = 114}, {name = 0x187c163 "filter", has_arg = 1, flag = 0x0, val = 70}, {name = 0x187c104 "output", has_arg = 1, flag = 0x0, val = 111}, {name = 0x187c146 "match", has_arg = 1, flag = 0x0, val = 109}, {name = 0x187c3e9 "parser", has_arg = 1, flag = 0x0, val = 82}, {name = 0x187c3f0 "prop", has_arg = 1,
flag = 0x0, val = 112}, {name = 0x187c3f5 "plugin", has_arg = 1, flag = 0x0, val = 101}, {name = 0x187c173 "tag", has_arg = 1, flag = 0x0, val = 116}, {name = 0x187c3fc "sp-task", has_arg = 1, flag = 0x0, val = 84}, {name = 0x187c404 "version", has_arg = 0, flag = 0x0, val = 86}, {name = 0x187c40c "verbose", has_arg = 0, flag = 0x0, val = 118}, {
name = 0x187c414 "workdir", has_arg = 1, flag = 0x0, val = 119}, {name = 0x187c41c "quiet", has_arg = 0, flag = 0x0, val = 113}, {name = 0x187c422 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x187c427 "help-json", has_arg = 0, flag = 0x0, val = 74}, {name = 0x187c431 "coro_stack_size", has_arg = 1, flag = 0x0, val = 115}, {name = 0x187c441 "sosreport",
has_arg = 0, flag = 0x0, val = 83}, {name = 0x187c177 "http_server", has_arg = 0, flag = 0x0, val = 72}, {name = 0x187c44b "http_listen", has_arg = 1, flag = 0x0, val = 76}, {name = 0x187c457 "http_port", has_arg = 1, flag = 0x0, val = 80}, {name = 0x187c461 "enable-hot-reload", has_arg = 0, flag = 0x0, val = 89}, {name = 0x187c473 "enable-chunk-trace",
has_arg = 0, flag = 0x0, val = 90}, {name = 0x187c486 "trace", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x187c48c "trace-input", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x187c498 "trace-output", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x187c4a5 "trace-output-property", has_arg = 1, flag = 0x0, val = 1028}, {
name = 0x187c4c0 "disable-thread-safety-on-hot-reload", has_arg = 0, flag = 0x0, val = 87}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
#4 0x00000000005b9acf in flb_supervisor_run (argc=3, argv=0x7ffeda8a92e8, entry=0x4aed5a <flb_main_run>) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_supervisor.c:626
clean_argv = 0x11ff
clean_argc = 32727
env_child = 0x6c6f6f705f68652e <error: Cannot access memory at address 0x6c6f6f705f68652e>
ret = -1
#5 0x00000000004afdcb in flb_main (argc=3, argv=0x7ffeda8a92e8) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1564
No locals.
#6 0x00000000004afded in main (argc=3, argv=0x7ffeda8a92e8) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1572
No locals.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7fd79dbc36c0 (LWP 1218467))]
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=75, events=0x7fd7bc176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=75, events=0x7fd7bc176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd7bc176640, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7fd7bc176310
ret = 0
#2 0x0000000001875c14 in mk_event_wait (loop=0x7fd7bc176640) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x000000000051bd74 in output_thread (data=0x7fd7bc176050) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_output_thread.c:257
__flb_event_priority_live_foreach_iter = 1
__flb_event_priority_live_foreach_n_events = 0
n = 8
ret = 0
running = 1
stopping = 0
thread_id = 0
tmp = "flb-out-http.0-w0", '\000' <repeats 32 times>, "1218467\000=\020\006\255\243-\206"
event_local = {fd = 89, type = 65536, mask = 1, status = 2 '\002', data = 0x0, handler = 0x0, _head = {prev = 0x0, next = 0x0}, _priority_head = {prev = 0x0, next = 0x0}, priority = 6 '\006'}
event = 0x0
sched = 0x7fd78c000e80
task = 0x7fd7bc1949d0
u_conn = 0x7fd78c0205c0
ins = 0x25d360f0
out_flush = 0x7fd78c018cd0
th_ins = 0x7fd7bc176050
params = 0x0
sched_params = 0x0
dns_ctx = {lookups = {prev = 0x7fd79dbb9b70, next = 0x7fd79dbb9b70}, lookups_drop = {prev = 0x7fd79dbb9b80, next = 0x7fd79dbb9b80}}
notification = 0x0
#4 0x000000000057ca77 in step_callback (data=0x7fd7bc176660) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
worker = 0x7fd7bc176660
#5 0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 0, 140564626524272, 140564626524535, 2287439001155932697, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fd79d3c26c0 (LWP 1218468))]
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=92, events=0x7fd7bc17cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=92, events=0x7fd7bc17cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd7bc129310, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7fd7bc0a6460
ret = 0
#2 0x0000000001875c14 in mk_event_wait (loop=0x7fd7bc129310) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x000000000185be78 in mk_lib_worker (data=0x7fd7bc0ada10) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_lib.c:154
fd = 824195691
bytes = 1667331187
val = 2334097595223798896
server = 0x7fd7bc17cbb0
event = 0x0
ctx = 0x7fd7bc0ada10
__i = 32727
__ctx = 0x383634383132
#4 0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 0, 140564626524704, 140564626524967, 2287437901107434009, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#5 0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 6
[Switching to thread 6 (Thread 0x7fd79323d6c0 (LWP 1218470))]
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=97, events=0x7fd780000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=97, events=0x7fd780000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd780001b10, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7fd780000ee0
ret = 0
#2 0x0000000001875c14 in mk_event_wait (loop=0x7fd780001b10) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x000000000186e408 in mk_server_worker_loop (server=0x7fd7bc17cbb0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_server.c:506
__i = 1
__ctx = 0x7fd780000ee0
ret = 0
timeout_fd = 103
val = 1
event = 0x0
evl = 0x7fd780001b10
list = 0x7fd7800063e0
head = 0x7fd7800063e0
conn = 0x7fd793233da0
sched = 0x7fd784000b90
listener = 0x7fd780006400
server_timeout = 0x7fd78000e530
__i = 0
__ctx = 0x7fd780000ee0
#4 0x00000000018644b1 in mk_sched_launch_worker_loop (data=0x7fd784001390) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_scheduler.c:417
ret = 0
wid = 0
len = 13
thread_name = 0x7fd780006390 "VWz}\320\177"
head = 0x7fd7bc17cdf0
wcb = 0x7fd7bc0ee940
sched = 0x7fd784000b90
notif = 0x7fd780006340
thinfo = 0x7fd784001390
ctx = 0x7fd784000b70
server = 0x7fd7bc17cbb0
#5 0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 2, 140564032621376, 140564032621639, 2287407049820475929, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 7
[Switching to thread 7 (Thread 0x7fd7c0a266c0 (LWP 1218464))]
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=8, events=0x7fd7bc000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007fd7c0b2caf6 in epoll_wait (epfd=8, events=0x7fd7bc000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd7bc0017a0, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7fd7bc000b70
ret = 0
#2 0x0000000001875c14 in mk_event_wait (loop=0x7fd7bc0017a0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x0000000000549d30 in flb_engine_start (config=0x25d14490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_engine.c:999
__flb_event_priority_live_foreach_iter = 3
__flb_event_priority_live_foreach_n_events = 0
ret = 0
tasks = 0
fs_chunks = 0
mem_chunks = 0
ts = 0
tmp = "24.0K\000\000\000\000\000\000\000\000\000\000"
rb_flush_flag = 0
t_flush = {tm = {tv_sec = 10, tv_nsec = 0}}
event = 0x0
evl = 0x7fd7bc0017a0
evl_bktq = 0x7fd7bc005fd0
sched = 0x7fd7bc01b610
dns_ctx = {lookups = {prev = 0x7fd7c0a1cd30, next = 0x7fd7c0a1cd30}, lookups_drop = {prev = 0x7fd7c0a1cd40, next = 0x7fd7c0a1cd40}}
notification = 0x0
rb_ms = 250
rb_env = 0x0
#4 0x00000000004cb8da in flb_lib_worker (data=0x25d14460) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_lib.c:835
ret = -2043829331
ctx = 0x25d14460
config = 0x25d14490
#5 0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 22, 140732564934064, 140732564934327, 2287309170200154649, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 8
[Switching to thread 8 (Thread 0x7fd793a3e6c0 (LWP 1218469))]
#0 0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fd793a34de0, rem=rem@entry=0x7fd793a34de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48 r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) backtrace full
#0 0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fd793a34de0, rem=rem@entry=0x7fd793a34de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
sc_cancel_oldtype = 2
sc_ret = <optimized out>
r = <optimized out>
#1 0x00007fd7c0b022e7 in __GI___nanosleep (req=req@entry=0x7fd793a34de0, rem=rem@entry=0x7fd793a34de0) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
ret = <optimized out>
#2 0x00007fd7c0b1451c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
save_errno = 0
max = 4294967295
ts = {tv_sec = 0, tv_nsec = 120694212}
#3 0x0000000001871f97 in mk_clock_worker_init (data=0x7fd7bc17cbb0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_clock.c:124
cur_time = 1761642682
server = 0x7fd7bc17cbb0
#4 0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 2, 140564032621424, 140564032621687, 2287408147721490969, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#5 0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
And these are the trace logs from fluent-bit right before the crash:
[2025/10/28 10:11:22.602510368] [trace] [task 0x7fd7bc1949d0] created (id=0)
[2025/10/28 10:11:22.602601222] [trace] [upstream] get new connection for receiver-server.example.org:5000, net setup:
net.connect_timeout = 10 seconds
net.source_address = any
net.keepalive = enabled
net.keepalive_idle_timeout = 30 seconds
net.max_worker_connections = 0
[2025/10/28 10:11:22.602518892] [debug] [task] created task=0x7fd7bc1949d0 id=0 OK
[2025/10/28 10:11:22.602611674] [debug] [upstream] KA connection #108 to receiver-server.example.org:5000 has been assigned (recycled)
[2025/10/28 10:11:22.602528676] [debug] [output:http:http.0] task_id=0 assigned to thread #0
[2025/10/28 10:11:22.602619261] [debug] [http_client] not using http_proxy for header
[2025/10/28 10:11:22.602630663] [trace] [io coro=0x7fd78c02c660] [net_write] trying 161 bytes
[2025/10/28 10:11:22.602728194] [trace] [io coro=0x7fd78c02c660] [net_write] ret=161 total=161/161
[2025/10/28 10:11:22.602733808] [trace] [io coro=0x7fd78c02c660] [net_write] trying 1464 bytes
[2025/10/28 10:11:22.602752129] [trace] [io coro=0x7fd78c02c660] [net_write] ret=1464 total=1464/1464
[2025/10/28 10:11:22.602757296] [trace] [io coro=0x7fd78c02c660] [net_read] try up to 4095 bytes
[2025/10/28 10:11:22.603752273] [trace] [engine] resuming coroutine=0x7fd78c02c660
[2025/10/28 10:11:22.603982679] [trace] [io coro=0x7fd78c02c660] [net_read] ret=66
[2025/10/28 10:11:22.604189099] [ info] [output:http:http.0] receiver-server.example.org:5000, HTTP status=200
ok
[2025/10/28 10:11:22.604246406] [debug] [upstream] KA connection #108 to receiver-server.example.org:5000 is now available
[2025/10/28 10:11:22.604315277] [debug] [out flush] cb_destroy coro_id=213
[2025/10/28 10:11:22.604716134] [trace] [coro] destroy coroutine=0x7fd78c02c660 data=0x7fd78c02c680
[2025/10/28 10:11:22.604803526] [trace] [engine] [task event] task_id=0 out_id=0 return=OK
[2025/10/28 10:11:22.604822432] [debug] [task] destroy task=0x7fd7bc1949d0 (task_id=0)
[2025/10/28 10:11:22.604832772] [trace] [1097] http.0 -> fs_chunks_size = 4096 mod=-4096 chunk=1218463-1761642682.352458583.flb
[2025/10/28 10:11:22.604838770] [debug] [input chunk] remove chunk 1218463-1761642682.352458583.flb with 4096 bytes from plugin http.0, the updated fs_chunks_size is 0 bytes
[2025/10/28 10:11:22.852617555] [trace] [565] http.0 -> fs_chunks_size = 0
[2025/10/28 10:11:22.852642469] [trace] [input chunk] chunk 1218463-1761642682.852413803.flb required 1874 bytes and 100000000 bytes left in plugin http.0
[2025/10/28 10:11:22.852733321] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 31 elements, output map size 38 elements
[2025/10/28 10:11:22.852780064] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 31 elements, output map size 38 elements
[2025/10/28 10:11:22.852805968] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 38, nested map size will be 1
[2025/10/28 10:11:22.852830711] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 38, nested map size will be 1
[2025/10/28 10:11:22.852862776] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 36, nested map size will be 3
[2025/10/28 10:11:22.852880267] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 36, nested map size will be 3
[2025/10/28 10:11:22.852906761] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 36, will be 35, nested map size will be 2
[2025/10/28 10:11:22.852924420] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 36, will be 35, nested map size will be 2
[2025/10/28 10:11:22.852951223] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 35, nested map size will be 1
[2025/10/28 10:11:22.852973729] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 35, nested map size will be 1
[2025/10/28 10:11:22.853006485] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 35 elements, output map size 36 elements
[2025/10/28 10:11:22.853023176] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 35 elements, output map size 36 elements
[2025/10/28 10:11:22.853085179] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 41, will be 40, nested map size will be 2
[2025/10/28 10:11:22.853102940] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 41, will be 40, nested map size will be 2
[2025/10/28 10:11:22.853130298] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 38, nested map size will be 3
[2025/10/28 10:11:22.853150292] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 38, nested map size will be 3
[2025/10/28 10:11:22.853180785] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 37, nested map size will be 2
[2025/10/28 10:11:22.853199879] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 37, nested map size will be 2
[2025/10/28 10:11:22.853229995] [trace] [input chunk] update output instances with new chunk size diff=4096, records=2, input=systemd.0
[2025/10/28 10:11:22.853236325] [trace] [2226] http.0 -> fs_chunks_size = 0 mod=4096 chunk=1218463-1761642682.852413803.flb
[2025/10/28 10:11:22.853240997] [trace] [input chunk] chunk 1218463-1761642682.852413803.flb update plugin http.0 fs_chunks_size by 4096 bytes, the current fs_chunks_size is 4096 bytes
It seems to be related to having a bundled version of zstd, whose symbols conflict with the system's zstd that the external libsystemd library uses.
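To make the suspected failure mode concrete, here is a minimal sketch (my own code, not fluent-bit's or libsystemd's; the file name and build line are assumptions) of the create/decompress/free sequence that libsystemd runs for each zstd-compressed journal field. It only behaves correctly when every ZSTD_* call resolves to the same libzstd build; with a bundled copy and the system copy both mapped into the process, the free path can bind to the other implementation:

```c
/*
 * Minimal sketch of libsystemd's per-field decompression sequence.
 * Build (hypothetical): cc zstd_sketch.c -lzstd
 */
#include <stdio.h>
#include <zstd.h>

int main(void)
{
    const char msg[] = "a multi-line journal payload\nline 2\nline 3\n";
    char compressed[256];
    char decompressed[256];

    /* Compress a sample payload so there is something to decompress. */
    size_t csize = ZSTD_compress(compressed, sizeof(compressed),
                                 msg, sizeof(msg), 3);
    if (ZSTD_isError(csize)) {
        fprintf(stderr, "compress: %s\n", ZSTD_getErrorName(csize));
        return 1;
    }

    /* libsystemd allocates a decompression context per call... */
    ZSTD_DCtx *dctx = ZSTD_createDCtx();
    if (dctx == NULL)
        return 1;

    ZSTD_inBuffer in = { compressed, csize, 0 };
    ZSTD_outBuffer out = { decompressed, sizeof(decompressed), 0 };

    size_t ret = ZSTD_decompressStream(dctx, &out, &in);
    if (ZSTD_isError(ret))
        fprintf(stderr, "decompress: %s\n", ZSTD_getErrorName(ret));
    else
        printf("decompressed %zu bytes\n", out.pos);

    /* ...and frees it here. In the crashing process this call chain ends in
     * the *bundled* ZSTD_freeDDict() (see the backtraces below), which reads
     * the context with a different struct layout and aborts. */
    ZSTD_freeDCtx(dctx);
    return 0;
}
```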
@rafaelma I have added another change to the PR/branch, please give it a try. Thanks again for your help and patience on this.
Hello @edsiper, I am sorry for bringing bad news once again. We experienced a new core dump with the new patch when these multiline logs from selinux are generated. The positive aspect is that we have identified when these logs are created, allowing us to provoke a crash at will rather than waiting for one to occur unexpectedly.
Here are the new backtraces. Let me know if you need any other information. Thank you very much for looking into this.
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f043f7fe6c0 (LWP 1298275) __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
2 Thread 0x7f0444a266c0 (LWP 1298273) 0x00007f0444b2caf6 in epoll_wait (epfd=8, events=0x7f0440000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
3 Thread 0x7f043ffff6c0 (LWP 1298274) 0x00007f0444b2caf6 in epoll_wait (epfd=11, events=0x7f04400061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
4 Thread 0x7f0415c1d6c0 (LWP 1298279) 0x00007f0444b2caf6 in epoll_wait (epfd=97, events=0x7f0404000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
5 Thread 0x7f04455157c0 (LWP 1298272) 0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffff510c5a0, rem=rem@entry=0x7ffff510c5a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
6 Thread 0x7f041dc406c0 (LWP 1298276) 0x00007f0444b2caf6 in epoll_wait (epfd=75, events=0x7f0440176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
7 Thread 0x7f041d43f6c0 (LWP 1298277) 0x00007f0444b2caf6 in epoll_wait (epfd=91, events=0x7f044017cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
8 Thread 0x7f041641e6c0 (LWP 1298278) 0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f0416414de0, rem=rem@entry=0x7f0416414de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
(gdb) backtrace full
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
tid = <optimized out>
ret = 0
pd = <optimized out>
old_mask = {__val = {0}}
ret = <optimized out>
#1 0x00007f0444abda43 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
No locals.
#2 0x00007f0444a67a96 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
ret = <optimized out>
#3 0x00007f0444a4f8fa in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0, 0, 0, 0, 0, 773, 0, 139656096692832, 139656221895408, 139656221891392, 4912552, 0, 139656322940928, 4878989, 4878914, 32155344}}, sa_flags = 0, sa_restorer = 0x7f043808ba60}
#4 0x00000000004a947d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
cf_opts = 0x0
#5 <signal handler called>
No locals.
#6 0x00000000011cbe84 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
cMem = {customAlloc = 0x0, customFree = 0x1, opaque = 0x7f043f7f2af0}
#7 0x00007f0444fad609 in ZSTD_clearDict (dctx=0x7f043808ba60) at .//decompress/zstd_decompress.c:315
No locals.
#8 ZSTD_freeDCtx (dctx=0x7f043808ba60) at .//decompress/zstd_decompress.c:326
cMem = {customAlloc = <optimized out>, customFree = 0x0, opaque = 0x0}
#9 0x00007f04457d2b09 in sym_ZSTD_freeDCtxp (p=<optimized out>) at ../src/basic/compress.c:74
__func__ = <optimized out>
#10 decompress_blob_zstd (dst_max=0, src=0x7f0429f06908, src_size=420, dst=<optimized out>, dst_size=0x7f043f7f2978) at ../src/basic/compress.c:451
k = 0
size = 773
r = <optimized out>
dctx = 0x7f043808ba60
input = {src = 0x7f0429f06908, size = 420, pos = 420}
output = {dst = 0x7f043804ba50, size = 262152, pos = 773}
__func__ = <optimized out>
size = <optimized out>
r = <optimized out>
dctx = <optimized out>
input = <optimized out>
output = <optimized out>
k = <optimized out>
_found = <optimized out>
__assert_in_set = <optimized out>
__unique_prefix_A18 = <optimized out>
__unique_prefix_B19 = <optimized out>
_level = <optimized out>
_e = <optimized out>
#11 decompress_blob (dst_max=<optimized out>, compression=<optimized out>, src=<optimized out>, src_size=<optimized out>, dst=<optimized out>, dst_size=<optimized out>) at ../src/basic/compress.c:495
No locals.
#12 maybe_decompress_payload (data_threshold=<optimized out>, f=0x7f0438016d90, payload=0x7f0429f06908 "(\265/\375`\005\002\325\f", size=420, compression=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7f043f7f2af0, ret_size=0x7f043f7f2ae8)
at ../src/libsystemd/sd-journal/journal-file.c:1947
rsize = 773
r = <optimized out>
__func__ = <optimized out>
#13 journal_file_data_payload.isra.0 (f=0x7f0438016d90, o=<optimized out>, offset=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7f043f7f2af0, ret_size=0x7f043f7f2ae8, data_threshold=<optimized out>)
at ../src/libsystemd/sd-journal/journal-file.c:2009
size = 420
c = <optimized out>
r = <optimized out>
__func__ = <optimized out>
#14 0x00007f044572fd33 in sd_journal_enumerate_data (j=0x7f0438001140, data=0x7f043f7f2b70, size=0x7f043f7f4b98) at ../src/libsystemd/sd-journal/sd-journal.c:2886
_e = <optimized out>
p = <optimized out>
d = 0x0
l = 0
_error = <optimized out>
_level = <optimized out>
n = <optimized out>
f = 0x7f0438016d90
o = 0x7f04259c7c08
r = <optimized out>
__func__ = "sd_journal_enumerate_data"
#15 0x000000000074170a in in_systemd_collect (ins=0xe884210, config=0xe868490, in_context=0x7f0438001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:408
ret = 0
ret_j = 1
entries = 16
skip_entries = 0
rows = 1
sec = 1761733332
nsec = 688426000
usec = 1761733332688426
length = 37
key = 0x7f0429f06700 "_SYSTEMD_UNIT=setroubleshootd.service"
cursor = 0x0
tag = 0xe883ff0 "uio_logs_5000_systemd"
new_tag = "\260L\177?\004\177\000\000\257\306L\000\000\000\000\000\320;\177?\004\177\000\000\210-\210\001\000\000\000\000\220*\210\001\000\000\000\000\227\003\000\000\005\000\000\000Ǔ\207\001\000\000\000\000(\000\000\0000\000\000\000\300L\177?\004\177\000\000\000L\177?\004\
177\000\000f\000\000\000\000\000\000\000\033[1m[\033[0m2025/10/29 11:22:12.604911191\033[1m]\033[0m [\033[94mtrace\033[0m] [sched] 0 timer coroutines destroyed\n", '\000' <repeats 3905 times>
last_tag = "uio_logs_5000_systemd", '\000' <repeats 4074 times>
tag_len = 21
last_tag_len = 21
data = 0x7f0429f06700
ctx = 0x7f0438001090
tm = {tm = {tv_sec = 1761733332, tv_nsec = 688426000}}
kvlist = 0x7f04380092c0
#16 0x00000000004fffcd in input_collector_fd (fd=40, ins=0xe884210) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
head = 0x7f0438007c58
collector = 0x7f0438007bd0
input_coro = 0x7f043f7f4d00
config = 0xe868490
#17 0x0000000000500b0a in engine_handle_event (fd=40, mask=1, ins=0xe884210, config=0xe868490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
ret = 0
#18 input_thread (data=0x7f044001f360) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
__flb_event_priority_live_foreach_iter = 0
__flb_event_priority_live_foreach_n_events = 1
ret = 0
thread_id = 0
tmp = "flb-in-systemd.0-w0\000\000\000\000\000c\317\023\000\000\000\000\000\300\346\177?\004\177\000\000GɡD\004\177", '\000' <repeats 17 times>
instance_exit = 0
event = 0x7f0438007bd0
ins = 0xe884210
evl_bktq = 0x7f0438007ba0
thi = 0x7f044001f360
p = 0xe875560
sched = 0x7f0438000b70
dns_ctx = {lookups = {prev = 0x7f043f7f4d30, next = 0x7f043f7f4d30}, lookups_drop = {prev = 0x7f043f7f4d40, next = 0x7f043f7f4d40}}
notification = 0x7f0444a87e5e <__GI___snprintf+158>
#19 0x0000000000577a77 in step_callback (data=0x7f04400249f0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
worker = 0x7f04400249f0
#20 0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 2, 139656308049984, 139656308050247, -8085596272872313471, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#21 0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f0444a266c0 (LWP 1298273))]
#0 0x00007f0444b2caf6 in epoll_wait (epfd=8, events=0x7f0440000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007f0444b2caf6 in epoll_wait (epfd=8, events=0x7f0440000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018707aa in _mk_event_wait_2 (loop=0x7f04400017a0, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7f0440000b70
ret = 0
#2 0x0000000001870bd4 in mk_event_wait (loop=0x7f04400017a0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x0000000000544d30 in flb_engine_start (config=0xe868490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_engine.c:999
__flb_event_priority_live_foreach_iter = 1
__flb_event_priority_live_foreach_n_events = 0
ret = 0
tasks = 0
fs_chunks = 0
mem_chunks = 0
ts = 0
tmp = "24.0K\000\000\000\000\000\000\000\000\000\000"
rb_flush_flag = 0
t_flush = {tm = {tv_sec = 10, tv_nsec = 0}}
event = 0x0
evl = 0x7f04400017a0
evl_bktq = 0x7f0440005fd0
sched = 0x7f044001b610
dns_ctx = {lookups = {prev = 0x7f0444a1cd30, next = 0x7f0444a1cd30}, lookups_drop = {prev = 0x7f0444a1cd40, next = 0x7f0444a1cd40}}
notification = 0x0
rb_ms = 250
rb_env = 0x0
#4 0x00000000004c68da in flb_lib_worker (data=0xe868460) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_lib.c:835
ret = -1975923953
ctx = 0xe868460
config = 0xe868490
#5 0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 22, 140737304904432, 140737304904695, -8085431676840628863, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f043ffff6c0 (LWP 1298274))]
#0 0x00007f0444b2caf6 in epoll_wait (epfd=11, events=0x7f04400061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007f0444b2caf6 in epoll_wait (epfd=11, events=0x7f04400061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018707aa in _mk_event_wait_2 (loop=0x7f0440006330, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7f0440006180
ret = 0
#2 0x0000000001870bd4 in mk_event_wait (loop=0x7f0440006330) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x00000000004cb1a4 in log_worker_collector (data=0x7f0440006090) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_log.c:166
__i = 1
__ctx = 0x7f0440006180
run = 1
event = 0x0
log = 0x7f0440006090
signal_value = 2
#4 0x0000000000577a77 in step_callback (data=0x7f0440009fa0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
worker = 0x7f0440009fa0
#5 0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 0, 139656308050416, 139656308050679, -8085597372920812159, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f0415c1d6c0 (LWP 1298279))]
#0 0x00007f0444b2caf6 in epoll_wait (epfd=97, events=0x7f0404000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007f0444b2caf6 in epoll_wait (epfd=97, events=0x7f0404000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018707aa in _mk_event_wait_2 (loop=0x7f0404001b10, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7f0404000ee0
ret = 0
#2 0x0000000001870bd4 in mk_event_wait (loop=0x7f0404001b10) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x00000000018693c8 in mk_server_worker_loop (server=0x7f044017cbb0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_server.c:506
__i = 1
__ctx = 0x7f0404000ee0
ret = 0
timeout_fd = 103
val = 1
event = 0x0
evl = 0x7f0404001b10
list = 0x7f04040063e0
head = 0x7f04040063e0
conn = 0x7f0415c13da0
sched = 0x7f0408000b90
listener = 0x7f0404006400
server_timeout = 0x7f040400e530
__i = 0
__ctx = 0x7f0404000ee0
#4 0x000000000185f471 in mk_sched_launch_worker_loop (data=0x7f0408001390) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_scheduler.c:417
ret = 0
wid = 0
len = 13
thread_name = 0x7f0404006390 "6\\A\364\003\177"
head = 0x7f044017cdf0
wcb = 0x7f04400ee940
sched = 0x7f0408000b90
notif = 0x7f0404006340
thinfo = 0x7f0408001390
ctx = 0x7f0408000b70
server = 0x7f044017cbb0
#5 0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 2, 139655647550272, 139655647550535, -8085539699026219647, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f04455157c0 (LWP 1298272))]
#0 0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffff510c5a0, rem=rem@entry=0x7ffff510c5a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48 r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) backtrace full
#0 0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffff510c5a0, rem=rem@entry=0x7ffff510c5a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
sc_cancel_oldtype = 0
sc_ret = <optimized out>
r = <optimized out>
#1 0x00007f0444b022e7 in __GI___nanosleep (req=req@entry=0x7ffff510c5a0, rem=rem@entry=0x7ffff510c5a0) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
ret = <optimized out>
#2 0x00007f0444b1451c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
save_errno = 22
max = 4294967295
ts = {tv_sec = 0, tv_nsec = 801045997}
#3 0x00000000004aab40 in flb_main_run (argc=3, argv=0x7ffff510c828) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1469
opt = -1
ret = 0
json = 0x21000 <error: Cannot access memory at address 0x21000>
last_plugin = -1
cfg_file = 0xe86dc60 <incomplete sequence \350>
cf = 0xe868df0
tmp = 0xe868df0
service = 0xe8683b0
s = 0x5b4a46 <flb_supervisor_requested+213>
section = 0xe8683b0
cf_opts = 0xe8682b0
group = 0x8a39cb0fdf26c000
supervisor_reload_notified = 0
trace_input = 0x0
trace_output = 0xe868440 "stdout"
trace_props = 0x0
long_opts = {{name = 0x18773b8 "storage_path", has_arg = 1, flag = 0x0, val = 98}, {name = 0x18773c5 "config", has_arg = 1, flag = 0x0, val = 99}, {name = 0x1877132 "daemon", has_arg = 0, flag = 0x0, val = 100}, {name = 0x18773cc "dry-run", has_arg = 0, flag = 0x0,
val = 68}, {name = 0x1877139 "flush", has_arg = 1, flag = 0x0, val = 102}, {name = 0x18773d4 "http", has_arg = 0, flag = 0x0, val = 72}, {name = 0x18773d9 "supervisor", has_arg = 0, flag = 0x0, val = 1029}, {name = 0x187716a "log_file", has_arg = 1, flag = 0x0,
val = 108}, {name = 0x18773e4 "port", has_arg = 1, flag = 0x0, val = 80}, {name = 0x187713f "custom", has_arg = 1, flag = 0x0, val = 67}, {name = 0x18770fe "input", has_arg = 1, flag = 0x0, val = 105}, {name = 0x1877159 "processor", has_arg = 1, flag = 0x0,
val = 114}, {name = 0x1877163 "filter", has_arg = 1, flag = 0x0, val = 70}, {name = 0x1877104 "output", has_arg = 1, flag = 0x0, val = 111}, {name = 0x1877146 "match", has_arg = 1, flag = 0x0, val = 109}, {name = 0x18773e9 "parser", has_arg = 1, flag = 0x0, val = 82},
{name = 0x18773f0 "prop", has_arg = 1, flag = 0x0, val = 112}, {name = 0x18773f5 "plugin", has_arg = 1, flag = 0x0, val = 101}, {name = 0x1877173 "tag", has_arg = 1, flag = 0x0, val = 116}, {name = 0x18773fc "sp-task", has_arg = 1, flag = 0x0, val = 84}, {
name = 0x1877404 "version", has_arg = 0, flag = 0x0, val = 86}, {name = 0x187740c "verbose", has_arg = 0, flag = 0x0, val = 118}, {name = 0x1877414 "workdir", has_arg = 1, flag = 0x0, val = 119}, {name = 0x187741c "quiet", has_arg = 0, flag = 0x0, val = 113}, {
name = 0x1877422 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x1877427 "help-json", has_arg = 0, flag = 0x0, val = 74}, {name = 0x1877431 "coro_stack_size", has_arg = 1, flag = 0x0, val = 115}, {name = 0x1877441 "sosreport", has_arg = 0, flag = 0x0,
val = 83}, {name = 0x1877177 "http_server", has_arg = 0, flag = 0x0, val = 72}, {name = 0x187744b "http_listen", has_arg = 1, flag = 0x0, val = 76}, {name = 0x1877457 "http_port", has_arg = 1, flag = 0x0, val = 80}, {name = 0x1877461 "enable-hot-reload", has_arg = 0,
flag = 0x0, val = 89}, {name = 0x1877473 "enable-chunk-trace", has_arg = 0, flag = 0x0, val = 90}, {name = 0x1877486 "trace", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x187748c "trace-input", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x1877498 "trace-output",
has_arg = 1, flag = 0x0, val = 1027}, {name = 0x18774a5 "trace-output-property", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x18774c0 "disable-thread-safety-on-hot-reload", has_arg = 0, flag = 0x0, val = 87}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
#4 0x00000000005b4acf in flb_supervisor_run (argc=3, argv=0x7ffff510c828, entry=0x4a9d5a <flb_main_run>) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_supervisor.c:626
clean_argv = 0x11ff
clean_argc = 32516
env_child = 0x6c6f6f705f68652e <error: Cannot access memory at address 0x6c6f6f705f68652e>
ret = -1
#5 0x00000000004aadcb in flb_main (argc=3, argv=0x7ffff510c828) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1564
No locals.
#6 0x00000000004aaded in main (argc=3, argv=0x7ffff510c828) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1572
No locals.
(gdb) thread 6
[Switching to thread 6 (Thread 0x7f041dc406c0 (LWP 1298276))]
#0 0x00007f0444b2caf6 in epoll_wait (epfd=75, events=0x7f0440176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007f0444b2caf6 in epoll_wait (epfd=75, events=0x7f0440176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018707aa in _mk_event_wait_2 (loop=0x7f0440176640, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7f0440176310
ret = 0
#2 0x0000000001870bd4 in mk_event_wait (loop=0x7f0440176640) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x0000000000516d74 in output_thread (data=0x7f0440176050) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_output_thread.c:257
__flb_event_priority_live_foreach_iter = 1
__flb_event_priority_live_foreach_n_events = 0
n = 8
ret = 0
running = 1
stopping = 0
thread_id = 0
tmp = "flb-out-http.0-w0", '\000' <repeats 32 times>, "1298276\000\300&\337\017\3139\212"
event_local = {fd = 89, type = 65536, mask = 1, status = 2 '\002', data = 0x0, handler = 0x0, _head = {prev = 0x0, next = 0x0}, _priority_head = {prev = 0x0, next = 0x0}, priority = 6 '\006'}
event = 0x0
sched = 0x7f0410000e80
task = 0x7f044019c3c0
u_conn = 0x7f041002e030
ins = 0xe88a0f0
out_flush = 0x7f0410017ea0
th_ins = 0x7f0440176050
params = 0x0
sched_params = 0x0
dns_ctx = {lookups = {prev = 0x7f041dc36b70, next = 0x7f041dc36b70}, lookups_drop = {prev = 0x7f041dc36b80, next = 0x7f041dc36b80}}
notification = 0x0
#4 0x0000000000577a77 in step_callback (data=0x7f0440176660) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
worker = 0x7f0440176660
#5 0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 0, 139656308050032, 139656308050295, -8085522091270918783, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#6 0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f041d43f6c0 (LWP 1298277))]
#0 0x00007f0444b2caf6 in epoll_wait (epfd=91, events=0x7f044017cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0 0x00007f0444b2caf6 in epoll_wait (epfd=91, events=0x7f044017cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
sc_ret = -4
sc_cancel_oldtype = 0
sc_ret = <optimized out>
#1 0x00000000018707aa in _mk_event_wait_2 (loop=0x7f0440129310, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
ctx = 0x7f04400a6460
ret = 0
#2 0x0000000001870bd4 in mk_event_wait (loop=0x7f0440129310) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3 0x0000000001856e38 in mk_lib_worker (data=0x7f04400ada10) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_lib.c:154
fd = 824195691
bytes = 1667331187
val = 2334097595223798896
server = 0x7f044017cbb0
event = 0x0
ctx = 0x7f04400ada10
__i = 32516
__ctx = 0x373732383932
#4 0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 0, 139656308050464, 139656308050727, -8085520991222420095, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#5 0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 8
[Switching to thread 8 (Thread 0x7f041641e6c0 (LWP 1298278))]
#0 0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f0416414de0, rem=rem@entry=0x7f0416414de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48 r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) backtrace full
#0 0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f0416414de0, rem=rem@entry=0x7f0416414de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
sc_cancel_oldtype = 2
sc_ret = <optimized out>
r = <optimized out>
#1 0x00007f0444b022e7 in __GI___nanosleep (req=req@entry=0x7f0416414de0, rem=rem@entry=0x7f0416414de0) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
ret = <optimized out>
#2 0x00007f0444b1451c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
save_errno = 0
max = 4294967295
ts = {tv_sec = 0, tv_nsec = 804197386}
#3 0x000000000186cf57 in mk_clock_worker_init (data=0x7f044017cbb0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_clock.c:124
cur_time = 1761733333
server = 0x7f044017cbb0
#4 0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 2, 139655647550320, 139655647550583, -8085540799074718335, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#5 0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
Hi @edsiper
I have run fluent-bit under valgrind and triggered a crash; I hope this new information will help with debugging this issue. Good luck:
Here is the logfile generated by valgrind for this crash: valgrind.log
And here is the backtrace from this crash:
- ZSTD_freeDDict is called with ddict=0x1, which doesn't look valid
- There is a difference from the previous core dump (where customAlloc was 0x0 and customFree was 0x1): now we have customAlloc = 0x1102, customFree = 0x40000, opaque = 0x1102, which also look invalid
Given that the issue is recurring and the invalid pointers are different each time, it's likely that the decompression context (dctx) is being used after it has been freed (use-after-free), or that its memory is being overwritten by something else.
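As a toy illustration of how a mixed-layout free can surface a garbage pointer like ddict=0x1, consider this self-contained sketch; the struct names and fields are invented stand-ins, not the real (and much larger) libzstd internals:

```c
/* Toy illustration only: invented layouts, not real libzstd structs. */
#include <stdio.h>
#include <stdlib.h>

/* Layout used by the library build that allocated the context. */
struct dctx_v1 {
    unsigned long in_use; /* set to 1 while the context is live */
    void *ddict;          /* NULL: no dictionary attached */
};

/* Layout assumed by the library build that frees it: same size,
 * fields in a different order. */
struct dctx_v2 {
    void *ddict;
    unsigned long in_use;
};

int main(void)
{
    struct dctx_v1 *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL)
        return 1;
    ctx->in_use = 1;
    ctx->ddict = NULL;

    /* The freeing code interprets the same bytes with its own layout and
     * "sees" ddict == 0x1, much like the invalid pointer in the backtrace.
     * (The cast is deliberate type punning, purely for illustration.) */
    struct dctx_v2 *alias = (struct dctx_v2 *)ctx;
    printf("ddict as seen by the freeing code: %p\n", alias->ddict);

    free(ctx);
    return 0;
}
```

Passing that bogus pointer on to a real ZSTD_freeDDict() is exactly the kind of call that would abort the process.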
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x70a36c0 (LWP 1373684) __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
2 Thread 0x2adc16c0 (LWP 1373689) 0x000000000538daf6 in epoll_wait (epfd=98, events=0xafaeab0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
3 Thread 0x2a5c06c0 (LWP 1373688) 0x0000000005357945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x2a5b6de0, rem=rem@entry=0x2a5b6de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
4 Thread 0x29dbf6c0 (LWP 1373687) 0x000000000538daf6 in epoll_wait (epfd=92, events=0xaf9a110, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
5 Thread 0x295be6c0 (LWP 1373686) 0x000000000538daf6 in epoll_wait (epfd=75, events=0xaf8c320, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
6 Thread 0x68a26c0 (LWP 1373683) 0x000000000538daf6 in epoll_wait (epfd=11, events=0x551bb20, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
7 Thread 0x60a16c0 (LWP 1373682) 0x000000000538daf6 in epoll_wait (epfd=8, events=0x55163b0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
8 Thread 0x549ce40 (LWP 1373681) 0x0000000005357945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x1fff000080, rem=rem@entry=0x1fff000080) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
(gdb) backtrace full
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
tid = <optimized out>
ret = 0
pd = <optimized out>
old_mask = {__val = {0}}
ret = <optimized out>
#1 0x000000000531ea43 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
No locals.
#2 0x00000000052c8a96 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
ret = <optimized out>
#3 0x00000000052b08fa in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0, 0, 0, 0, 0, 0, 0, 720081008, 118061808, 118057200, 4912552, 0, 75857920, 4878989, 4878914, 32155344}}, sa_flags = 0, sa_restorer = 0x2aeb9070}
#4 0x00000000004a947d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
cf_opts = 0x0
#5 <signal handler called>
No locals.
#6 0x00000000011cbe84 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
cMem = {customAlloc = 0x1102, customFree = 0x40000, opaque = 0x1102}
#7 0x000000000a50a609 in ZSTD_clearDict (dctx=0x2aeb9070) at .//decompress/zstd_decompress.c:315
No locals.
#8 ZSTD_freeDCtx (dctx=0x2aeb9070) at .//decompress/zstd_decompress.c:326
cMem = {customAlloc = <optimized out>, customFree = 0x0, opaque = 0x0}
#9 0x0000000004953b09 in sym_ZSTD_freeDCtxp (p=<optimized out>) at ../src/basic/compress.c:74
__func__ = <optimized out>
#10 decompress_blob_zstd (dst_max=0, src=0x169251d0, src_size=420, dst=<optimized out>, dst_size=0x7097978) at ../src/basic/compress.c:451
k = 0
size = 773
r = <optimized out>
dctx = 0x2aeb9070
input = {src = 0x169251d0, size = 420, pos = 420}
output = {dst = 0x2ae79030, size = 262144, pos = 773}
__func__ = <optimized out>
size = <optimized out>
r = <optimized out>
dctx = <optimized out>
input = <optimized out>
output = <optimized out>
k = <optimized out>
_found = <optimized out>
__assert_in_set = <optimized out>
__unique_prefix_A18 = <optimized out>
__unique_prefix_B19 = <optimized out>
_level = <optimized out>
_e = <optimized out>
#11 decompress_blob (dst_max=<optimized out>, compression=<optimized out>, src=<optimized out>, src_size=<optimized out>, dst=<optimized out>, dst_size=<optimized out>) at ../src/basic/compress.c:495
No locals.
#12 maybe_decompress_payload (data_threshold=<optimized out>, f=0x556d210, payload=0x169251d0 "(\265/\375`\005\002\325\f", size=420, compression=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7097af0, ret_size=0x7097ae8)
at ../src/libsystemd/sd-journal/journal-file.c:1947
rsize = 773
r = <optimized out>
__func__ = <optimized out>
#13 journal_file_data_payload.isra.0 (f=0x556d210, o=<optimized out>, offset=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7097af0, ret_size=0x7097ae8, data_threshold=<optimized out>) at ../src/libsystemd/sd-journal/journal-file.c:2009
size = 420
c = <optimized out>
r = <optimized out>
__func__ = <optimized out>
#14 0x00000000048b0d33 in sd_journal_enumerate_data (j=0x554bf60, data=0x7097b70, size=0x7099b98) at ../src/libsystemd/sd-journal/sd-journal.c:2886
_e = <optimized out>
p = <optimized out>
d = 0x0
l = 0
_error = <optimized out>
_level = <optimized out>
n = <optimized out>
f = 0x556d210
o = 0x189dca50
r = <optimized out>
__func__ = "sd_journal_enumerate_data"
#15 0x000000000074170a in in_systemd_collect (ins=0x54ffe80, config=0x54b33b0, in_context=0x554bda0) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:408
ret = 0
ret_j = 1
entries = 16
skip_entries = 0
rows = 1
sec = 1761825735
nsec = 725945000
usec = 1761825735725945
length = 37
key = 0x16924fc8 "_SYSTEMD_UNIT=setroubleshootd.service"
cursor = 0x0
tag = 0x550e810 "uio_logs_5000_systemd"
new_tag = "\260\234\t\a\000\000\000\000\257\306L\000\000\000\000\000Ћ\t\a\000\000\000\000\210-\210\001\000\000\000\000\220*\210\001\000\000\000\000\227\003\000\000\005\000\000\000Ǔ\207\001\000\000\000\000(\000\000\0000\000\000\000\300\234\t\a\000\000\000\000\000\234\t\a\000
\000\000\000f\000\000\000\000\000\000\000\033[1m[\033[0m2025/10/30 13:02:15.656937075\033[1m]\033[0m [\033[94mtrace\033[0m] [sched] 0 timer coroutines destroyed\n", '\000' <repeats 3905 times>
last_tag = "uio_logs_5000_systemd", '\000' <repeats 4074 times>
tag_len = 21
last_tag_len = 21
data = 0x16924fc8
ctx = 0x554bda0
tm = {tm = {tv_sec = 1761825735, tv_nsec = 725945000}}
kvlist = 0x15201940
#16 0x00000000004fffcd in input_collector_fd (fd=40, ins=0x54ffe80) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
head = 0xb026ef8
collector = 0xb026e70
input_coro = 0x7099d00
config = 0x54b33b0
#17 0x0000000000500b0a in engine_handle_event (fd=40, mask=1, ins=0x54ffe80, config=0x54b33b0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
ret = 0
#18 input_thread (data=0x5542d20) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
__flb_event_priority_live_foreach_iter = 0
__flb_event_priority_live_foreach_n_events = 1
ret = 0
thread_id = 0
tmp = "flb-in-systemd.0-w0\000\000\000\000\000\364\365\024\000\000\000\000\000\3006\n\a\000\000\000\000\3006\n\a", '\000' <repeats 19 times>
instance_exit = 0
event = 0xb026e70
ins = 0x54ffe80
evl_bktq = 0xafd8c20
thi = 0x5542d20
p = 0x54d1020
sched = 0x554b790
dns_ctx = {lookups = {prev = 0x7099d30, next = 0x7099d30}, lookups_drop = {prev = 0x7099d40, next = 0x7099d40}}
notification = 0x52e8e5e <__GI___snprintf+158>
#19 0x0000000000577a77 in step_callback (data=0x5548520) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
worker = 0x5548520
#20 0x000000000531cb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {118109888, 5951356893848596964, -37112, 2, 101283904, 101284167, 5951359036303274468, 5951363918013067748}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#21 0x000000000538d4e4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
No locals.
Hi, I tried using the system's libzstd and linking against it, to unify the libzstd used by the in_systemd plugin and the system-provided libsystemd. This should unify the symbols used by the systemd-related code. https://github.com/fluent/fluent-bit/pull/11088 Could this eliminate your issue?
@cosmo0920 Should this patch be used in addition to the ones sent by @edsiper, or instead of them?
I have applied your patch @cosmo0920 in addition to the patches from @edsiper and recompiled with the same result:
Here is the backtrace for this crash:
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
tid = <optimized out>
ret = 0
pd = <optimized out>
old_mask = {__val = {0}}
ret = <optimized out>
#1 0x00007fe781ebda43 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
No locals.
#2 0x00007fe781e67a96 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
ret = <optimized out>
#3 0x00007fe781e4f8fa in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0, 0, 0, 0, 0, 773, 0, 140632060825456, 140632276437744, 140632276433728, 4912552, 0, 140632306647040, 4878989, 4878914, 32155344}}, sa_flags = 0, sa_restorer = 0x7fe774078f70}
#4 0x00000000004a947d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
cf_opts = 0x0
#5 <signal handler called>
No locals.
#6 0x00000000011cbe84 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
cMem = {customAlloc = 0x0, customFree = 0x1, opaque = 0x7fe780e18af0}
#7 0x00007fe779fad609 in ZSTD_clearDict (dctx=0x7fe774078f70) at .//decompress/zstd_decompress.c:315
No locals.
#8 ZSTD_freeDCtx (dctx=0x7fe774078f70) at .//decompress/zstd_decompress.c:326
cMem = {customAlloc = <optimized out>, customFree = 0x0, opaque = 0x0}
#9 0x00007fe782a6ab09 in sym_ZSTD_freeDCtxp (p=<optimized out>) at ../src/basic/compress.c:74
__func__ = <optimized out>
#10 decompress_blob_zstd (dst_max=0, src=0x7fe7679671d0, src_size=420, dst=<optimized out>, dst_size=0x7fe780e18978) at ../src/basic/compress.c:451
k = 0
size = 773
r = <optimized out>
dctx = 0x7fe774078f70
input = {src = 0x7fe7679671d0, size = 420, pos = 420}
output = {dst = 0x7fe774038f60, size = 262152, pos = 773}
__func__ = <optimized out>
size = <optimized out>
r = <optimized out>
dctx = <optimized out>
input = <optimized out>
output = <optimized out>
k = <optimized out>
_found = <optimized out>
__assert_in_set = <optimized out>
__unique_prefix_A18 = <optimized out>
__unique_prefix_B19 = <optimized out>
_level = <optimized out>
_e = <optimized out>
#11 decompress_blob (dst_max=<optimized out>, compression=<optimized out>, src=<optimized out>, src_size=<optimized out>, dst=<optimized out>, dst_size=<optimized out>) at ../src/basic/compress.c:495
No locals.
#12 maybe_decompress_payload (data_threshold=<optimized out>, f=0x7fe774016d90, payload=0x7fe7679671d0 "(\265/\375`\005\002\325\f", size=420, compression=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7fe780e18af0, ret_size=0x7fe780e18ae8)
at ../src/libsystemd/sd-journal/journal-file.c:1947
rsize = 773
r = <optimized out>
__func__ = <optimized out>
#13 journal_file_data_payload.isra.0 (f=0x7fe774016d90, o=<optimized out>, offset=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7fe780e18af0, ret_size=0x7fe780e18ae8, data_threshold=<optimized out>)
at ../src/libsystemd/sd-journal/journal-file.c:2009
size = 420
c = <optimized out>
r = <optimized out>
__func__ = <optimized out>
#14 0x00007fe7829c7d33 in sd_journal_enumerate_data (j=0x7fe774001140, data=0x7fe780e18b70, size=0x7fe780e1ab98) at ../src/libsystemd/sd-journal/sd-journal.c:2886
_e = <optimized out>
p = <optimized out>
d = 0x0
l = 0
_error = <optimized out>
_level = <optimized out>
n = <optimized out>
f = 0x7fe774016d90
o = 0x7fe76339da08
r = <optimized out>
__func__ = "sd_journal_enumerate_data"
#15 0x000000000074170a in in_systemd_collect (ins=0x7945210, config=0x7929490, in_context=0x7fe774001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:408
ret = 0
ret_j = 1
entries = 16
skip_entries = 0
rows = 1
sec = 1761911728
nsec = 750378000
usec = 1761911728750378
length = 37
key = 0x7fe767966fc8 "_SYSTEMD_UNIT=setroubleshootd.service"
cursor = 0x0
tag = 0x7944ff0 "uio_logs_5000_systemd"
new_tag = "\260\254\341\200\347\177\000\000\257\306L\000\000\000\000\000Л\341\200\347\177\000\000\210-\210\001\000\000\000\000\220*\210\001\000\000\000\000\227\003\000\000\005\000\000\000Ǔ\207\001\000\000\000\000(\000\000\0000\000\000\000\300\254\341\200\347\177\000\000\000\254\341\200\347\177\000\000f\000\000\000\000\000\000\000\033[1m[\033[0m2025/10/31 12:55:29.102444141\033[1m]\033[0m [\033[94mtrace\033[0m] [sched] 0 timer coroutines destroyed\n", '\000' <repeats 3905 times>
last_tag = "uio_logs_5000_systemd", '\000' <repeats 4074 times>
tag_len = 21
last_tag_len = 21
data = 0x7fe767966fc8
ctx = 0x7fe774001090
tm = {tm = {tv_sec = 1761911728, tv_nsec = 750378000}}
kvlist = 0x7fe7740113a0
#16 0x00000000004fffcd in input_collector_fd (fd=40, ins=0x7945210) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
head = 0x7fe774011378
collector = 0x7fe7740112f0
input_coro = 0x7fe780e1ad00
config = 0x7929490
#17 0x0000000000500b0a in engine_handle_event (fd=40, mask=1, ins=0x7945210, config=0x7929490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
ret = 0
#18 input_thread (data=0x7fe77c01f360) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
__flb_event_priority_live_foreach_iter = 0
__flb_event_priority_live_foreach_n_events = 1
ret = 0
thread_id = 0
tmp = "flb-in-systemd.0-w0\000\000\000\000\000M6\026\000\000\000\000\000\300F\342\200\347\177\000\000G\311\341\201\347\177", '\000' <repeats 17 times>
instance_exit = 0
event = 0x7fe7740112f0
ins = 0x7945210
evl_bktq = 0x7fe7740062c0
thi = 0x7fe77c01f360
p = 0x7936560
sched = 0x7fe774000b70
dns_ctx = {lookups = {prev = 0x7fe780e1ad30, next = 0x7fe780e1ad30}, lookups_drop = {prev = 0x7fe780e1ad40, next = 0x7fe780e1ad40}}
notification = 0x7fe781e87e5e <__GI___snprintf+158>
#19 0x0000000000577a77 in step_callback (data=0x7fe77c0249f0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
worker = 0x7fe77c0249f0
#20 0x00007fe781ebbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2218781407457520282, -37112, 0, 140632293230656, 140632293230919, 2232184335564330342, 2232182085447374182}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#21 0x00007fe781f2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
Hmmm, this could be caused by a collision of zstd symbols between the system's copy and the bundled one. I tried another way to handle this: https://github.com/fluent/fluent-bit/pull/11111
Hello @edsiper, #11073 doesn't fix the coredump described in this issue.
This bug is still active in fluent-bit 4.2.0 with the same behavior as described for version 4.1.1 in this issue. The crash still occurs when journald logs a multi-line log from selinux and fluent-bit reads it.
Hello @cosmo0920, I have tried https://github.com/fluent/fluent-bit/pull/11111 with fluent-bit 4.2.0 and I get the same behavior as described for version 4.1.1 in this issue. The crash still occurs when journald logs a multi-line entry from selinux and fluent-bit reads it.
Using a UBI 10 image also seems to trigger the SIGSEGV pretty quickly, even when built from source, but not distroless or UBI 9 images. Found this in some downstream testing after stepping up to UBI 10: https://github.com/FluentDo/agent/pull/115
Hello @cosmo0920, any news on this issue? Do you need any more information or debugging from us?
Regards
Hi, I tried using the system's zstd library to build fluent-bit packages for RHEL10 here: https://github.com/fluent/fluent-bit/pull/11111 Could you test it on RHEL 10? The built package is here: https://github.com/fluent/fluent-bit/actions/runs/20092756932?pr=11111
Hello @cosmo0920, it is a pleasure to report that the package:
packages-pr-11111-almalinux-10
https://github.com/fluent/fluent-bit/actions/runs/20092756932/artifacts/4821866793
works on RHEL10 without problems.
Fluent-bit is able to handle multiline logs from selinux without any problems and does not crash. Thanks a lot for the help; we hope 4.2.1 will be released soon :)
@rafaelma any idea what's the minimum config to trigger the issue? I'm trying to add some downstream tests to ensure we don't hit a regression in future so will try with just the new library installed but it may need a plugin that actually triggers the ABI to fail.
In our case it was a multiline log in journald from audit/selinux:
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
***** Plugin catchall_boolean (89.3 confidence) suggests ******************
If you want to allow nis to enabled
Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
Do
setsebool -P nis_enabled 1
***** Plugin catchall (11.6 confidence) suggests **************************
If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
# semodule -X 300 -i my-rhsmpackagepr.pp
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
***** Plugin catchall_boolean (89.3 confidence) suggests ******************
If you want to allow nis to enabled
Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
Do
setsebool -P nis_enabled 1
***** Plugin catchall (11.6 confidence) suggests **************************
If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
# semodule -X 300 -i my-rhsmpackagepr.pp
I suppose that any selinux violation that produces this type of log will be enough to trigger a coredump.
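If a synthetic trigger is useful for tests, a sketch like the one below might provoke a similar journal entry without a real SELinux denial; it is untested and hypothetical (the identifier and the padding loop are invented), and it assumes journald compresses fields of this size with zstd (the default behaviour), which is what forces the decompression path seen in the backtraces:

```c
/*
 * Hypothetical repro sketch: write a large multi-line MESSAGE to the journal
 * so journald stores the field compressed. Build: cc repro.c -lsystemd
 */
#include <stdio.h>
#include <systemd/sd-journal.h>

int main(void)
{
    char msg[4096];
    size_t len = 0;

    len += snprintf(msg + len, sizeof(msg) - len,
                    "SELinux is preventing /usr/bin/python3.12 from "
                    "name_connect access on the tcp_socket port 443.\n");

    /* Pad with repeated lines so the field exceeds journald's
     * compression threshold. */
    for (int i = 0; i < 30 && len < sizeof(msg) - 128; i++)
        len += snprintf(msg + len, sizeof(msg) - len,
                        "***** Plugin catchall_boolean (89.3 confidence) "
                        "suggests ******************\n");

    return sd_journal_send("MESSAGE=%s", msg,
                           "SYSLOG_IDENTIFIER=%s", "setroubleshoot-repro",
                           NULL) < 0;
}
```

Run it while fluent-bit's systemd input is tailing the journal and check whether the agent survives reading the entry back.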
I'm seeing a core dump with the systemd input enabled using this RPM:
$ rpm -qi fluent-bit
Name : fluent-bit
Version : 4.2.1
Release : 1
Architecture: x86_64
Install Date: Mon 15 Dec 2025 10:43:45 GMT
Group : System Environment/Daemons
Size : 27689798
License : Apache v2.0
Signature :
RSA/SHA512, Sat 13 Dec 2025 17:34:51 GMT, Key ID 9f9ddc083888c1cd
Source RPM : fluent-bit-4.2.1-1.src.rpm
Build Date : Fri 12 Dec 2025 23:34:34 GMT
Build Host : e90c446b9ce6
Relocations : /
Vendor : Chronosphere Inc.
Summary : Fast data collector for Linux
Description :
Fluent Bit is a high performance and multi platform Log Forwarder.
$
I haven't been able to work out if this RPM contains the previously discussed fix or not. The RPM has no changelog and there aren't any release notes at https://fluentbit.io/announcements/
The only multiline log messages I can see in journald output are from Fluent Bit core dumps.
Unfortunately, the official Fluent Bit v4.2.1 does not include this fix.
This was very unfortunate for us :( We hope 4.2.2 will be released soon.
Thanks, I've not been able to trigger it with our Alma/Rocky Linux 10 packages but have done so with a UBI 10 container, so it does seem to be something environment-specific.