
Fluent-bit crashes with a coredump when running on RHEL10

Open rafaelma opened this issue 2 months ago • 17 comments

Bug Report

Describe the bug

Fluent-bit 4.0.x and 4.1.x crash with a coredump when running on RHEL10.

The bug seems to be related to the systemd input plugin. When started, the agent works fine for a while before crashing with a coredump. When this happens, any subsequent attempt to start the agent results in an immediate crash with another coredump.

If we delete all contents from storage.path (systemd.0/ and systemd.db), the agent starts without problems and runs for a while before crashing again with a coredump.
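
For reference, the cleanup looks roughly like this (a sketch; the paths come from the storage.path and db settings in the configuration below):

systemctl stop fluent-bit
# remove the systemd input's buffered chunks and cursor database
rm -rf /var/lib/fluent-bit/storage/systemd.0/
rm -f /var/lib/fluent-bit/storage/systemd.db
systemctl start fluent-bit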

It seems to me that the systemd chunk file gets corrupted for some reason, and when this happens, the agent crashes.

This happens on multiple servers with packages (4.0.13, 4.1.0, 4.1.1) from the AlmaLinux repo at packages.fluentbit.io. I have also compiled 4.1.0 and 4.1.1 from source to activate FLB_DEBUG, and I get the same problem.
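
The source build was roughly the following (a sketch, assuming the standard CMake build; FLB_DEBUG is the only non-default option we enabled):

tar xzf fluent-bit-4.1.1.tar.gz
cd fluent-bit-4.1.1/build
cmake -DFLB_DEBUG=On ..
make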

To Reproduce

  • Journald logs related to the crash:
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: [2025/10/24 15:11:22] [engine] caught signal (SIGSEGV)
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #0  0x7f2e9f071608      in  ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #1  0x7f2ea0057b08      in  ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #2  0x7f2e9ffb4d32      in  ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #3  0x5c899e            in  ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #4  0x55488e            in  ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #5  0x577e2b            in  ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #6  0x7f2e9f2bbb67      in  ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #7  0x7f2e9f32c6bb      in  ???() at ???:0
Oct 24 15:11:22 hostname.domain fluent-bit[939013]: #8  0xffffffffffffffff  in  ???() at ???:0
Oct 24 15:11:22 hostname.domain systemd-coredump[946015]: Process 939013 (fluent-bit) of user 0 terminated abnormally with signal 6/ABRT, processing...
Oct 24 15:11:22 hostname.domain systemd[1]: Started systemd-coredump@….service - Process Core Dump (PID 946015/UID 0).
Oct 24 15:11:22 hostname.domain systemd-coredump[946016]: Removed old coredump core.fluent-bit.0.dfd7beb07d594c77bef0090bd555891f.834839.1761112301000000.zst.
Oct 24 15:11:22 hostname.domain systemd-coredump[946016]: Process 939013 (fluent-bit) of user 0 dumped core.
                                                        
                                                        Module libzstd.so.1 from rpm zstd-1.5.5-9.el10.x86_64
                                                        Module libpcre2-8.so.0 from rpm pcre2-10.44-1.el10.3.x86_64
                                                        Module libcrypt.so.2 from rpm libxcrypt-4.4.36-10.el10.x86_64
                                                        Module libselinux.so.1 from rpm libselinux-3.8-2.el10_0.x86_64
                                                        Module libsasl2.so.3 from rpm cyrus-sasl-2.1.28-27.el10.x86_64
                                                        Module libevent-2.1.so.7 from rpm libevent-2.1.12-16.el10.x86_64
                                                        Module libkeyutils.so.1 from rpm keyutils-1.6.3-5.el10.x86_64
                                                        Module libkrb5support.so.0 from rpm krb5-1.21.3-8.el10_0.x86_64
                                                        Module libcom_err.so.2 from rpm e2fsprogs-1.47.1-3.el10.x86_64
                                                        Module libk5crypto.so.3 from rpm krb5-1.21.3-8.el10_0.x86_64
                                                        Module libkrb5.so.3 from rpm krb5-1.21.3-8.el10_0.x86_64
                                                        Module libgssapi_krb5.so.2 from rpm krb5-1.21.3-8.el10_0.x86_64
                                                        Module libz.so.1 from rpm zlib-ng-2.2.3-1.el10.x86_64
                                                        Module libcap.so.2 from rpm libcap-2.69-7.el10.x86_64
                                                        Module libcrypto.so.3 from rpm openssl-3.2.2-16.el10_0.4.x86_64
                                                        Module libssl.so.3 from rpm openssl-3.2.2-16.el10_0.4.x86_64
                                                        Module libsystemd.so.0 from rpm systemd-257-9.el10_0.1.x86_64
                                                        Module libyaml-0.so.2 from rpm libyaml-0.2.5-16.el10.x86_64
                                                        Stack trace of thread 939016:
                                                        #0  0x00007f2e9f2bd9dc __pthread_kill_implementation (libc.so.6 + 0x969dc)
                                                        #1  0x00007f2e9f267a96 raise (libc.so.6 + 0x40a96)
                                                        #2  0x00007f2e9f24f8fa abort (libc.so.6 + 0x288fa)
                                                        #3  0x00000000004b1498 n/a (n/a + 0x0)
                                                        #4  0x313a35312034322f n/a (n/a + 0x0)
                                                        ELF object binary architecture: AMD x86-64
Oct 24 15:11:22 hostname.domain systemd[1]: systemd-coredump@….service: Deactivated successfully.
Oct 24 15:11:22 hostname.domain systemd[1]: systemd-coredump@….service: Consumed 268ms CPU time, 106.8M memory peak.
Oct 24 15:11:22 hostname.domain systemd[1]: fluent-bit.service: Main process exited, code=dumped, status=6/ABRT
Oct 24 15:11:22 hostname.domain systemd[1]: fluent-bit.service: Failed with result 'core-dump'.
Oct 24 15:11:22 hostname.domain systemd[1]: fluent-bit.service: Consumed 22.833s CPU time, 48.4M memory peak.
Oct 24 15:11:22 hostname.domain systemd[1]: fluent-bit.service: Scheduled restart job, restart counter is at 1.
  • Trace logs from fluent-bit before, during, and after the crash: fluent-bit.log.txt

  • I have been able to generate this backtrace from the coredump generated by the subsequent attempts to start the agent:
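
Roughly like this (a sketch, assuming systemd-coredump is storing the dumps, as the journal output above shows):

# list the stored fluent-bit coredumps and open the latest one in gdb
coredumpctl list fluent-bit
coredumpctl gdb fluent-bit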


(gdb) backtrace 
#0  0x00007fd5df4bd9dc in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fd5df467a96 in raise () from /lib64/libc.so.6
#2  0x00007fd5df44f8fa in abort () from /lib64/libc.so.6
#3  0x00000000004ae47d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
#4  <signal handler called>
#5  0x00000000011d0e84 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
#6  0x00007fd5df9ad609 in ZSTD_freeDCtx () from /lib64/libzstd.so.1
#7  0x00007fd5e01e5b09 in journal_file_data_payload.isra () from /lib64/libsystemd.so.0
#8  0x00007fd5e0142d33 in sd_journal_enumerate_data () from /lib64/libsystemd.so.0
#9  0x00000000007466ea in in_systemd_collect (ins=0x390d8d60, config=0x390a7490, in_context=0x7fd5d0001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:387
#10 0x0000000000746b07 in in_systemd_collect_archive (ins=0x390d8d60, config=0x390a7490, in_context=0x7fd5d0001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:512
#11 0x0000000000504fcd in input_collector_fd (fd=39, ins=0x390d8d60) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
#12 0x0000000000505b0a in engine_handle_event (fd=39, mask=1, ins=0x390d8d60, config=0x390a7490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
#13 input_thread (data=0x7fd5d801f4e0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
#14 0x000000000057ca77 in step_callback (data=0x7fd5d8024b70) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
#15 0x00007fd5df4bbb68 in start_thread () from /lib64/libc.so.6
#16 0x00007fd5df52c6bc in clone3 () from /lib64/libc.so.6

  • Steps to reproduce the problem:

The first crash, after the agent has been running without problems for a while, appears to be random; I have not been able to identify the trigger. After the first crash (when the chunk file presumably gets corrupted), the crash is reproducible if you use the attached chunk/db files under storage.path/systemd.0/ and storage.path/systemd.db.
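
In other words, the reproduction is roughly (a sketch; the target paths come from the configuration below):

systemctl stop fluent-bit
# put the attached (presumably corrupted) files in place of the current state
cp systemd.db /var/lib/fluent-bit/storage/systemd.db
cp -r systemd.0/. /var/lib/fluent-bit/storage/systemd.0/
systemctl start fluent-bit   # crashes immediately with another coredump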

Expected behavior

The agent should not crash with a coredump.

Screenshots

Your Environment

  • Version used: fluent-bit 4.0.13, 4.1.0 and 4.1.1
  • Configuration:
[SERVICE]
    # Flush
    # =====
    # set the interval in seconds before flushing records to a destination
    flush        10

    # Daemon
    # ======
    # instruct Fluent Bit to run in foreground or background mode.
    daemon       Off

    # Log_file
    # ========
    # Absolute path for an optional log file. By default all logs are
    # redirected to the standard error interface (stderr).
    log_file  /var/log/fluent-bit/fluent-bit.log

    # Log_Level
    # =========
    # Set the verbosity level of the service, values can be:
    #
    # - error
    # - warning
    # - info
    # - debug
    # - trace
    #
    # by default 'info' is set, that means it includes 'error' and 'warning'.
    log_level    trace

    # Parsers File
    # ============
    # specify an optional 'Parsers' configuration file
    parsers_file parsers.conf

    # Plugins File
    # ============
    # specify an optional 'Plugins' configuration file to load external plugins.
    plugins_file plugins.conf

    # HTTP Server
    # ===========
    # Enable/Disable the built-in HTTP Server for metrics
    http_server  On
    http_listen  127.0.0.1
    http_port    2020

    # Storage
    # =======
    # Fluent Bit can use memory and filesystem buffering based mechanisms
    #
    # - https://docs.fluentbit.io/manual/administration/buffering-and-storage
    #
    # storage metrics
    # ---------------
    # publish storage pipeline metrics in '/api/v1/storage'. The metrics are
    # exported only if the 'http_server' option is enabled.
    #
    storage.metrics on

    # storage.path
    # ------------
    # absolute file system path to store filesystem data buffers (chunks).
    #
    storage.path /var/lib/fluent-bit/storage


    # storage.sync
    # ------------
    # configure the synchronization mode used to store the data into the
    # filesystem. It can take the values normal or full.
    #
    storage.sync normal

    # storage.checksum
    # ----------------
    # enable the data integrity check when writing and reading data from the
    # filesystem. The storage layer uses the CRC32 algorithm.
    #
    storage.checksum off

    # storage.backlog.mem_limit
    # -------------------------
    # if storage.path is set, Fluent Bit will look for data chunks that were
    # not delivered and are still in the storage layer; these are called
    # backlog data. This option configures a hint of the maximum amount of
    # memory to use when processing these records.
    #
    storage.backlog.mem_limit 100M

    # storage.max_chunks_up
    # ---------------------
    # If the input plugin has enabled filesystem storage type, this
    # property sets the maximum number of chunks that can be up in
    # memory. Use this setting to control memory usage when you enable
    # storage.type filesystem.
    #
    storage.max_chunks_up 128

    # storage.delete_irrecoverable_chunks
    # -----------------------------------
    # When enabled, irrecoverable chunks will be deleted during
    # runtime, and any other irrecoverable chunk located in the
    # configured storage path directory will be deleted when
    # Fluent-Bit starts. Accepted values: 'Off', 'On'.
    #
    storage.delete_irrecoverable_chunks on

    # scheduler.base
    # ---------------
    # Set a base of exponential backoff in seconds. 
    scheduler.base 5

    # scheduler.cap
    # -------------
    # Set a maximum retry time in seconds.
    scheduler.cap 900
    
[INPUT]
    Name    systemd
    Tag     logs_5000_systemd
    db      /var/lib/fluent-bit/storage/systemd.db
    db.Sync   Normal

    Mem_Buf_Limit 100MB
    storage.type filesystem
    storage.pause_on_chunks_overlimit on

    Read_From_Tail On
    Lowercase On
    Threaded true

[FILTER]
    Name modify
    Match logs_5000_systemd

    Add dataops.data_processor dataops-logs-systemd

    Add event.module systemd
    Add event.provider systemd
    Add event.dataset systemd.journald

    Add data_stream.namespace prod
    Add data_stream.dataset systemd.journald

    Add service.name linux-systemd

[FILTER]
    Name nest
    Match *

    Operation nest
    Wildcard dataops.*
    Nest_under dataops
    Remove_prefix dataops.

[FILTER]
    Name nest
    Match *

    Operation nest
    Wildcard event.*
    Nest_under event
    Remove_prefix event.

[FILTER]
    Name nest
    Match *

    Operation nest
    Wildcard data_stream.*
    Nest_under data_stream
    Remove_prefix data_stream.

[FILTER]
    Name nest
    Match *

    Operation nest
    Wildcard service.*
    Nest_under service
    Remove_prefix service.
    
[FILTER]
    Name modify
    Match *
    Add agent.type fluent-bit

[FILTER]
    Name sysinfo
    Match *
    Fluentbit_version_key agent.version
    Os_name_key os.name
    Os_version_key os.version
    Kernel_version_key os.kernel
    Hostname_key host.name

[FILTER]
    Name nest
    Match *
    
    Operation nest
    Wildcard agent.*
    Nest_under agent
    Remove_prefix agent.

[FILTER]
    Name nest
    Match *
    
    Operation nest
    Wildcard os.*
    Nest_under os
    Remove_prefix os.

[FILTER]
    Name nest
    Match *

    Operation nest
    Wildcard host.*
    Wildcard os*
    Nest_under host
    Remove_prefix host.

[OUTPUT]
    Name   http
    Match  logs_5000_*
    
    Host   server-receiver.example.org
    Port   5000

    Format json
    Workers 1
    storage.total_limit_size  100M
    Retry_Limit no_limits
    
    tls On
    tls.verify On
    tls.ca_file /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
    tls.crt_file /path/my.crt
    tls.key_file /path/my.key

fluent-bit.conf.txt / fluent-bit-systemd.conf.txt

  • Server type and version: Linux 6.12.0-55.38.1.el10_0.x86_64 x86_64 GNU/Linux
  • Operating System and version: Red Hat Enterprise Linux release 10.0 (Coughlan)
  • Filters and plugins: systemd(input), modify, nest, sysinfo, http(output)

Additional context

rafaelma • Oct 24 '25 11:10

@rafaelma thanks for reporting this bug with such useful info.

I have pushed a "potential fix" for this issue here: https://github.com/fluent/fluent-bit/pull/11073. Would you please give it a try?

edsiper • Oct 27 '25 04:10

Hello, thank you very much for your reply.

I have patched the 4.1.1 source code with your commit https://github.com/fluent/fluent-bit/pull/11073/commits/0bd94ddbbc01c88c8752bc1dc135517e2e1bed5a, compiled the source, and started the agent.
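
(Roughly the following, assuming the commit applies cleanly on top of the 4.1.1 tarball:)

cd fluent-bit-4.1.1
curl -L https://github.com/fluent/fluent-bit/commit/0bd94ddbbc01c88c8752bc1dc135517e2e1bed5a.patch | patch -p1
# rebuild with the same options as before
cd build && cmake -DFLB_DEBUG=On .. && make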

The agent worked without problems for 30 minutes and then crashed again.

We wonder if the systemd input plugin is having problems parsing multiline logs on RHEL 10. These multiline logs are processed without problems on RHEL 7, 8, and 9.

[2025/10/27 10:11:23] [engine] caught signal (SIGSEGV)
#0  0x11d0e84           in  ZSTD_freeDDict() at lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
#1  0x7fb8582f3608      in  ???() at ???:0
#2  0x7fb858acbb08      in  ???() at ???:0
#3  0x7fb858a28d32      in  ???() at ???:0
#4  0x7466f9            in  in_systemd_collect() at plugins/in_systemd/systemd.c:397
#5  0x504fcc            in  input_collector_fd() at src/flb_input_thread.c:166
#6  0x505b09            in  engine_handle_event() at src/flb_input_thread.c:181
#7  0x505b09            in  input_thread() at src/flb_input_thread.c:420
#8  0x57ca76            in  step_callback() at src/flb_worker.c:43
#9  0x7fb857ebbb67      in  ???() at ???:0
#10 0x7fb857f2c6bb      in  ???() at ???:0
#11 0xffffffffffffffff  in  ???() at ???:0
Aborted (core dumped)
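
Since the crash is inside the zstd decompression path used by libsystemd when reading the journal, one way to generate similarly large entries, which journald compresses once they exceed its default threshold (an assumption; journald's default is around 512 bytes), would be roughly:

# emit messages large enough for journald to zstd-compress the payload
for i in $(seq 1 50); do
    logger -t flb-repro "$(printf 'a long selinux-style test line %.0s' $(seq 1 40))"
done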

The last logs in journald before the crash are:

Oct 27 10:11:10 hostname.domain sshd[1136]: srclimit_penalise: ipv4: new 209.38.98.72/32 deferred penalty of 1 seconds for penalty: connections without attempting authentication

Oct 27 10:11:21 hostname.domain systemd[1]: Starting setroubleshootd.service - SETroubleshoot daemon for processing new SELinux denial logs...

Oct 27 10:11:21 hostname.domain systemd[1]: Started setroubleshootd.service - SETroubleshoot daemon for processing new SELinux denial logs.

Oct 27 10:11:21 hostname.domain systemd[1]: Started dbus-:…@….service.

Oct 27 10:11:22 hostname.domain setroubleshoot[1147531]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736

Oct 27 10:11:22 hostname.domain setroubleshoot[1147531]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
                                                       
                                                       *****  Plugin catchall_boolean (89.3 confidence) suggests   ******************
                                                       
                                                       If you want to allow nis to enabled
                                                       Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
                                                       
                                                       Do
                                                       setsebool -P nis_enabled 1
                                                       
                                                       *****  Plugin catchall (11.6 confidence) suggests   **************************
                                                       
                                                       If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
                                                       Then you should report this as a bug.
                                                       You can generate a local policy module to allow this access.
                                                       Do
                                                       allow this access for now by executing:
                                                       # ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
                                                       # semodule -X 300 -i my-rhsmpackagepr.pp
                                                       
Oct 27 10:11:22 hostname.domain setroubleshoot[1147531]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736

Oct 27 10:11:22 hostname.domain setroubleshoot[1147531]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
                                                       
                                                       *****  Plugin catchall_boolean (89.3 confidence) suggests   ******************
                                                       
                                                       If you want to allow nis to enabled
                                                       Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
                                                       
                                                       Do
                                                       setsebool -P nis_enabled 1
                                                       
                                                       *****  Plugin catchall (11.6 confidence) suggests   **************************
                                                       
                                                       If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
                                                       Then you should report this as a bug.
                                                       You can generate a local policy module to allow this access.
                                                       Do
                                                       allow this access for now by executing:
                                                       # ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
                                                       # semodule -X 300 -i my-rhsmpackagepr.pp

And the last log sent via the output plugin is:

Oct 27 10:11:10 hostname.domain sshd[1136]: srclimit_penalise: ipv4: new 209.38.98.72/32 deferred penalty of 1 seconds for penalty: connections without attempting authentication

I have checked the other crashes, and many of them have these multiline logs from SELinux right before the crash.

And these are the trace logs from fluent-bit right before the crash:

[2025/10/27 10:11:15.620741844] [debug] [upstream] KA connection #108 to receiver-server.example.org:5000 is connected
[2025/10/27 10:11:15.620753465] [debug] [http_client] not using http_proxy for header
[2025/10/27 10:11:15.620772172] [trace] [io coro=0x7fb8240303c0] [net_write] trying 161 bytes
[2025/10/27 10:11:15.620816831] [trace] [io coro=0x7fb8240303c0] [net_write] ret=161 total=161/161
[2025/10/27 10:11:15.620835143] [trace] [io coro=0x7fb8240303c0] [net_write] trying 5252 bytes
[2025/10/27 10:11:15.620869714] [trace] [io coro=0x7fb8240303c0] [net_write] ret=5252 total=5252/5252
[2025/10/27 10:11:15.620877740] [trace] [io coro=0x7fb8240303c0] [net_read] try up to 4095 bytes
[2025/10/27 10:11:15.621853366] [trace] [engine] resuming coroutine=0x7fb8240303c0
[2025/10/27 10:11:15.622054062] [trace] [engine] resuming coroutine=0x7fb8240303c0
[2025/10/27 10:11:15.624272333] [trace] [engine] resuming coroutine=0x7fb8240303c0
[2025/10/27 10:11:15.624377117] [trace] [io coro=0x7fb8240303c0] [net_read] ret=66
[2025/10/27 10:11:15.624393096] [ info] [output:http:http.0] receiver-server.example.org:5000, HTTP status=200
ok
[2025/10/27 10:11:15.624414336] [debug] [upstream] KA connection #108 to receiver-server.example.org:5000 is now available
[2025/10/27 10:11:15.624432065] [debug] [out flush] cb_destroy coro_id=60
[2025/10/27 10:11:15.624438564] [trace] [coro] destroy coroutine=0x7fb8240303c0 data=0x7fb8240303e0
[2025/10/27 10:11:15.624496917] [trace] [engine] [task event] task_id=0 out_id=0 return=OK
[2025/10/27 10:11:15.624519077] [debug] [task] destroy task=0x7fb850199c90 (task_id=0)
[2025/10/27 10:11:15.624528524] [trace] [1097] http.0 -> fs_chunks_size = 36864 mod=-36864 chunk=1146617-1761556270.352518021.flb
[2025/10/27 10:11:15.624532962] [debug] [input chunk] remove chunk 1146617-1761556270.352518021.flb with 36864 bytes from plugin http.0, the updated fs_chunks_size is 0 bytes
[2025/10/27 10:11:21.852650890] [trace] [565] http.0 -> fs_chunks_size = 0
[2025/10/27 10:11:21.852677361] [trace] [input chunk] chunk 1146617-1761556281.852406233.flb required 916 bytes and 100000000 bytes left in plugin http.0
[2025/10/27 10:11:21.852779117] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 30 elements, output map size 37 elements
[2025/10/27 10:11:21.852818662] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 37, will be 37, nested map size will be 1
[2025/10/27 10:11:21.852848705] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 37, will be 35, nested map size will be 3
[2025/10/27 10:11:21.852882663] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 34, nested map size will be 2
[2025/10/27 10:11:21.852916944] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 34, will be 34, nested map size will be 1
[2025/10/27 10:11:21.852956203] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 34 elements, output map size 35 elements
[2025/10/27 10:11:21.853015772] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 39, nested map size will be 2
[2025/10/27 10:11:21.853053461] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 39, will be 37, nested map size will be 3
[2025/10/27 10:11:21.853088929] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 37, will be 36, nested map size will be 2
[2025/10/27 10:11:21.853125410] [trace] [input chunk] update output instances with new chunk size diff=4096, records=1, input=systemd.0
[2025/10/27 10:11:21.853135909] [trace] [2226] http.0 -> fs_chunks_size = 0 mod=4096 chunk=1146617-1761556281.852406233.flb
[2025/10/27 10:11:21.853144527] [trace] [input chunk] chunk 1146617-1761556281.852406233.flb update plugin http.0 fs_chunks_size by 4096 bytes, the current fs_chunks_size is 4096 bytes
[2025/10/27 10:11:21.853156141] [trace] [565] http.0 -> fs_chunks_size = 4096
[2025/10/27 10:11:21.853161330] [trace] [input chunk] chunk 1146617-1761556281.852406233.flb required 928 bytes and 99995904 bytes left in plugin http.0
[2025/10/27 10:11:21.853215372] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 31 elements, output map size 38 elements
[2025/10/27 10:11:21.853687959] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 38, nested map size will be 1
[2025/10/27 10:11:21.853741919] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 36, nested map size will be 3
[2025/10/27 10:11:21.853804211] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 36, will be 35, nested map size will be 2
[2025/10/27 10:11:21.853858033] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 35, nested map size will be 1
[2025/10/27 10:11:21.853918417] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 35 elements, output map size 36 elements
[2025/10/27 10:11:21.853990560] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 41, will be 40, nested map size will be 2
[2025/10/27 10:11:21.854037867] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 38, nested map size will be 3
[2025/10/27 10:11:21.854124235] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 37, nested map size will be 2
[2025/10/27 10:11:22.352419099] [trace] [565] http.0 -> fs_chunks_size = 4096
[2025/10/27 10:11:22.352443855] [trace] [input chunk] chunk 1146617-1761556281.852406233.flb required 946 bytes and 99995904 bytes left in plugin http.0
[2025/10/27 10:11:22.352514618] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 31 elements, output map size 38 elements
[2025/10/27 10:11:22.352543033] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 38, nested map size will be 1
[2025/10/27 10:11:22.352570398] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 36, nested map size will be 3
[2025/10/27 10:11:22.352594301] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 36, will be 35, nested map size will be 2
[2025/10/27 10:11:22.352617713] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 35, nested map size will be 1
[2025/10/27 10:11:22.352646867] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 35 elements, output map size 36 elements
[2025/10/27 10:11:22.352688181] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 41, will be 40, nested map size will be 2
[2025/10/27 10:11:22.352711794] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 38, nested map size will be 3
[2025/10/27 10:11:22.352738135] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 37, nested map size will be 2

rafaelma • Oct 27 '25 13:10

@rafaelma I did another check and found that I needed another reset of the cursor in the collector function. I have updated the branch, replacing the old fix with a new one. Would you please retry it? Thanks.

edsiper • Oct 28 '25 03:10

Hello Eduardo, thanks for the patch.

Same procedure as last time, and another crash after one hour of running without problems, right after the same type of logs:

Oct 28 10:11:22 hostname.domain systemd[1]: Starting setroubleshootd.service - SETroubleshoot daemon for processing new SELinux denial logs...
Oct 28 10:11:22 hostname.domain systemd[1]: Started setroubleshootd.service - SETroubleshoot daemon for processing new SELinux denial logs.
Oct 28 10:11:22 hostname.domain systemd[1]: Started dbus-:…@….service.
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
                                                       
                                                       *****  Plugin catchall_boolean (89.3 confidence) suggests   ******************
                                                       
                                                       If you want to allow nis to enabled
                                                       Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
                                                       
                                                       Do
                                                       setsebool -P nis_enabled 1
                                                       
                                                       *****  Plugin catchall (11.6 confidence) suggests   **************************
                                                       
                                                       If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
                                                       Then you should report this as a bug.
                                                       You can generate a local policy module to allow this access.
                                                       Do
                                                       allow this access for now by executing:
                                                       # ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
                                                       # semodule -X 300 -i my-rhsmpackagepr.pp
                                                       
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
                                                       
                                                       *****  Plugin catchall_boolean (89.3 confidence) suggests   ******************
                                                       
                                                       If you want to allow nis to enabled
                                                       Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
                                                       
                                                       Do
                                                       setsebool -P nis_enabled 1
                                                       
                                                       *****  Plugin catchall (11.6 confidence) suggests   **************************
                                                       
                                                       If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
                                                       Then you should report this as a bug.
                                                       You can generate a local policy module to allow this access.
                                                       Do
                                                       allow this access for now by executing:
                                                       # ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
                                                       # semodule -X 300 -i my-rhsmpackagepr.pp
                                                       
Oct 28 10:11:23 hostname.domain systemd-coredump[1221235]: Process 1218463 (fluent-bit) of user 0 dumped core.
                                                         
                                                         Module libzstd.so.1 from rpm zstd-1.5.5-9.el10.x86_64
                                                         Module libz.so.1 from rpm zlib-ng-2.2.3-1.el10.x86_64
                                                         Module libcap.so.2 from rpm libcap-2.69-7.el10.x86_64
                                                         Module libcrypto.so.3 from rpm openssl-3.2.2-16.el10_0.4.x86_64
                                                         Module libssl.so.3 from rpm openssl-3.2.2-16.el10_0.4.x86_64
                                                         Module libsystemd.so.0 from rpm systemd-257-9.el10_0.1.x86_64
                                                         Module libyaml-0.so.2 from rpm libyaml-0.2.5-16.el10.x86_64
                                                         Stack trace of thread 1218466:
                                                         #0  0x00007fd7c0abd9dc __pthread_kill_implementation (libc.so.6 + 0x969dc)
                                                         #1  0x00007fd7c0a67a96 raise (libc.so.6 + 0x40a96)
                                                         #2  0x00007fd7c0a4f8fa abort (libc.so.6 + 0x288fa)
                                                         #3  0x00000000004ae47d n/a (n/a + 0x0)
                                                         #4  0x00007fd7c0a67b40 __restore_rt (libc.so.6 + 0x40b40)
                                                         #5  0x00000000011d0ec4 n/a (n/a + 0x0)
                                                         #6  0x00007fd7c0fad609 ZSTD_freeDCtx (libzstd.so.1 + 0x5e609)
                                                         #7  0x00007fd7c1789b09 journal_file_data_payload.isra.0 (libsystemd.so.0 + 0xd1b09)
                                                         #8  0x00007fd7c16e6d33 sd_journal_enumerate_data (libsystemd.so.0 + 0x2ed33)
                                                         #9  0x000000000074670a n/a (n/a + 0x0)
                                                         #10 0x0000000000504fcd n/a (n/a + 0x0)
                                                         #11 0x0000000000505b0a n/a (n/a + 0x0)
                                                         #12 0x000000000057ca77 n/a (n/a + 0x0)
                                                         #13 0x00007fd7c0abbb68 start_thread (libc.so.6 + 0x94b68)
                                                         #14 0x00007fd7c0b2c6bc __clone3 (libc.so.6 + 0x1056bc)
                                                         
                                                         Stack trace of thread 1218465:
                                                         #0  0x00007fd7c0b2caf6 epoll_wait (libc.so.6 + 0x105af6)
                                                         #1  0x00000000018757ea n/a (n/a + 0x0)
                                                         #2  0x0000000001875c14 n/a (n/a + 0x0)
                                                         #3  0x00000000004d01a4 n/a (n/a + 0x0)
                                                         #4  0x000000000057ca77 n/a (n/a + 0x0)
                                                         #5  0x00007fd7c0abbb68 start_thread (libc.so.6 + 0x94b68)
                                                         #6  0x00007fd7c0b2c6bc __clone3 (libc.so.6 + 0x1056bc)
                                                         
                                                         Stack trace of thread 1218463:
                                                         #0  0x00007fd7c0af6945 clock_nanosleep@GLIBC_2.2.5 (libc.so.6 + 0xcf945)
                                                         #1  0x00007fd7c0b022e7 __nanosleep (libc.so.6 + 0xdb2e7)
                                                         #2  0x00007fd7c0b1451c sleep (libc.so.6 + 0xed51c)
                                                         #3  0x00000000004afb40 n/a (n/a + 0x0)
                                                         #4  0x00000000005b9acf n/a (n/a + 0x0)
                                                         #5  0x00000000004afdcb n/a (n/a + 0x0)
                                                         #6  0x00000000004afded n/a (n/a + 0x0)
                                                         #7  0x00007fd7c0a5130e __libc_start_call_main (libc.so.6 + 0x2a30e)
                                                         #8  0x00007fd7c0a513c9 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a3c9)
                                                         #9  0x00000000004a85f5 n/a (n/a + 0x0)
                                                         ELF object binary architecture: AMD x86-64

I have installed some debug packages, and I think the backtrace output from the coredump is more complete now. I hope you can get more out of it.
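
(Roughly the following; the exact debuginfo package names on RHEL 10 are an assumption, and the debuginfo repos must be enabled:)

# pull debug symbols for the libraries that appear in the stack traces
dnf debuginfo-install glibc systemd libzstd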

Maybe it is not important, but in thread 3 there are two errors of this type: error: Cannot access memory at address

(gdb) info threads
  Id   Target Id                           Frame 
* 1    Thread 0x7fd7bb7fe6c0 (LWP 1218466) __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  2    Thread 0x7fd7bbfff6c0 (LWP 1218465) 0x00007fd7c0b2caf6 in epoll_wait (epfd=11, events=0x7fd7bc0061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3    Thread 0x7fd7c14cc7c0 (LWP 1218463) 0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffeda8a9060, rem=rem@entry=0x7ffeda8a9060) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  4    Thread 0x7fd79dbc36c0 (LWP 1218467) 0x00007fd7c0b2caf6 in epoll_wait (epfd=75, events=0x7fd7bc176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  5    Thread 0x7fd79d3c26c0 (LWP 1218468) 0x00007fd7c0b2caf6 in epoll_wait (epfd=92, events=0x7fd7bc17cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  6    Thread 0x7fd79323d6c0 (LWP 1218470) 0x00007fd7c0b2caf6 in epoll_wait (epfd=97, events=0x7fd780000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  7    Thread 0x7fd7c0a266c0 (LWP 1218464) 0x00007fd7c0b2caf6 in epoll_wait (epfd=8, events=0x7fd7bc000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  8    Thread 0x7fd793a3e6c0 (LWP 1218469) 0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fd793a34de0, rem=rem@entry=0x7fd793a34de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48

(gdb) backtrace full
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44                                                                                                                                                                                                                                                       [0/1804]
        tid = <optimized out>                                   
        ret = 0
        pd = <optimized out>
        old_mask = {__val = {0}}
        ret = <optimized out>
#1  0x00007fd7c0abda43 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
No locals.
#2  0x00007fd7c0a67a96 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
        ret = <optimized out>
#3  0x00007fd7c0a4f8fa in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0, 0, 0, 0, 0, 773, 0, 140564415133024, 140564540369648, 140564540365632, 4933032, 0, 140564641116160, 4899469, 4899394, 32175824}}, sa_flags = 0, sa_restorer = 0x7fd7b4083560}
#4  0x00000000004ae47d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
        cf_opts = 0x0
#5  <signal handler called>
No locals.
#6  0x00000000011d0ec4 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
        cMem = {customAlloc = 0x0, customFree = 0x1, opaque = 0x7fd7bb7f2af0}
#7  0x00007fd7c0fad609 in ZSTD_clearDict (dctx=0x7fd7b4083560) at .//decompress/zstd_decompress.c:315
No locals.
#8  ZSTD_freeDCtx (dctx=0x7fd7b4083560) at .//decompress/zstd_decompress.c:326
        cMem = {customAlloc = <optimized out>, customFree = 0x0, opaque = 0x0}
#9  0x00007fd7c1789b09 in sym_ZSTD_freeDCtxp (p=<optimized out>) at ../src/basic/compress.c:74
        __func__ = <optimized out>
#10 decompress_blob_zstd (dst_max=0, src=0x7fd7a5f06908, src_size=420, dst=<optimized out>, dst_size=0x7fd7bb7f2978) at ../src/basic/compress.c:451
        k = 0
        size = 773
        r = <optimized out>
        dctx = 0x7fd7b4083560
        input = {src = 0x7fd7a5f06908, size = 420, pos = 420}
        output = {dst = 0x7fd7b4043550, size = 262152, pos = 773}
        __func__ = <optimized out>
        size = <optimized out>
        r = <optimized out>
        dctx = <optimized out>
        input = <optimized out>
        output = <optimized out>
        k = <optimized out>
        _found = <optimized out>
        __assert_in_set = <optimized out>
        __unique_prefix_A18 = <optimized out>
        __unique_prefix_B19 = <optimized out>
        _level = <optimized out>
        _e = <optimized out>
#11 decompress_blob (dst_max=<optimized out>, compression=<optimized out>, src=<optimized out>, src_size=<optimized out>, dst=<optimized out>, dst_size=<optimized out>) at ../src/basic/compress.c:495
No locals.
#12 maybe_decompress_payload (data_threshold=<optimized out>, f=0x7fd7b4016d90, payload=0x7fd7a5f06908 "(\265/\375`\005\002\325\f", size=420, compression=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7fd7bb7f2af0, ret_size=0x7fd7bb7f2ae8) at ../src/libsystemd/sd-journal/journal-file.c:1947
        rsize = 773
        r = <optimized out>
        __func__ = <optimized out>
#13 journal_file_data_payload.isra.0 (f=0x7fd7b4016d90, o=<optimized out>, offset=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7fd7bb7f2af0, ret_size=0x7fd7bb7f2ae8, data_threshold=<optimized out>) at ../src/libsystemd/sd-journal/journal-file.c:2009
        size = 420
        c = <optimized out>
        r = <optimized out>
        __func__ = <optimized out>
#14 0x00007fd7c16e6d33 in sd_journal_enumerate_data (j=0x7fd7b4001140, data=0x7fd7bb7f2b70, size=0x7fd7bb7f4b98) at ../src/libsystemd/sd-journal/sd-journal.c:2886
        _e = <optimized out>
        p = <optimized out>
        d = 0x0
        l = 0
        _error = <optimized out>
        _level = <optimized out>
        n = <optimized out>
        f = 0x7fd7b4016d90
        o = 0x7fd7a6160ec0
        r = <optimized out>
        __func__ = "sd_journal_enumerate_data"
#15 0x000000000074670a in in_systemd_collect (ins=0x25d30210, config=0x25d14490, in_context=0x7fd7b4001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:408
        ret = 0
        ret_j = 1
        entries = 16
        skip_entries = 0
        rows = 1
        sec = 1761642683
        nsec = 256042000
        usec = 1761642683256042
        length = 37
        key = 0x7fd7a5f06700 "_SYSTEMD_UNIT=setroubleshootd.service"
        cursor = 0x0
        tag = 0x25d2fff0 "uio_logs_5000_systemd"
        new_tag = "\260L\177\273\327\177\000\000\257\026M\000\000\000\000\000\320;\177\273\327\177\000\000\210}\210\001\000\000\000\000\220z\210\001\000\000\000\000\227\003\000\000\005\000\000\000\307\343\207\001\000\000\000\000(\000\000\0000\000\000\000\300L\177\273\327\177\000\000\000L\177\273\327\177\000\000f\000\000\000\000\000\000\000\033[1m[\033[0m2025/10/28 10:11:23.102410123\033[1m]\033[0m [\033[94mtrace\033[0m] [sched] 0 timer coroutines destroyed\n", '\000' <repeats 3905 times>
        last_tag = "uio_logs_5000_systemd", '\000' <repeats 4074 times>
        tag_len = 21
        last_tag_len = 21
        data = 0x7fd7a5f06700
        ctx = 0x7fd7b4001090
        tm = {tm = {tv_sec = 1761642683, tv_nsec = 256042000}}
        kvlist = 0x7fd7b4004830
#16 0x0000000000504fcd in input_collector_fd (fd=91, ins=0x25d30210) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
        head = 0x7fd7b4007d48
        collector = 0x7fd7b4007cc0
        input_coro = 0x7fd7bb7f4d00
        config = 0x25d14490
#17 0x0000000000505b0a in engine_handle_event (fd=91, mask=1, ins=0x25d30210, config=0x25d14490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
        ret = 0
#18 input_thread (data=0x7fd7bc01f360) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
        __flb_event_priority_live_foreach_iter = 0
        __flb_event_priority_live_foreach_n_events = 1
        ret = 0
        thread_id = 0
        tmp = "flb-in-systemd.0-w0\000\000\000\000\000\242\227\022\000\000\000\000\000\300\346\177\273\327\177\000\000Gɡ\300\327\177", '\000' <repeats 17 times>
        instance_exit = 0
        event = 0x7fd7b4007cc0
        ins = 0x25d30210
        evl_bktq = 0x7fd7b4007bf0
        thi = 0x7fd7bc01f360
        p = 0x25d21560
        sched = 0x7fd7b4000b70
        dns_ctx = {lookups = {prev = 0x7fd7bb7f4d30, next = 0x7fd7bb7f4d30}, lookups_drop = {prev = 0x7fd7bb7f4d40, next = 0x7fd7bb7f4d40}}
        notification = 0x7fd7c0a87e5e <__GI___snprintf+158>
#19 0x000000000057ca77 in step_callback (data=0x7fd7bc0249f0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
        worker = 0x7fd7bc0249f0
#20 0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 2, 140564626524224, 140564626524487, 2287494562463486489, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#21 0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fd7bbfff6c0 (LWP 1218465))]
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=11, events=0x7fd7bc0061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=11, events=0x7fd7bc0061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd7bc006330, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7fd7bc006180
        ret = 0
#2  0x0000000001875c14 in mk_event_wait (loop=0x7fd7bc006330) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x00000000004d01a4 in log_worker_collector (data=0x7fd7bc006090) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_log.c:166
        __i = 1
        __ctx = 0x7fd7bc006180
        run = 1
        event = 0x0
        log = 0x7fd7bc006090
        signal_value = 2
#4  0x000000000057ca77 in step_callback (data=0x7fd7bc009fa0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
        worker = 0x7fd7bc009fa0
#5  0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 0, 140564626524656, 140564626524919, 2287495662511985177, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fd7c14cc7c0 (LWP 1218463))]
#0  0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffeda8a9060, rem=rem@entry=0x7ffeda8a9060) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48        r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) backtrace full
#0  0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffeda8a9060, rem=rem@entry=0x7ffeda8a9060) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        r = <optimized out>
#1  0x00007fd7c0b022e7 in __GI___nanosleep (req=req@entry=0x7ffeda8a9060, rem=rem@entry=0x7ffeda8a9060) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
        ret = <optimized out>
#2  0x00007fd7c0b1451c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
        save_errno = 22
        max = 4294967295
        ts = {tv_sec = 0, tv_nsec = 102016488}
#3  0x00000000004afb40 in flb_main_run (argc=3, argv=0x7ffeda8a92e8) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1469
        opt = -1
        ret = 0
        json = 0x21000 <error: Cannot access memory at address 0x21000>
        last_plugin = -1
        cfg_file = 0x25d19c60 "\031]\002"
        cf = 0x25d14df0
        tmp = 0x25d14df0
        service = 0x25d143b0
        s = 0x5b9a46 <flb_supervisor_requested+213>
        section = 0x25d143b0
        cf_opts = 0x25d142b0
        group = 0x862da3ad06103d00
        supervisor_reload_notified = 0
        trace_input = 0x0
        trace_output = 0x25d14440 "stdout"
        trace_props = 0x0
        long_opts = {{name = 0x187c3b8 "storage_path", has_arg = 1, flag = 0x0, val = 98}, {name = 0x187c3c5 "config", has_arg = 1, flag = 0x0, val = 99}, {name = 0x187c132 "daemon", has_arg = 0, flag = 0x0, val = 100}, {name = 0x187c3cc "dry-run", has_arg = 0, flag = 0x0, val = 68}, {name = 0x187c139 "flush", has_arg = 1, flag = 0x0, val = 102}, {name = 0x187c3d4 "http", 
            has_arg = 0, flag = 0x0, val = 72}, {name = 0x187c3d9 "supervisor", has_arg = 0, flag = 0x0, val = 1029}, {name = 0x187c16a "log_file", has_arg = 1, flag = 0x0, val = 108}, {name = 0x187c3e4 "port", has_arg = 1, flag = 0x0, val = 80}, {name = 0x187c13f "custom", has_arg = 1, flag = 0x0, val = 67}, {name = 0x187c0fe "input", has_arg = 1, flag = 0x0, val = 105}, {
            name = 0x187c159 "processor", has_arg = 1, flag = 0x0, val = 114}, {name = 0x187c163 "filter", has_arg = 1, flag = 0x0, val = 70}, {name = 0x187c104 "output", has_arg = 1, flag = 0x0, val = 111}, {name = 0x187c146 "match", has_arg = 1, flag = 0x0, val = 109}, {name = 0x187c3e9 "parser", has_arg = 1, flag = 0x0, val = 82}, {name = 0x187c3f0 "prop", has_arg = 1, 
            flag = 0x0, val = 112}, {name = 0x187c3f5 "plugin", has_arg = 1, flag = 0x0, val = 101}, {name = 0x187c173 "tag", has_arg = 1, flag = 0x0, val = 116}, {name = 0x187c3fc "sp-task", has_arg = 1, flag = 0x0, val = 84}, {name = 0x187c404 "version", has_arg = 0, flag = 0x0, val = 86}, {name = 0x187c40c "verbose", has_arg = 0, flag = 0x0, val = 118}, {
            name = 0x187c414 "workdir", has_arg = 1, flag = 0x0, val = 119}, {name = 0x187c41c "quiet", has_arg = 0, flag = 0x0, val = 113}, {name = 0x187c422 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x187c427 "help-json", has_arg = 0, flag = 0x0, val = 74}, {name = 0x187c431 "coro_stack_size", has_arg = 1, flag = 0x0, val = 115}, {name = 0x187c441 "sosreport", 
            has_arg = 0, flag = 0x0, val = 83}, {name = 0x187c177 "http_server", has_arg = 0, flag = 0x0, val = 72}, {name = 0x187c44b "http_listen", has_arg = 1, flag = 0x0, val = 76}, {name = 0x187c457 "http_port", has_arg = 1, flag = 0x0, val = 80}, {name = 0x187c461 "enable-hot-reload", has_arg = 0, flag = 0x0, val = 89}, {name = 0x187c473 "enable-chunk-trace", 
            has_arg = 0, flag = 0x0, val = 90}, {name = 0x187c486 "trace", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x187c48c "trace-input", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x187c498 "trace-output", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x187c4a5 "trace-output-property", has_arg = 1, flag = 0x0, val = 1028}, {
            name = 0x187c4c0 "disable-thread-safety-on-hot-reload", has_arg = 0, flag = 0x0, val = 87}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
#4  0x00000000005b9acf in flb_supervisor_run (argc=3, argv=0x7ffeda8a92e8, entry=0x4aed5a <flb_main_run>) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_supervisor.c:626
        clean_argv = 0x11ff
        clean_argc = 32727
        env_child = 0x6c6f6f705f68652e <error: Cannot access memory at address 0x6c6f6f705f68652e>
        ret = -1
#5  0x00000000004afdcb in flb_main (argc=3, argv=0x7ffeda8a92e8) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1564
No locals.
#6  0x00000000004afded in main (argc=3, argv=0x7ffeda8a92e8) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1572
No locals.
(gdb) thread 4
[Switching to thread 4 (Thread 0x7fd79dbc36c0 (LWP 1218467))]
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=75, events=0x7fd7bc176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=75, events=0x7fd7bc176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd7bc176640, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7fd7bc176310
        ret = 0
#2  0x0000000001875c14 in mk_event_wait (loop=0x7fd7bc176640) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x000000000051bd74 in output_thread (data=0x7fd7bc176050) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_output_thread.c:257
        __flb_event_priority_live_foreach_iter = 1
        __flb_event_priority_live_foreach_n_events = 0
        n = 8
        ret = 0
        running = 1
        stopping = 0
        thread_id = 0
        tmp = "flb-out-http.0-w0", '\000' <repeats 32 times>, "1218467\000=\020\006\255\243-\206"
        event_local = {fd = 89, type = 65536, mask = 1, status = 2 '\002', data = 0x0, handler = 0x0, _head = {prev = 0x0, next = 0x0}, _priority_head = {prev = 0x0, next = 0x0}, priority = 6 '\006'}
        event = 0x0
        sched = 0x7fd78c000e80
        task = 0x7fd7bc1949d0
        u_conn = 0x7fd78c0205c0
        ins = 0x25d360f0
        out_flush = 0x7fd78c018cd0
        th_ins = 0x7fd7bc176050
        params = 0x0
        sched_params = 0x0
        dns_ctx = {lookups = {prev = 0x7fd79dbb9b70, next = 0x7fd79dbb9b70}, lookups_drop = {prev = 0x7fd79dbb9b80, next = 0x7fd79dbb9b80}}
        notification = 0x0
#4  0x000000000057ca77 in step_callback (data=0x7fd7bc176660) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
        worker = 0x7fd7bc176660
#5  0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 0, 140564626524272, 140564626524535, 2287439001155932697, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fd79d3c26c0 (LWP 1218468))]
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=92, events=0x7fd7bc17cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=92, events=0x7fd7bc17cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd7bc129310, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7fd7bc0a6460
        ret = 0
#2  0x0000000001875c14 in mk_event_wait (loop=0x7fd7bc129310) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x000000000185be78 in mk_lib_worker (data=0x7fd7bc0ada10) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_lib.c:154
        fd = 824195691
        bytes = 1667331187
        val = 2334097595223798896
        server = 0x7fd7bc17cbb0
        event = 0x0
        ctx = 0x7fd7bc0ada10
        __i = 32727
        __ctx = 0x383634383132
#4  0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 0, 140564626524704, 140564626524967, 2287437901107434009, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 6
[Switching to thread 6 (Thread 0x7fd79323d6c0 (LWP 1218470))]
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=97, events=0x7fd780000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=97, events=0x7fd780000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd780001b10, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7fd780000ee0
        ret = 0
#2  0x0000000001875c14 in mk_event_wait (loop=0x7fd780001b10) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x000000000186e408 in mk_server_worker_loop (server=0x7fd7bc17cbb0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_server.c:506
        __i = 1
        __ctx = 0x7fd780000ee0
        ret = 0
        timeout_fd = 103
        val = 1
        event = 0x0
        evl = 0x7fd780001b10
        list = 0x7fd7800063e0
        head = 0x7fd7800063e0
        conn = 0x7fd793233da0
        sched = 0x7fd784000b90
        listener = 0x7fd780006400
        server_timeout = 0x7fd78000e530
        __i = 0
        __ctx = 0x7fd780000ee0
#4  0x00000000018644b1 in mk_sched_launch_worker_loop (data=0x7fd784001390) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_scheduler.c:417
        ret = 0
        wid = 0
        len = 13
        thread_name = 0x7fd780006390 "VWz}\320\177"
        head = 0x7fd7bc17cdf0
        wcb = 0x7fd7bc0ee940
        sched = 0x7fd784000b90
        notif = 0x7fd780006340
        thinfo = 0x7fd784001390
        ctx = 0x7fd784000b70
        server = 0x7fd7bc17cbb0
#5  0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 2, 140564032621376, 140564032621639, 2287407049820475929, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 7
[Switching to thread 7 (Thread 0x7fd7c0a266c0 (LWP 1218464))]
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=8, events=0x7fd7bc000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007fd7c0b2caf6 in epoll_wait (epfd=8, events=0x7fd7bc000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018757ea in _mk_event_wait_2 (loop=0x7fd7bc0017a0, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7fd7bc000b70
        ret = 0
#2  0x0000000001875c14 in mk_event_wait (loop=0x7fd7bc0017a0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x0000000000549d30 in flb_engine_start (config=0x25d14490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_engine.c:999
        __flb_event_priority_live_foreach_iter = 3
        __flb_event_priority_live_foreach_n_events = 0
        ret = 0
        tasks = 0
        fs_chunks = 0
        mem_chunks = 0
        ts = 0
        tmp = "24.0K\000\000\000\000\000\000\000\000\000\000"
        rb_flush_flag = 0
        t_flush = {tm = {tv_sec = 10, tv_nsec = 0}}
        event = 0x0
        evl = 0x7fd7bc0017a0
        evl_bktq = 0x7fd7bc005fd0
        sched = 0x7fd7bc01b610
        dns_ctx = {lookups = {prev = 0x7fd7c0a1cd30, next = 0x7fd7c0a1cd30}, lookups_drop = {prev = 0x7fd7c0a1cd40, next = 0x7fd7c0a1cd40}}
        notification = 0x0
        rb_ms = 250
        rb_env = 0x0
#4  0x00000000004cb8da in flb_lib_worker (data=0x25d14460) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_lib.c:835
        ret = -2043829331
        ctx = 0x25d14460
        config = 0x25d14490
#5  0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 22, 140732564934064, 140732564934327, 2287309170200154649, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 8
[Switching to thread 8 (Thread 0x7fd793a3e6c0 (LWP 1218469))]
#0  0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fd793a34de0, rem=rem@entry=0x7fd793a34de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48        r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) backtrace full
#0  0x00007fd7c0af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fd793a34de0, rem=rem@entry=0x7fd793a34de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
        sc_cancel_oldtype = 2
        sc_ret = <optimized out>
        r = <optimized out>
#1  0x00007fd7c0b022e7 in __GI___nanosleep (req=req@entry=0x7fd793a34de0, rem=rem@entry=0x7fd793a34de0) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
        ret = <optimized out>
#2  0x00007fd7c0b1451c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
        save_errno = 0
        max = 4294967295
        ts = {tv_sec = 0, tv_nsec = 120694212}
#3  0x0000000001871f97 in mk_clock_worker_init (data=0x7fd7bc17cbb0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_clock.c:124
        cur_time = 1761642682
        server = 0x7fd7bc17cbb0
#4  0x00007fd7c0abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2300874514650230247, -37112, 2, 140564032621424, 140564032621687, 2287408147721490969, 2287309120364220953}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007fd7c0b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

And these are the trace logs from fluent-bit right before the crash:

[2025/10/28 10:11:22.602510368] [trace] [task 0x7fd7bc1949d0] created (id=0)
[2025/10/28 10:11:22.602601222] [trace] [upstream] get new connection for receiver-server.example.org:5000, net setup:
net.connect_timeout        = 10 seconds
net.source_address         = any
net.keepalive              = enabled
net.keepalive_idle_timeout = 30 seconds
net.max_worker_connections = 0
[2025/10/28 10:11:22.602518892] [debug] [task] created task=0x7fd7bc1949d0 id=0 OK
[2025/10/28 10:11:22.602611674] [debug] [upstream] KA connection #108 to receiver-server.example.org:5000 has been assigned (recycled)
[2025/10/28 10:11:22.602528676] [debug] [output:http:http.0] task_id=0 assigned to thread #0
[2025/10/28 10:11:22.602619261] [debug] [http_client] not using http_proxy for header
[2025/10/28 10:11:22.602630663] [trace] [io coro=0x7fd78c02c660] [net_write] trying 161 bytes
[2025/10/28 10:11:22.602728194] [trace] [io coro=0x7fd78c02c660] [net_write] ret=161 total=161/161
[2025/10/28 10:11:22.602733808] [trace] [io coro=0x7fd78c02c660] [net_write] trying 1464 bytes
[2025/10/28 10:11:22.602752129] [trace] [io coro=0x7fd78c02c660] [net_write] ret=1464 total=1464/1464
[2025/10/28 10:11:22.602757296] [trace] [io coro=0x7fd78c02c660] [net_read] try up to 4095 bytes
[2025/10/28 10:11:22.603752273] [trace] [engine] resuming coroutine=0x7fd78c02c660
[2025/10/28 10:11:22.603982679] [trace] [io coro=0x7fd78c02c660] [net_read] ret=66
[2025/10/28 10:11:22.604189099] [ info] [output:http:http.0] receiver-server.example.org:5000, HTTP status=200
ok
[2025/10/28 10:11:22.604246406] [debug] [upstream] KA connection #108 to receiver-server.example.org:5000 is now available
[2025/10/28 10:11:22.604315277] [debug] [out flush] cb_destroy coro_id=213
[2025/10/28 10:11:22.604716134] [trace] [coro] destroy coroutine=0x7fd78c02c660 data=0x7fd78c02c680
[2025/10/28 10:11:22.604803526] [trace] [engine] [task event] task_id=0 out_id=0 return=OK
[2025/10/28 10:11:22.604822432] [debug] [task] destroy task=0x7fd7bc1949d0 (task_id=0)
[2025/10/28 10:11:22.604832772] [trace] [1097] http.0 -> fs_chunks_size = 4096 mod=-4096 chunk=1218463-1761642682.352458583.flb
[2025/10/28 10:11:22.604838770] [debug] [input chunk] remove chunk 1218463-1761642682.352458583.flb with 4096 bytes from plugin http.0, the updated fs_chunks_size is 0 bytes
[2025/10/28 10:11:22.852617555] [trace] [565] http.0 -> fs_chunks_size = 0
[2025/10/28 10:11:22.852642469] [trace] [input chunk] chunk 1218463-1761642682.852413803.flb required 1874 bytes and 100000000 bytes left in plugin http.0
[2025/10/28 10:11:22.852733321] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 31 elements, output map size 38 elements
[2025/10/28 10:11:22.852780064] [trace] [filter:modify:modify.0 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 31 elements, output map size 38 elements
[2025/10/28 10:11:22.852805968] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 38, nested map size will be 1
[2025/10/28 10:11:22.852830711] [trace] [filter:nest:nest.1 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 38, nested map size will be 1
[2025/10/28 10:11:22.852862776] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 36, nested map size will be 3
[2025/10/28 10:11:22.852880267] [trace] [filter:nest:nest.2 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 36, nested map size will be 3
[2025/10/28 10:11:22.852906761] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 36, will be 35, nested map size will be 2
[2025/10/28 10:11:22.852924420] [trace] [filter:nest:nest.3 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 36, will be 35, nested map size will be 2
[2025/10/28 10:11:22.852951223] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 35, nested map size will be 1
[2025/10/28 10:11:22.852973729] [trace] [filter:nest:nest.4 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 35, will be 35, nested map size will be 1
[2025/10/28 10:11:22.853006485] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 35 elements, output map size 36 elements
[2025/10/28 10:11:22.853023176] [trace] [filter:modify:modify.5 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_modify/modify.c:1430] Input map size 35 elements, output map size 36 elements
[2025/10/28 10:11:22.853085179] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 41, will be 40, nested map size will be 2
[2025/10/28 10:11:22.853102940] [trace] [filter:nest:nest.7 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 41, will be 40, nested map size will be 2
[2025/10/28 10:11:22.853130298] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 38, nested map size will be 3
[2025/10/28 10:11:22.853150292] [trace] [filter:nest:nest.8 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 40, will be 38, nested map size will be 3
[2025/10/28 10:11:22.853180785] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 37, nested map size will be 2
[2025/10/28 10:11:22.853199879] [trace] [filter:nest:nest.9 at /root/rhel10-test/fluent-bit-4.1.1/plugins/filter_nest/nest.c:551] outer map size is 38, will be 37, nested map size will be 2
[2025/10/28 10:11:22.853229995] [trace] [input chunk] update output instances with new chunk size diff=4096, records=2, input=systemd.0
[2025/10/28 10:11:22.853236325] [trace] [2226] http.0 -> fs_chunks_size = 0 mod=4096 chunk=1218463-1761642682.852413803.flb
[2025/10/28 10:11:22.853240997] [trace] [input chunk] chunk 1218463-1761642682.852413803.flb update plugin http.0 fs_chunks_size by 4096 bytes, the current fs_chunks_size is 4096 bytes

rafaelma avatar Oct 28 '25 14:10 rafaelma

It seems to be related to fluent-bit bundling its own copy of zstd, which conflicts with the system's zstd used by the external library (libsystemd).
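
For reference, here is a minimal diagnostic sketch (my own illustration, not part of the patch) to confirm whether two copies of zstd end up bound in the same process. dlopen, dlsym, and ZSTD_versionNumber are standard APIs; note that in a plain test program linked only against the shared libzstd both addresses will match, so the interesting result comes from embedding this in a binary that statically links the bundled zstd:

/* Hedged diagnostic sketch: compare the statically resolved
 * ZSTD_versionNumber with the one exported by the system libzstd.so.1.
 * Two different addresses mean two zstd copies coexist in the process.
 * Build with: gcc check_zstd.c -o check_zstd -lzstd -ldl */
#include <dlfcn.h>
#include <stdio.h>
#include <zstd.h>

int main(void)
{
    void *handle = dlopen("libzstd.so.1", RTLD_NOW);
    unsigned (*sys_ver)(void) = NULL;

    if (handle != NULL) {
        sys_ver = (unsigned (*)(void)) dlsym(handle, "ZSTD_versionNumber");
    }

    printf("zstd bound at link time: %p (v0x%x)\n",
           (void *) &ZSTD_versionNumber, ZSTD_versionNumber());
    if (sys_ver != NULL) {
        printf("system libzstd.so.1:     %p (v0x%x)\n",
               (void *) sys_ver, sys_ver());
    }
    if (handle != NULL) {
        dlclose(handle);
    }
    return 0;
}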

@rafaelma I have added another change to the PR/branch; please give it a try. Thanks again for your help and patience on this.

edsiper avatar Oct 28 '25 21:10 edsiper

Hello @edsiper, I am sorry to bring bad news once again. We experienced a new coredump crash with the new patch when certain multiline logs from selinux are generated. On the positive side, we have identified when these logs are created, so we can now provoke a crash at will instead of waiting for one to occur unexpectedly.

Here are the new backtraces. Let me know if you need any other information. Thank you very much for looking into this.

(gdb) info threads 
  Id   Target Id                           Frame 
* 1    Thread 0x7f043f7fe6c0 (LWP 1298275) __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  2    Thread 0x7f0444a266c0 (LWP 1298273) 0x00007f0444b2caf6 in epoll_wait (epfd=8, events=0x7f0440000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3    Thread 0x7f043ffff6c0 (LWP 1298274) 0x00007f0444b2caf6 in epoll_wait (epfd=11, events=0x7f04400061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  4    Thread 0x7f0415c1d6c0 (LWP 1298279) 0x00007f0444b2caf6 in epoll_wait (epfd=97, events=0x7f0404000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  5    Thread 0x7f04455157c0 (LWP 1298272) 0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffff510c5a0, rem=rem@entry=0x7ffff510c5a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  6    Thread 0x7f041dc406c0 (LWP 1298276) 0x00007f0444b2caf6 in epoll_wait (epfd=75, events=0x7f0440176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  7    Thread 0x7f041d43f6c0 (LWP 1298277) 0x00007f0444b2caf6 in epoll_wait (epfd=91, events=0x7f044017cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  8    Thread 0x7f041641e6c0 (LWP 1298278) 0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f0416414de0, rem=rem@entry=0x7f0416414de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
(gdb) backtrace full
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
        tid = <optimized out>
        ret = 0
        pd = <optimized out>
        old_mask = {__val = {0}}
        ret = <optimized out>
#1  0x00007f0444abda43 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
No locals.      
#2  0x00007f0444a67a96 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
        ret = <optimized out>
#3  0x00007f0444a4f8fa in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0, 0, 0, 0, 0, 773, 0, 139656096692832, 139656221895408, 139656221891392, 4912552, 0, 139656322940928, 4878989, 4878914, 32155344}}, sa_flags = 0, sa_restorer = 0x7f043808ba60}
#4  0x00000000004a947d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
        cf_opts = 0x0                          
#5  <signal handler called>
No locals.
#6  0x00000000011cbe84 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
        cMem = {customAlloc = 0x0, customFree = 0x1, opaque = 0x7f043f7f2af0}
#7  0x00007f0444fad609 in ZSTD_clearDict (dctx=0x7f043808ba60) at .//decompress/zstd_decompress.c:315
No locals.                   
#8  ZSTD_freeDCtx (dctx=0x7f043808ba60) at .//decompress/zstd_decompress.c:326
        cMem = {customAlloc = <optimized out>, customFree = 0x0, opaque = 0x0}
#9  0x00007f04457d2b09 in sym_ZSTD_freeDCtxp (p=<optimized out>) at ../src/basic/compress.c:74
        __func__ = <optimized out>                                                                                                           
#10 decompress_blob_zstd (dst_max=0, src=0x7f0429f06908, src_size=420, dst=<optimized out>, dst_size=0x7f043f7f2978) at ../src/basic/compress.c:451
        k = 0                     
        size = 773                 
        r = <optimized out>
        dctx = 0x7f043808ba60                                                                                                                                                                                                                                                             
        input = {src = 0x7f0429f06908, size = 420, pos = 420}
        output = {dst = 0x7f043804ba50, size = 262152, pos = 773}                                                                            
        __func__ = <optimized out>                
        size = <optimized out>                        
        r = <optimized out>
        dctx = <optimized out>
        input = <optimized out>                                                                                                                                                                                                                                                           
        output = <optimized out>
        k = <optimized out>   
        _found = <optimized out>
        __assert_in_set = <optimized out>
        __unique_prefix_A18 = <optimized out>
        __unique_prefix_B19 = <optimized out>
        _level = <optimized out>
        _e = <optimized out>                                                                                                                 
#11 decompress_blob (dst_max=<optimized out>, compression=<optimized out>, src=<optimized out>, src_size=<optimized out>, dst=<optimized out>, dst_size=<optimized out>) at ../src/basic/compress.c:495
No locals.                                                                                                                                   
#12 maybe_decompress_payload (data_threshold=<optimized out>, f=0x7f0438016d90, payload=0x7f0429f06908 "(\265/\375`\005\002\325\f", size=420, compression=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7f043f7f2af0, ret_size=0x7f043f7f2ae8)
    at ../src/libsystemd/sd-journal/journal-file.c:1947                                                                                      
        rsize = 773          
        r = <optimized out> 
        __func__ = <optimized out>
#13 journal_file_data_payload.isra.0 (f=0x7f0438016d90, o=<optimized out>, offset=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7f043f7f2af0, ret_size=0x7f043f7f2ae8, data_threshold=<optimized out>)                                            
    at ../src/libsystemd/sd-journal/journal-file.c:2009
        size = 420                                                                                                                           
        c = <optimized out>
        r = <optimized out>
        __func__ = <optimized out>
#14 0x00007f044572fd33 in sd_journal_enumerate_data (j=0x7f0438001140, data=0x7f043f7f2b70, size=0x7f043f7f4b98) at ../src/libsystemd/sd-journal/sd-journal.c:2886
        _e = <optimized out>
        p = <optimized out>
        d = 0x0
        l = 0
        _error = <optimized out>
        _level = <optimized out>
        n = <optimized out>
        f = 0x7f0438016d90
        o = 0x7f04259c7c08
        r = <optimized out>
        __func__ = "sd_journal_enumerate_data"
#15 0x000000000074170a in in_systemd_collect (ins=0xe884210, config=0xe868490, in_context=0x7f0438001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:408
        ret = 0
        ret_j = 1
        entries = 16
        skip_entries = 0
        rows = 1
        sec = 1761733332
        nsec = 688426000
        usec = 1761733332688426
        length = 37
        key = 0x7f0429f06700 "_SYSTEMD_UNIT=setroubleshootd.service"
        cursor = 0x0
        tag = 0xe883ff0 "uio_logs_5000_systemd"
        new_tag = "\260L\177?\004\177\000\000\257\306L\000\000\000\000\000\320;\177?\004\177\000\000\210-\210\001\000\000\000\000\220*\210\001\000\000\000\000\227\003\000\000\005\000\000\000Ǔ\207\001\000\000\000\000(\000\000\0000\000\000\000\300L\177?\004\177\000\000\000L\177?\004\
177\000\000f\000\000\000\000\000\000\000\033[1m[\033[0m2025/10/29 11:22:12.604911191\033[1m]\033[0m [\033[94mtrace\033[0m] [sched] 0 timer coroutines destroyed\n", '\000' <repeats 3905 times>
        last_tag = "uio_logs_5000_systemd", '\000' <repeats 4074 times>
        tag_len = 21
        last_tag_len = 21
        data = 0x7f0429f06700
        ctx = 0x7f0438001090
        tm = {tm = {tv_sec = 1761733332, tv_nsec = 688426000}}
        kvlist = 0x7f04380092c0
#16 0x00000000004fffcd in input_collector_fd (fd=40, ins=0xe884210) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
        head = 0x7f0438007c58
        collector = 0x7f0438007bd0
        input_coro = 0x7f043f7f4d00
        config = 0xe868490
#17 0x0000000000500b0a in engine_handle_event (fd=40, mask=1, ins=0xe884210, config=0xe868490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
        ret = 0
#18 input_thread (data=0x7f044001f360) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
        __flb_event_priority_live_foreach_iter = 0
        __flb_event_priority_live_foreach_n_events = 1
        ret = 0
        thread_id = 0
        tmp = "flb-in-systemd.0-w0\000\000\000\000\000c\317\023\000\000\000\000\000\300\346\177?\004\177\000\000GɡD\004\177", '\000' <repeats 17 times>
        instance_exit = 0
        event = 0x7f0438007bd0
        ins = 0xe884210
        evl_bktq = 0x7f0438007ba0
        thi = 0x7f044001f360
        p = 0xe875560
        sched = 0x7f0438000b70
        dns_ctx = {lookups = {prev = 0x7f043f7f4d30, next = 0x7f043f7f4d30}, lookups_drop = {prev = 0x7f043f7f4d40, next = 0x7f043f7f4d40}}
        notification = 0x7f0444a87e5e <__GI___snprintf+158>
#19 0x0000000000577a77 in step_callback (data=0x7f04400249f0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
        worker = 0x7f04400249f0
#20 0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 2, 139656308049984, 139656308050247, -8085596272872313471, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#21 0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f0444a266c0 (LWP 1298273))]
#0  0x00007f0444b2caf6 in epoll_wait (epfd=8, events=0x7f0440000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007f0444b2caf6 in epoll_wait (epfd=8, events=0x7f0440000b90, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018707aa in _mk_event_wait_2 (loop=0x7f04400017a0, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7f0440000b70
        ret = 0
#2  0x0000000001870bd4 in mk_event_wait (loop=0x7f04400017a0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x0000000000544d30 in flb_engine_start (config=0xe868490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_engine.c:999
        __flb_event_priority_live_foreach_iter = 1
        __flb_event_priority_live_foreach_n_events = 0
        ret = 0
        tasks = 0
        fs_chunks = 0
        mem_chunks = 0
        ts = 0
        tmp = "24.0K\000\000\000\000\000\000\000\000\000\000"
        rb_flush_flag = 0
        t_flush = {tm = {tv_sec = 10, tv_nsec = 0}}
        event = 0x0
        evl = 0x7f04400017a0
        evl_bktq = 0x7f0440005fd0
        sched = 0x7f044001b610
        dns_ctx = {lookups = {prev = 0x7f0444a1cd30, next = 0x7f0444a1cd30}, lookups_drop = {prev = 0x7f0444a1cd40, next = 0x7f0444a1cd40}}
        notification = 0x0
        rb_ms = 250
        rb_env = 0x0
#4  0x00000000004c68da in flb_lib_worker (data=0xe868460) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_lib.c:835
        ret = -1975923953
        ctx = 0xe868460
        config = 0xe868490
#5  0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 22, 140737304904432, 140737304904695, -8085431676840628863, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

(gdb) thread 3
[Switching to thread 3 (Thread 0x7f043ffff6c0 (LWP 1298274))]
#0  0x00007f0444b2caf6 in epoll_wait (epfd=11, events=0x7f04400061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007f0444b2caf6 in epoll_wait (epfd=11, events=0x7f04400061a0, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018707aa in _mk_event_wait_2 (loop=0x7f0440006330, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7f0440006180
        ret = 0
#2  0x0000000001870bd4 in mk_event_wait (loop=0x7f0440006330) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x00000000004cb1a4 in log_worker_collector (data=0x7f0440006090) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_log.c:166
        __i = 1
        __ctx = 0x7f0440006180
        run = 1
        event = 0x0
        log = 0x7f0440006090
        signal_value = 2
#4  0x0000000000577a77 in step_callback (data=0x7f0440009fa0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
        worker = 0x7f0440009fa0
#5  0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 0, 139656308050416, 139656308050679, -8085597372920812159, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.


(gdb) thread 4
[Switching to thread 4 (Thread 0x7f0415c1d6c0 (LWP 1298279))]
#0  0x00007f0444b2caf6 in epoll_wait (epfd=97, events=0x7f0404000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007f0444b2caf6 in epoll_wait (epfd=97, events=0x7f0404000f00, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018707aa in _mk_event_wait_2 (loop=0x7f0404001b10, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7f0404000ee0
        ret = 0
#2  0x0000000001870bd4 in mk_event_wait (loop=0x7f0404001b10) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x00000000018693c8 in mk_server_worker_loop (server=0x7f044017cbb0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_server.c:506
        __i = 1
        __ctx = 0x7f0404000ee0
        ret = 0
        timeout_fd = 103
        val = 1
        event = 0x0
        evl = 0x7f0404001b10
        list = 0x7f04040063e0
        head = 0x7f04040063e0
        conn = 0x7f0415c13da0
        sched = 0x7f0408000b90
        listener = 0x7f0404006400
        server_timeout = 0x7f040400e530
        __i = 0
        __ctx = 0x7f0404000ee0
#4  0x000000000185f471 in mk_sched_launch_worker_loop (data=0x7f0408001390) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_scheduler.c:417
        ret = 0
        wid = 0
        len = 13
        thread_name = 0x7f0404006390 "6\\A\364\003\177"
        head = 0x7f044017cdf0
        wcb = 0x7f04400ee940
        sched = 0x7f0408000b90
        notif = 0x7f0404006340
        thinfo = 0x7f0408001390
        ctx = 0x7f0408000b70
        server = 0x7f044017cbb0
#5  0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 2, 139655647550272, 139655647550535, -8085539699026219647, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f04455157c0 (LWP 1298272))]
#0  0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffff510c5a0, rem=rem@entry=0x7ffff510c5a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48        r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) backtrace full
#0  0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7ffff510c5a0, rem=rem@entry=0x7ffff510c5a0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        r = <optimized out>
#1  0x00007f0444b022e7 in __GI___nanosleep (req=req@entry=0x7ffff510c5a0, rem=rem@entry=0x7ffff510c5a0) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
        ret = <optimized out>
#2  0x00007f0444b1451c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
        save_errno = 22
        max = 4294967295
        ts = {tv_sec = 0, tv_nsec = 801045997}
#3  0x00000000004aab40 in flb_main_run (argc=3, argv=0x7ffff510c828) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1469
        opt = -1
        ret = 0
        json = 0x21000 <error: Cannot access memory at address 0x21000>
        last_plugin = -1
        cfg_file = 0xe86dc60 <incomplete sequence \350>
        cf = 0xe868df0
        tmp = 0xe868df0
        service = 0xe8683b0
        s = 0x5b4a46 <flb_supervisor_requested+213>
        section = 0xe8683b0
        cf_opts = 0xe8682b0
        group = 0x8a39cb0fdf26c000
        supervisor_reload_notified = 0
        trace_input = 0x0
        trace_output = 0xe868440 "stdout"
        trace_props = 0x0
        long_opts = {{name = 0x18773b8 "storage_path", has_arg = 1, flag = 0x0, val = 98}, {name = 0x18773c5 "config", has_arg = 1, flag = 0x0, val = 99}, {name = 0x1877132 "daemon", has_arg = 0, flag = 0x0, val = 100}, {name = 0x18773cc "dry-run", has_arg = 0, flag = 0x0, 
            val = 68}, {name = 0x1877139 "flush", has_arg = 1, flag = 0x0, val = 102}, {name = 0x18773d4 "http", has_arg = 0, flag = 0x0, val = 72}, {name = 0x18773d9 "supervisor", has_arg = 0, flag = 0x0, val = 1029}, {name = 0x187716a "log_file", has_arg = 1, flag = 0x0, 
            val = 108}, {name = 0x18773e4 "port", has_arg = 1, flag = 0x0, val = 80}, {name = 0x187713f "custom", has_arg = 1, flag = 0x0, val = 67}, {name = 0x18770fe "input", has_arg = 1, flag = 0x0, val = 105}, {name = 0x1877159 "processor", has_arg = 1, flag = 0x0, 
            val = 114}, {name = 0x1877163 "filter", has_arg = 1, flag = 0x0, val = 70}, {name = 0x1877104 "output", has_arg = 1, flag = 0x0, val = 111}, {name = 0x1877146 "match", has_arg = 1, flag = 0x0, val = 109}, {name = 0x18773e9 "parser", has_arg = 1, flag = 0x0, val = 82}, 
          {name = 0x18773f0 "prop", has_arg = 1, flag = 0x0, val = 112}, {name = 0x18773f5 "plugin", has_arg = 1, flag = 0x0, val = 101}, {name = 0x1877173 "tag", has_arg = 1, flag = 0x0, val = 116}, {name = 0x18773fc "sp-task", has_arg = 1, flag = 0x0, val = 84}, {
            name = 0x1877404 "version", has_arg = 0, flag = 0x0, val = 86}, {name = 0x187740c "verbose", has_arg = 0, flag = 0x0, val = 118}, {name = 0x1877414 "workdir", has_arg = 1, flag = 0x0, val = 119}, {name = 0x187741c "quiet", has_arg = 0, flag = 0x0, val = 113}, {
            name = 0x1877422 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x1877427 "help-json", has_arg = 0, flag = 0x0, val = 74}, {name = 0x1877431 "coro_stack_size", has_arg = 1, flag = 0x0, val = 115}, {name = 0x1877441 "sosreport", has_arg = 0, flag = 0x0, 
            val = 83}, {name = 0x1877177 "http_server", has_arg = 0, flag = 0x0, val = 72}, {name = 0x187744b "http_listen", has_arg = 1, flag = 0x0, val = 76}, {name = 0x1877457 "http_port", has_arg = 1, flag = 0x0, val = 80}, {name = 0x1877461 "enable-hot-reload", has_arg = 0, 
            flag = 0x0, val = 89}, {name = 0x1877473 "enable-chunk-trace", has_arg = 0, flag = 0x0, val = 90}, {name = 0x1877486 "trace", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x187748c "trace-input", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x1877498 "trace-output", 
            has_arg = 1, flag = 0x0, val = 1027}, {name = 0x18774a5 "trace-output-property", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x18774c0 "disable-thread-safety-on-hot-reload", has_arg = 0, flag = 0x0, val = 87}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
#4  0x00000000005b4acf in flb_supervisor_run (argc=3, argv=0x7ffff510c828, entry=0x4a9d5a <flb_main_run>) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_supervisor.c:626
        clean_argv = 0x11ff
        clean_argc = 32516
        env_child = 0x6c6f6f705f68652e <error: Cannot access memory at address 0x6c6f6f705f68652e>
        ret = -1
#5  0x00000000004aadcb in flb_main (argc=3, argv=0x7ffff510c828) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1564
No locals.
#6  0x00000000004aaded in main (argc=3, argv=0x7ffff510c828) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:1572
No locals.
(gdb) thread 6
[Switching to thread 6 (Thread 0x7f041dc406c0 (LWP 1298276))]
#0  0x00007f0444b2caf6 in epoll_wait (epfd=75, events=0x7f0440176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007f0444b2caf6 in epoll_wait (epfd=75, events=0x7f0440176330, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018707aa in _mk_event_wait_2 (loop=0x7f0440176640, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7f0440176310
        ret = 0
#2  0x0000000001870bd4 in mk_event_wait (loop=0x7f0440176640) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x0000000000516d74 in output_thread (data=0x7f0440176050) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_output_thread.c:257
        __flb_event_priority_live_foreach_iter = 1
        __flb_event_priority_live_foreach_n_events = 0
        n = 8
        ret = 0
        running = 1
        stopping = 0
        thread_id = 0
        tmp = "flb-out-http.0-w0", '\000' <repeats 32 times>, "1298276\000\300&\337\017\3139\212"
        event_local = {fd = 89, type = 65536, mask = 1, status = 2 '\002', data = 0x0, handler = 0x0, _head = {prev = 0x0, next = 0x0}, _priority_head = {prev = 0x0, next = 0x0}, priority = 6 '\006'}
        event = 0x0
        sched = 0x7f0410000e80
        task = 0x7f044019c3c0
        u_conn = 0x7f041002e030
        ins = 0xe88a0f0
        out_flush = 0x7f0410017ea0
        th_ins = 0x7f0440176050
        params = 0x0
        sched_params = 0x0
        dns_ctx = {lookups = {prev = 0x7f041dc36b70, next = 0x7f041dc36b70}, lookups_drop = {prev = 0x7f041dc36b80, next = 0x7f041dc36b80}}
        notification = 0x0
#4  0x0000000000577a77 in step_callback (data=0x7f0440176660) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
        worker = 0x7f0440176660
#5  0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 0, 139656308050032, 139656308050295, -8085522091270918783, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#6  0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 7
[Switching to thread 7 (Thread 0x7f041d43f6c0 (LWP 1298277))]
#0  0x00007f0444b2caf6 in epoll_wait (epfd=91, events=0x7f044017cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30        return SYSCALL_CANCEL (epoll_wait, epfd, events, maxevents, timeout);
(gdb) backtrace full
#0  0x00007f0444b2caf6 in epoll_wait (epfd=91, events=0x7f044017cf10, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
#1  0x00000000018707aa in _mk_event_wait_2 (loop=0x7f0440129310, timeout=-1) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event_epoll.c:444
        ctx = 0x7f04400a6460
        ret = 0
#2  0x0000000001870bd4 in mk_event_wait (loop=0x7f0440129310) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_core/mk_event.c:207
No locals.
#3  0x0000000001856e38 in mk_lib_worker (data=0x7f04400ada10) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_lib.c:154
        fd = 824195691
        bytes = 1667331187
        val = 2334097595223798896
        server = 0x7f044017cbb0
        event = 0x0
        ctx = 0x7f04400ada10
        __i = 32516
        __ctx = 0x373732383932
#4  0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 0, 139656308050464, 139656308050727, -8085520991222420095, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
(gdb) thread 8
[Switching to thread 8 (Thread 0x7f041641e6c0 (LWP 1298278))]
#0  0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f0416414de0, rem=rem@entry=0x7f0416414de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
48        r = INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags, req,
(gdb) backtrace full
#0  0x00007f0444af6945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7f0416414de0, rem=rem@entry=0x7f0416414de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
        sc_cancel_oldtype = 2
        sc_ret = <optimized out>
        r = <optimized out>
#1  0x00007f0444b022e7 in __GI___nanosleep (req=req@entry=0x7f0416414de0, rem=rem@entry=0x7f0416414de0) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
        ret = <optimized out>
#2  0x00007f0444b1451c in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
        save_errno = 0
        max = 4294967295
        ts = {tv_sec = 0, tv_nsec = 804197386}
#3  0x000000000186cf57 in mk_clock_worker_init (data=0x7f044017cbb0) at /root/rhel10-test/fluent-bit-4.1.1/lib/monkey/mk_server/mk_clock.c:124
        cur_time = 1761733333
        server = 0x7f044017cbb0
#4  0x00007f0444abbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, 8197202206575373697, -37112, 2, 139655647550320, 139655647550583, -8085540799074718335, -8085431727785431679}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007f0444b2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.

rafaelma avatar Oct 29 '25 20:10 rafaelma

Hi @edsiper

I have run fluent-bit under valgrind and triggered a crash; I hope this new information will help with debugging this issue. Good luck:

Here is the logfile generated by valgrind for this crash: valgrind.log

And here is the backtrace from this crash:

  • ZSTD_freeDDict is called with ddict=0x1, which does not look like a valid pointer
  • There is a difference from the previous coredump (where customAlloc was 0x0 and customFree was 0x1). This time we have customAlloc = 0x1102, customFree = 0x40000, opaque = 0x1102, which also look invalid

Given that the issue is recurring and the invalid pointers differ each time, it is likely that the decompression context (dctx) is being used after it has been freed (use-after-free), or that its memory is being overwritten by something else.
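
Another way such garbage pointers could appear, given the bundled/system zstd conflict discussed above, is a struct-layout mismatch between two zstd copies. Here is a purely hypothetical sketch (the struct layouts are invented for illustration and are not the real zstd internals): if the context is allocated through one layout and freed through another, the ddict field is read from the wrong offset and a value like 0x1 reaches ZSTD_freeDDict:

/* Purely hypothetical illustration: these are NOT the real zstd structs. */
#include <stdio.h>
#include <stdlib.h>

struct dctx_allocator_view { void *ddict; size_t some_flag; };  /* layout A */
struct dctx_freer_view     { size_t some_flag; void *ddict; };  /* layout B */

int main(void)
{
    struct dctx_allocator_view *ctx = calloc(1, sizeof(*ctx));

    if (ctx == NULL) {
        return 1;
    }
    ctx->ddict = NULL;      /* no dictionary in use */
    ctx->some_flag = 0x1;   /* unrelated internal state */

    /* The other library copy interprets the same bytes with layout B,
     * so the field it believes is "ddict" actually holds the flag: */
    struct dctx_freer_view *alias = (struct dctx_freer_view *) ctx;
    printf("ddict as seen by the freer: %p\n", alias->ddict);  /* prints 0x1 */

    free(ctx);
    return 0;
}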

(gdb) info threads          
  Id   Target Id                       Frame 
* 1    Thread 0x70a36c0 (LWP 1373684)  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
  2    Thread 0x2adc16c0 (LWP 1373689) 0x000000000538daf6 in epoll_wait (epfd=98, events=0xafaeab0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3    Thread 0x2a5c06c0 (LWP 1373688) 0x0000000005357945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x2a5b6de0, rem=rem@entry=0x2a5b6de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
  4    Thread 0x29dbf6c0 (LWP 1373687) 0x000000000538daf6 in epoll_wait (epfd=92, events=0xaf9a110, maxevents=8, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  5    Thread 0x295be6c0 (LWP 1373686) 0x000000000538daf6 in epoll_wait (epfd=75, events=0xaf8c320, maxevents=64, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  6    Thread 0x68a26c0 (LWP 1373683)  0x000000000538daf6 in epoll_wait (epfd=11, events=0x551bb20, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  7    Thread 0x60a16c0 (LWP 1373682)  0x000000000538daf6 in epoll_wait (epfd=8, events=0x55163b0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  8    Thread 0x549ce40 (LWP 1373681)  0x0000000005357945 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x1fff000080, rem=rem@entry=0x1fff000080) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
(gdb) backtrace full                          
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44                                                                                                                                                             
        tid = <optimized out>                                                                                                                                                                                                                                                             
        ret = 0
        pd = <optimized out>
        old_mask = {__val = {0}}
        ret = <optimized out>
#1  0x000000000531ea43 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
No locals.              
#2  0x00000000052c8a96 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
        ret = <optimized out>  
#3  0x00000000052b08fa in __GI_abort () at abort.c:79
        save_stage = 1                                          
        act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0, 0, 0, 0, 0, 0, 0, 720081008, 118061808, 118057200, 4912552, 0, 75857920, 4878989, 4878914, 32155344}}, sa_flags = 0, sa_restorer = 0x2aeb9070}
#4  0x00000000004a947d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
        cf_opts = 0x0
#5  <signal handler called>
No locals.                                                                                                                                   
#6  0x00000000011cbe84 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
        cMem = {customAlloc = 0x1102, customFree = 0x40000, opaque = 0x1102}
#7  0x000000000a50a609 in ZSTD_clearDict (dctx=0x2aeb9070) at .//decompress/zstd_decompress.c:315
No locals.             
#8  ZSTD_freeDCtx (dctx=0x2aeb9070) at .//decompress/zstd_decompress.c:326
        cMem = {customAlloc = <optimized out>, customFree = 0x0, opaque = 0x0}
#9  0x0000000004953b09 in sym_ZSTD_freeDCtxp (p=<optimized out>) at ../src/basic/compress.c:74                                      
        __func__ = <optimized out>
#10 decompress_blob_zstd (dst_max=0, src=0x169251d0, src_size=420, dst=<optimized out>, dst_size=0x7097978) at ../src/basic/compress.c:451
        k = 0                 
        size = 773        
        r = <optimized out>                                                                                                                                                                                                                                                               
        dctx = 0x2aeb9070
        input = {src = 0x169251d0, size = 420, pos = 420}                                                                                    
        output = {dst = 0x2ae79030, size = 262144, pos = 773}
        __func__ = <optimized out>                    
        size = <optimized out>
        r = <optimized out>
        dctx = <optimized out>                                                                                                                                                                                                                                                            
        input = <optimized out>
        output = <optimized out>
        k = <optimized out>
        _found = <optimized out>
        __assert_in_set = <optimized out>
        __unique_prefix_A18 = <optimized out>
        __unique_prefix_B19 = <optimized out>
        _level = <optimized out>                                                                                                             
        _e = <optimized out>                          
#11 decompress_blob (dst_max=<optimized out>, compression=<optimized out>, src=<optimized out>, src_size=<optimized out>, dst=<optimized out>, dst_size=<optimized out>) at ../src/basic/compress.c:495                                                                                   
No locals.                                                                                                                                                                                                                                                                                
#12 maybe_decompress_payload (data_threshold=<optimized out>, f=0x556d210, payload=0x169251d0 "(\265/\375`\005\002\325\f", size=420, compression=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7097af0, ret_size=0x7097ae8)
    at ../src/libsystemd/sd-journal/journal-file.c:1947
        rsize = 773         
        r = <optimized out>  
        __func__ = <optimized out>                                                                                                                                                                                                                                                        
#13 journal_file_data_payload.isra.0 (f=0x556d210, o=<optimized out>, offset=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7097af0, ret_size=0x7097ae8, data_threshold=<optimized out>) at ../src/libsystemd/sd-journal/journal-file.c:2009
        size = 420                                                                                                                           
        c = <optimized out>
        r = <optimized out>
        __func__ = <optimized out>
#14 0x00000000048b0d33 in sd_journal_enumerate_data (j=0x554bf60, data=0x7097b70, size=0x7099b98) at ../src/libsystemd/sd-journal/sd-journal.c:2886
        _e = <optimized out>
        p = <optimized out>
        d = 0x0
        l = 0
        _error = <optimized out>
        _level = <optimized out>
        n = <optimized out>
        f = 0x556d210
        o = 0x189dca50
        r = <optimized out>
        __func__ = "sd_journal_enumerate_data"
#15 0x000000000074170a in in_systemd_collect (ins=0x54ffe80, config=0x54b33b0, in_context=0x554bda0) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:408
        ret = 0
        ret_j = 1
        entries = 16
        skip_entries = 0
        rows = 1
        sec = 1761825735
        nsec = 725945000
        usec = 1761825735725945
        length = 37
        key = 0x16924fc8 "_SYSTEMD_UNIT=setroubleshootd.service"
        cursor = 0x0
        tag = 0x550e810 "uio_logs_5000_systemd"
        new_tag = "\260\234\t\a\000\000\000\000\257\306L\000\000\000\000\000Ћ\t\a\000\000\000\000\210-\210\001\000\000\000\000\220*\210\001\000\000\000\000\227\003\000\000\005\000\000\000Ǔ\207\001\000\000\000\000(\000\000\0000\000\000\000\300\234\t\a\000\000\000\000\000\234\t\a\000
\000\000\000f\000\000\000\000\000\000\000\033[1m[\033[0m2025/10/30 13:02:15.656937075\033[1m]\033[0m [\033[94mtrace\033[0m] [sched] 0 timer coroutines destroyed\n", '\000' <repeats 3905 times>
        last_tag = "uio_logs_5000_systemd", '\000' <repeats 4074 times>
        tag_len = 21
        last_tag_len = 21
        data = 0x16924fc8
        ctx = 0x554bda0
        tm = {tm = {tv_sec = 1761825735, tv_nsec = 725945000}}
        kvlist = 0x15201940
#16 0x00000000004fffcd in input_collector_fd (fd=40, ins=0x54ffe80) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
        head = 0xb026ef8
        collector = 0xb026e70
        input_coro = 0x7099d00
        config = 0x54b33b0
#17 0x0000000000500b0a in engine_handle_event (fd=40, mask=1, ins=0x54ffe80, config=0x54b33b0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
        ret = 0
#18 input_thread (data=0x5542d20) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
        __flb_event_priority_live_foreach_iter = 0
        __flb_event_priority_live_foreach_n_events = 1
        ret = 0
        thread_id = 0
        tmp = "flb-in-systemd.0-w0\000\000\000\000\000\364\365\024\000\000\000\000\000\3006\n\a\000\000\000\000\3006\n\a", '\000' <repeats 19 times>
        instance_exit = 0
        event = 0xb026e70
        ins = 0x54ffe80
        evl_bktq = 0xafd8c20
        thi = 0x5542d20
        p = 0x54d1020
        sched = 0x554b790
        dns_ctx = {lookups = {prev = 0x7099d30, next = 0x7099d30}, lookups_drop = {prev = 0x7099d40, next = 0x7099d40}}
        notification = 0x52e8e5e <__GI___snprintf+158>
#19 0x0000000000577a77 in step_callback (data=0x5548520) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
        worker = 0x5548520
#20 0x000000000531cb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {118109888, 5951356893848596964, -37112, 2, 101283904, 101284167, 5951359036303274468, 5951363918013067748}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#21 0x000000000538d4e4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
No locals.

rafaelma avatar Oct 30 '25 12:10 rafaelma

Hi, I tried using the system's libzstd and linking against it so that the in_systemd plugin and the system-provided libsystemd share one copy of the library. This should unify the zstd symbols used by the systemd-related code. https://github.com/fluent/fluent-bit/pull/11088 Could this eliminate your issue?
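
For illustration, the CMake side of that approach would look roughly like this (a sketch only, not the PR's actual diff; "fluent-bit-bin" is a placeholder for the real executable target):

# Prefer the distro libzstd over the bundled copy so that fluent-bit and
# libsystemd resolve the same ZSTD_* symbols at runtime.
find_package(PkgConfig REQUIRED)
pkg_check_modules(ZSTD REQUIRED IMPORTED_TARGET libzstd)
target_link_libraries(fluent-bit-bin PRIVATE PkgConfig::ZSTD)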

cosmo0920 avatar Oct 31 '25 10:10 cosmo0920

Hi, I tried using the system's libzstd and linking against it so that the in_systemd plugin and the system-provided libsystemd share one copy of the library. This should unify the zstd symbols used by the systemd-related code. #11088 Could this eliminate your issue?

@cosmo0920 Should this patch be used in addition to the ones sent by @edsiper, or instead of them?

rafaelma avatar Oct 31 '25 11:10 rafaelma

I have applied your patch, @cosmo0920, in addition to the patches from @edsiper, and recompiled. The result is the same.

Here is the backtrace for this crash:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
       tid = <optimized out>
       ret = 0
       pd = <optimized out>
       old_mask = {__val = {0}}
       ret = <optimized out>
#1  0x00007fe781ebda43 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
No locals.
#2  0x00007fe781e67a96 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
       ret = <optimized out>
#3  0x00007fe781e4f8fa in __GI_abort () at abort.c:79
       save_stage = 1
       act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0, 0, 0, 0, 0, 773, 0, 140632060825456, 140632276437744, 140632276433728, 4912552, 0, 140632306647040, 4878989, 4878914, 32155344}}, sa_flags = 0, sa_restorer = 0x7fe774078f70}
#4  0x00000000004a947d in flb_signal_handler (signal=11) at /root/rhel10-test/fluent-bit-4.1.1/src/fluent-bit.c:636
       cf_opts = 0x0
#5  <signal handler called>
No locals.
#6  0x00000000011cbe84 in ZSTD_freeDDict (ddict=0x1) at /root/rhel10-test/fluent-bit-4.1.1/lib/zstd-1.5.7/lib/decompress/zstd_ddict.c:215
       cMem = {customAlloc = 0x0, customFree = 0x1, opaque = 0x7fe780e18af0}
#7  0x00007fe779fad609 in ZSTD_clearDict (dctx=0x7fe774078f70) at .//decompress/zstd_decompress.c:315
No locals.
#8  ZSTD_freeDCtx (dctx=0x7fe774078f70) at .//decompress/zstd_decompress.c:326
       cMem = {customAlloc = <optimized out>, customFree = 0x0, opaque = 0x0}
#9  0x00007fe782a6ab09 in sym_ZSTD_freeDCtxp (p=<optimized out>) at ../src/basic/compress.c:74
       __func__ = <optimized out>
#10 decompress_blob_zstd (dst_max=0, src=0x7fe7679671d0, src_size=420, dst=<optimized out>, dst_size=0x7fe780e18978) at ../src/basic/compress.c:451
       k = 0
       size = 773
       r = <optimized out>
       dctx = 0x7fe774078f70
       input = {src = 0x7fe7679671d0, size = 420, pos = 420}
       output = {dst = 0x7fe774038f60, size = 262152, pos = 773}
       __func__ = <optimized out>
       size = <optimized out>
       r = <optimized out>
       dctx = <optimized out>
       input = <optimized out>
       output = <optimized out>
       k = <optimized out>
       _found = <optimized out>
       __assert_in_set = <optimized out>
       __unique_prefix_A18 = <optimized out>
       __unique_prefix_B19 = <optimized out>
       _level = <optimized out>
       _e = <optimized out>
#11 decompress_blob (dst_max=<optimized out>, compression=<optimized out>, src=<optimized out>, src_size=<optimized out>, dst=<optimized out>, dst_size=<optimized out>) at ../src/basic/compress.c:495
No locals.
#12 maybe_decompress_payload (data_threshold=<optimized out>, f=0x7fe774016d90, payload=0x7fe7679671d0 "(\265/\375`\005\002\325\f", size=420, compression=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7fe780e18af0, ret_size=0x7fe780e18ae8)
   at ../src/libsystemd/sd-journal/journal-file.c:1947
       rsize = 773
       r = <optimized out>
       __func__ = <optimized out>
#13 journal_file_data_payload.isra.0 (f=0x7fe774016d90, o=<optimized out>, offset=<optimized out>, field=<optimized out>, field_length=<optimized out>, ret_data=0x7fe780e18af0, ret_size=0x7fe780e18ae8, data_threshold=<optimized out>)
   at ../src/libsystemd/sd-journal/journal-file.c:2009
       size = 420
       c = <optimized out>
       r = <optimized out>
       __func__ = <optimized out>
#14 0x00007fe7829c7d33 in sd_journal_enumerate_data (j=0x7fe774001140, data=0x7fe780e18b70, size=0x7fe780e1ab98) at ../src/libsystemd/sd-journal/sd-journal.c:2886
       _e = <optimized out>
       p = <optimized out>
       d = 0x0
       l = 0
       _error = <optimized out>
       _level = <optimized out>
       n = <optimized out>
       f = 0x7fe774016d90
       o = 0x7fe76339da08
       r = <optimized out>
       __func__ = "sd_journal_enumerate_data"
#15 0x000000000074170a in in_systemd_collect (ins=0x7945210, config=0x7929490, in_context=0x7fe774001090) at /root/rhel10-test/fluent-bit-4.1.1/plugins/in_systemd/systemd.c:408
       ret = 0
       ret_j = 1
       entries = 16
       skip_entries = 0
       rows = 1
       sec = 1761911728
       nsec = 750378000
       usec = 1761911728750378
       length = 37
       key = 0x7fe767966fc8 "_SYSTEMD_UNIT=setroubleshootd.service"
       cursor = 0x0
       tag = 0x7944ff0 "uio_logs_5000_systemd"
       new_tag = "\260\254\341\200\347\177\000\000\257\306L\000\000\000\000\000Л\341\200\347\177\000\000\210-\210\001\000\000\000\000\220*\210\001\000\000\000\000\227\003\000\000\005\000\000\000Ǔ\207\001\000\000\000\000(\000\000\0000\000\000\000\300\254\341\200\347\177\000\000\000\254\341\200\347\177\000\000f\000\000\000\000\000\000\000\033[1m[\033[0m2025/10/31 12:55:29.102444141\033[1m]\033[0m [\033[94mtrace\033[0m] [sched] 0 timer coroutines destroyed\n", '\000' <repeats 3905 times>
       last_tag = "uio_logs_5000_systemd", '\000' <repeats 4074 times>
       tag_len = 21
       last_tag_len = 21
       data = 0x7fe767966fc8
       ctx = 0x7fe774001090
       tm = {tm = {tv_sec = 1761911728, tv_nsec = 750378000}}
       kvlist = 0x7fe7740113a0
#16 0x00000000004fffcd in input_collector_fd (fd=40, ins=0x7945210) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:166
       head = 0x7fe774011378
       collector = 0x7fe7740112f0
       input_coro = 0x7fe780e1ad00
       config = 0x7929490
#17 0x0000000000500b0a in engine_handle_event (fd=40, mask=1, ins=0x7945210, config=0x7929490) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:181
       ret = 0
#18 input_thread (data=0x7fe77c01f360) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_input_thread.c:420
       __flb_event_priority_live_foreach_iter = 0
       __flb_event_priority_live_foreach_n_events = 1
       ret = 0
       thread_id = 0
       tmp = "flb-in-systemd.0-w0\000\000\000\000\000M6\026\000\000\000\000\000\300F\342\200\347\177\000\000G\311\341\201\347\177", '\000' <repeats 17 times>
       instance_exit = 0
       event = 0x7fe7740112f0
       ins = 0x7945210
       evl_bktq = 0x7fe7740062c0
       thi = 0x7fe77c01f360
       p = 0x7936560
       sched = 0x7fe774000b70
       dns_ctx = {lookups = {prev = 0x7fe780e1ad30, next = 0x7fe780e1ad30}, lookups_drop = {prev = 0x7fe780e1ad40, next = 0x7fe780e1ad40}}
       notification = 0x7fe781e87e5e <__GI___snprintf+158>
#19 0x0000000000577a77 in step_callback (data=0x7fe77c0249f0) at /root/rhel10-test/fluent-bit-4.1.1/src/flb_worker.c:43
       worker = 0x7fe77c0249f0
#20 0x00007fe781ebbb68 in start_thread (arg=<optimized out>) at pthread_create.c:448
       ret = <optimized out>
       pd = <optimized out>
       out = <optimized out>
       unwind_buf = {cancel_jmp_buf = {{jmp_buf = {32, -2218781407457520282, -37112, 0, 140632293230656, 140632293230919, 2232184335564330342, 2232182085447374182}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
       not_first_call = <optimized out>
#21 0x00007fe781f2c6bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
No locals.
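
Frames #6 through #10 are telling: ZSTD_clearDict/ZSTD_freeDCtx in the system libzstd (decompress/zstd_decompress.c, reached via libsystemd's sym_ZSTD_freeDCtxp wrapper) end up calling a ZSTD_freeDDict that was compiled into the fluent-bit binary from the bundled zstd 1.5.7, with a garbage ddict pointer (0x1). A minimal illustration of why mixing two copies of one library across a single object is fatal (the structs below are invented stand-ins, not the real ZSTD layout):

/* Illustrative only: calling one library copy's free() on a context
   allocated by a different copy. */
#include <stdlib.h>

/* Layout as compiled into copy A (e.g. the system libzstd). */
struct ctx_v1 { void *dict; void (*custom_free)(void *); };

/* Same struct name in copy B (e.g. the bundled zstd 1.5.7), but the
   field layout drifted between versions. */
struct ctx_v2 { void (*custom_free)(void *); void *dict; };

int main(void)
{
    struct ctx_v1 *c = calloc(1, sizeof(*c));
    c->dict = (void *)0x1;              /* perfectly valid in v1's view */

    /* Copy B "frees" it using its own layout: it reads a function
       pointer from the offset where v1 stored `dict`, then jumps to
       0x1 -> SIGSEGV, much like ZSTD_freeDDict(ddict=0x1) above. */
    struct ctx_v2 *mis = (struct ctx_v2 *)c;
    if (mis->custom_free)
        mis->custom_free(mis->dict);
    return 0;
}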

rafaelma avatar Oct 31 '25 12:10 rafaelma

Hmmm, this could be caused by a collision of zstd symbols between the system's copy and the bundled one. I tried another way to handle this: https://github.com/fluent/fluent-bit/pull/11111
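
A quick way to check whether a given binary is exposed to this mix (illustrative commands; /opt/fluent-bit/bin/fluent-bit is the install path of the official packages and may differ on your system):

# Does the binary export the bundled zstd's symbols?
nm -D /opt/fluent-bit/bin/fluent-bit | grep ' T ZSTD_'

# Does it link the system libzstd and/or libsystemd?
ldd /opt/fluent-bit/bin/fluent-bit | grep -E 'libzstd|libsystemd'

If the first command prints ZSTD_* symbols while libsystemd loads the system libzstd for journal decompression, the dynamic linker can resolve those calls into the bundled copy, which is exactly the cross-copy call visible in the backtraces.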

cosmo0920 avatar Nov 05 '25 14:11 cosmo0920

Hello @edsiper, #11073 doesn't fix the coredump described in this issue.

This bug is still active in fluent-bit 4.2.0 with the same behavior as described for version 4.1.1 in this issue. The crash still occurs when journald logs a multi-line log from selinux and fluent-bit reads it.

rafaelma avatar Nov 14 '25 13:11 rafaelma

Hello @cosmo0920 I have tried https://github.com/fluent/fluent-bit/pull/11111 with fluent-bit 4.2.0 and I get the same behavior as described for version 4.1.1 in this issue. The crash still occurs when journald logs a multi-line log from selinux and fluent-bit reads it.

rafaelma avatar Nov 19 '25 09:11 rafaelma

Using a UBI 10 image also seems to trigger the SIGSEGV pretty quickly, even when built from source, but distroless and UBI 9 do not. Found this in some downstream testing after stepping up to UBI 10: https://github.com/FluentDo/agent/pull/115

patrick-stephens avatar Nov 20 '25 14:11 patrick-stephens

Hello @cosmo0920, any news on this issue? Do you need any more information or debugging from us?

regards

rafaelma avatar Dec 08 '25 15:12 rafaelma

Hi, I used the system's zstd library to build fluent-bit packages for RHEL10 here: https://github.com/fluent/fluent-bit/pull/11111 Could you test it on RHEL 10? The built package is here: https://github.com/fluent/fluent-bit/actions/runs/20092756932?pr=11111

cosmo0920 avatar Dec 10 '25 09:12 cosmo0920

Hello @cosmo0920, it is a pleasure to report that the package:

packages-pr-11111-almalinux-10 
https://github.com/fluent/fluent-bit/actions/runs/20092756932/artifacts/4821866793

works on RHEL10 without problems.

Fluent-bit is able to handle multiline logs from selinux without any problems and does not crash. Thanks a lot for the help; I hope 4.2.1 will be released soon :)

rafaelma avatar Dec 10 '25 10:12 rafaelma

@rafaelma any idea what's the minimum config to trigger the issue? I'm trying to add some downstream tests to ensure we don't hit a regression in the future, so I will try with just the new library installed, but it may need a plugin that actually exercises the failing ABI path.

patrick-stephens avatar Dec 12 '25 09:12 patrick-stephens

@rafaelma any idea what's the minimum config to trigger the issue? I'm trying to add some downstream tests to ensure we don't hit a regression in the future, so I will try with just the new library installed, but it may need a plugin that actually exercises the failing ABI path.

In our case it was a multiline log in journald from audit/selinux:

Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
                                                       
                                                       *****  Plugin catchall_boolean (89.3 confidence) suggests   ******************
                                                       
                                                       If you want to allow nis to enabled
                                                       Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
                                                       
                                                       Do
                                                       setsebool -P nis_enabled 1
                                                       
                                                       *****  Plugin catchall (11.6 confidence) suggests   **************************
                                                       
                                                       If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
                                                       Then you should report this as a bug.
                                                       You can generate a local policy module to allow this access.
                                                       Do
                                                       allow this access for now by executing:
                                                       # ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
                                                       # semodule -X 300 -i my-rhsmpackagepr.pp
                                                       
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443. For complete SELinux messages run: sealert -l 5334dac0-7877-4fb1-98a9-17f32e844736
Oct 28 10:11:23 hostname.domain setroubleshoot[1221221]: SELinux is preventing /usr/bin/python3.12 from name_connect access on the tcp_socket port 443.
                                                       
                                                       *****  Plugin catchall_boolean (89.3 confidence) suggests   ******************

                                                       If you want to allow nis to enabled
                                                       Then you must tell SELinux about this by enabling the 'nis_enabled' boolean.
                                                       
                                                       Do
                                                       setsebool -P nis_enabled 1
                                                       
                                                       *****  Plugin catchall (11.6 confidence) suggests   **************************
                                                       
                                                       If you believe that python3.12 should be allowed name_connect access on the port 443 tcp_socket by default.
                                                       Then you should report this as a bug.
                                                       You can generate a local policy module to allow this access.
                                                       Do
                                                       allow this access for now by executing:
                                                       # ausearch -c 'rhsm-package-pr' --raw | audit2allow -M my-rhsmpackagepr
                                                       # semodule -X 300 -i my-rhsmpackagepr.pp

I suppose that any selinux violation that produces this type of log will be enough to cause a coredump.
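
For completeness, a minimal configuration sketch that should exercise the same path (values are illustrative, not our exact setup):

[SERVICE]
    storage.path    /var/lib/fluent-bit/storage

[INPUT]
    Name            systemd
    Tag             host_systemd
    DB              /var/lib/fluent-bit/storage/systemd.db
    storage.type    filesystem

[OUTPUT]
    Name            stdout
    Match           *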

rafaelma avatar Dec 12 '25 12:12 rafaelma

I'm seeing a core dump with the systemd input enabled using this RPM:

$ rpm -qi fluent-bit 
Name        : fluent-bit
Version     : 4.2.1
Release     : 1
Architecture: x86_64
Install Date: Mon 15 Dec 2025 10:43:45 GMT
Group       : System Environment/Daemons
Size        : 27689798
License     : Apache v2.0
Signature   :
              RSA/SHA512, Sat 13 Dec 2025 17:34:51 GMT, Key ID 9f9ddc083888c1cd
Source RPM  : fluent-bit-4.2.1-1.src.rpm
Build Date  : Fri 12 Dec 2025 23:34:34 GMT
Build Host  : e90c446b9ce6
Relocations : / 
Vendor      : Chronosphere Inc.
Summary     : Fast data collector for Linux
Description :
Fluent Bit is a high performance and multi platform Log Forwarder.
$ 

I haven't been able to work out whether this RPM contains the previously discussed fix or not. The RPM has no changelog, and there aren't any release notes at https://fluentbit.io/announcements/

The only multiline log messages I can see in journald output are from Fluent Bit core dumps.
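
One indirect way to check, assuming the fix means linking the distro libzstd dynamically, in which case rpm's automatic dependency generator will list the shared library (the binary path is the package default):

rpm -q --requires fluent-bit | grep -i zstd
ldd /opt/fluent-bit/bin/fluent-bit | grep zstd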

Unfortunately, the official Fluent Bit v4.2.1 does not include this fix.

cosmo0920 avatar Dec 16 '25 02:12 cosmo0920

Unfortunately, the official Fluent Bit v4.2.1 does not include this fix.

This is very unfortunate for us :( We hope 4.2.2 will be released soon.

rafaelma avatar Dec 16 '25 09:12 rafaelma

Thanks, I've not been able to trigger it with our Alma/Rocky Linux 10 packages, but I have done so with a UBI 10 container, so it does seem to be something weird.
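
If a real SELinux denial is hard to provoke in CI, one untested sketch is to synthesize an equivalent entry: send a single large multiline MESSAGE through the journal API so that journald (with its default Compress= setting) compresses the field on disk, which is the decompression path that crashes (identifier and sizes here are arbitrary):

/* Build sketch: gcc repro.c -o repro $(pkg-config --cflags --libs libsystemd) */
#include <stdio.h>
#include <systemd/sd-journal.h>

int main(void)
{
    char msg[8192];
    size_t off = 0;

    /* One entry, many lines: journald stores this as a single MESSAGE
       field, large enough to be compressed on disk. */
    off += (size_t)snprintf(msg + off, sizeof(msg) - off,
                            "SELinux-style multiline test\n");
    while (off < sizeof(msg) - 64)
        off += (size_t)snprintf(msg + off, sizeof(msg) - off,
                                "filler line to grow the payload\n");

    sd_journal_send("MESSAGE=%s", msg,
                    "SYSLOG_IDENTIFIER=crash-repro", NULL);
    return 0;
}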

patrick-stephens avatar Dec 16 '25 15:12 patrick-stephens