[Bug]: SIGSEGV logs management

Open thiagoftsm opened this issue 1 year ago • 1 comments

Bug description

On Arch Linux I have observed the following coredump:

#0  0x00007f25265158c7 in __GI___regexec (preg=preg@entry=0x55b5e04a5920 <req_client_regex>, string=string@entry=0x7ffcf4ea02da "::1", nmatch=nmatch@entry=0, pmatch=pmatch@entry=0x0, eflags=eflags@entry=0)
    at /usr/src/debug/glibc/glibc/posix/regexec.c:214
Downloading source file /usr/src/debug/glibc/glibc/posix/regexec.c
214       lock_lock (dfa->lock);
[Current thread is 1 (Thread 0x7f2526f89640 (LWP 484))]
(gdb) bt
#0  0x00007f25265158c7 in __GI___regexec (preg=preg@entry=0x55b5e04a5920 <req_client_regex>, string=string@entry=0x7ffcf4ea02da "::1", nmatch=nmatch@entry=0, pmatch=pmatch@entry=0x0, eflags=eflags@entry=0)
    at /usr/src/debug/glibc/glibc/posix/regexec.c:214
#1  0x00007f252657515d in __compat_regexec (preg=preg@entry=0x55b5e04a5920 <req_client_regex>, string=string@entry=0x7ffcf4ea02da "::1", nmatch=nmatch@entry=0, pmatch=pmatch@entry=0x0, eflags=eflags@entry=0)
    at /usr/src/debug/glibc/glibc/posix/regexec.c:240
#2  0x000055b5e02b9585 in parse_web_log_line (wblp_config=wblp_config@entry=0x55b5e1343d30, line=line@entry=0x55b5e1343ce0 "::1 - - [09/Nov/2023:19:54:09 +0000] \"GET / HTTP/1.0\" 200 481\n", line_len=61, 
    log_line_parsed=log_line_parsed@entry=0x7ffcf4ea01d0) at /home/thiago/Netdata/netdata/logsmanagement/parser.c:633
#3  0x000055b5e02bb956 in auto_detect_web_log_parser_config (line=line@entry=0x55b5e1343ce0 "::1 - - [09/Nov/2023:19:54:09 +0000] \"GET / HTTP/1.0\" 200 481\n", delimiter=delimiter@entry=32 ' ')
    at /home/thiago/Netdata/netdata/logsmanagement/parser.c:1490
#4  0x000055b5e02b5f2d in config_section_init (main_loop=main_loop@entry=0x55b5e1304470, config_section=config_section@entry=0x55b5e1323430, forward_in_config=forward_in_config@entry=0x0, 
    p_flb_srvc_config=p_flb_srvc_config@entry=0x7ffcf4ea2830, stdout_mut=stdout_mut@entry=0x55b5e04a5880 <stdout_mut>) at /home/thiago/Netdata/netdata/logsmanagement/logsmanag_config.c:886
#5  0x000055b5e02b816c in config_file_load (main_loop=0x55b5e1304470, p_forward_in_config=0x0, p_flb_srvc_config=p_flb_srvc_config@entry=0x7ffcf4ea2830, stdout_mut=stdout_mut@entry=0x55b5e04a5880 <stdout_mut>)
    at /home/thiago/Netdata/netdata/logsmanagement/logsmanag_config.c:1406
#6  0x000055b5e02a352c in main (argc=<optimized out>, argv=<optimized out>) at /home/thiago/Netdata/netdata/logsmanagement/logsmanagement.c:168

The crash occurs when the logs management plugin is enabled.
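For readers following the trace: frame #0 faults inside `regexec()` at `lock_lock (dfa->lock)`, where `dfa` comes from the `regex_t` passed in (`req_client_regex`). One way a fault at that point can arise is calling `regexec()` on a `regex_t` that was never successfully compiled with `regcomp()`; the issue does not confirm that this is the cause here. The sketch below only illustrates a defensive pattern for that scenario. The `guarded_regex` type, its helper functions, and the address pattern are hypothetical and are not taken from the Netdata source.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper (not Netdata code): pairs a regex_t with a flag
 * recording whether regcomp() succeeded, so an uncompiled pattern is
 * never handed to regexec(). */
typedef struct {
    regex_t re;
    bool compiled;
} guarded_regex;

/* Compile the pattern; returns true only if regcomp() succeeded. */
static bool guarded_regex_init(guarded_regex *g, const char *pattern) {
    g->compiled = (regcomp(&g->re, pattern, REG_EXTENDED | REG_NOSUB) == 0);
    return g->compiled;
}

/* Returns 1 on match, 0 on no match, -1 if the regex is not compiled. */
static int guarded_regex_match(const guarded_regex *g, const char *s) {
    if (!g->compiled)
        return -1;
    return regexec(&g->re, s, 0, NULL, 0) == 0 ? 1 : 0;
}

/* Release the compiled pattern and mark the wrapper unusable again. */
static void guarded_regex_free(guarded_regex *g) {
    if (g->compiled) {
        regfree(&g->re);
        g->compiled = false;
    }
}
```

With an illustrative pattern such as `^[0-9a-fA-F:.]+$`, `guarded_regex_match()` reports a match for the `"::1"` client field seen in the trace, and returns -1 instead of crashing when the wrapper was zero-initialized but never compiled.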

Expected behavior

The plugin should run normally without crashing.

Steps to reproduce

  1. Compile Netdata from source on Arch Linux.
  2. Enable the plugin in netdata.conf.
  3. Use the following configuration in /etc/netdata/logsmanagement.d.conf:
[global]
    enabled = yes
    update every = 1
    update timeout = 10
    use log timestamp = auto
    circular buffer max size MiB = 64
    circular buffer drop logs if full = no
    compression acceleration = 1
    collected logs total chart enable = no
    collected logs rate chart enable = yes

[db]
    db mode = full
    db dir = /var/cache/netdata/logs_management_db
    circular buffer flush to db = 6
    disk space limit MiB = 500

[forward input]
    enabled = no
    unix path = 
    unix perm = 0644
    listen = 0.0.0.0
    port = 24224

[fluent bit]
    flush = 0.1
    http listen = 0.0.0.0
    http port = 2020
    http server = false
    log file = /var/log/netdata/fluentbit.log
    log level = info
    coro stack size = 24576

...

Installation method

from source

System info

Linux archlinux 6.6.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 14 Dec 2023 03:45:42 +0000 x86_64 GNU/Linux
/etc/os-release:NAME="Arch Linux"
/etc/os-release:PRETTY_NAME="Arch Linux"
/etc/os-release:ID=arch
/etc/os-release:BUILD_ID=rolling
/etc/os-release:ANSI_COLOR="38;2;23;147;209"
/etc/os-release:LOGO=archlinux-logo

Netdata build info

Packaging:
    Netdata Version ____________________________________________ : v1.44.0-77-nightly
    Installation Type __________________________________________ : custom
    Package Architecture _______________________________________ : unknown
    Package Distro _____________________________________________ : unknown
    Configure Options __________________________________________ : dummy-configure-command
Default Directories:
    User Configurations ________________________________________ : /etc/netdata
    Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
    Permanent Databases ________________________________________ : /var/lib/netdata
    Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /usr/share/netdata/web
    Log Files __________________________________________________ : /var/log/netdata
    Lock Files _________________________________________________ : /var/lib/netdata/lock
    Home _______________________________________________________ : /var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 6.6.7-arch1-1
    Operating System ___________________________________________ : Arch Linux
    Operating System ID ________________________________________ : arch
    Operating System ID Like ___________________________________ : unknown
    Operating System Version ___________________________________ : unknown
    Operating System Version ID ________________________________ : none
    Detection __________________________________________________ : /etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 2
    CPU Frequency ______________________________________________ : 2903000000
    RAM Bytes __________________________________________________ : 1011892224
    Disk Capacity ______________________________________________ : 21474836480
    CPU Architecture ___________________________________________ : x86_64
    Virtualization Technology __________________________________ : kvm
    Virtualization Detection ___________________________________ : systemd-detect-virt
Container:
    Container __________________________________________________ : none
    Container Detection ________________________________________ : systemd-detect-virt
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : none
    Container Operating System ID ______________________________ : none
    Container Operating System ID Like _________________________ : none
    Container Operating System Version _________________________ : none
    Container Operating System Version ID ______________________ : none
    Container Operating System Detection _______________________ : none
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (brotli zstd lz4 gzip)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : NO
Database Engines:
    dbengine ___________________________________________________ : YES
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    map ________________________________________________________ : YES
    save _______________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    protobuf (platform-neutral data serialization protocol) ____ : YES (bundled)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : YES
    libcrypto (cryptographic functions) ________________________ : YES
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : NO
    ebpf (monitor system calls) ________________________________ : YES
    freeipmi (monitor enterprise server H/W) ___________________ : NO
    nfacct (gather netfilter accounting) _______________________ : NO
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : NO
    Xen VBD Error Tracking _____________________________________ : NO
    Logs Management ____________________________________________ : YES
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : NO
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : NO
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : YES

Additional info

This issue does not reproduce on all Linux distributions.

thiagoftsm, Dec 21 '23 15:12

I am also experiencing a SIGSEGV caused by the logs management plugin:

sudo coredumpctl -1 debug
           PID: 238230 (logs-management)
           UID: 998 (netdata)
           GID: 999 (netdata)
        Signal: 11 (SEGV)
     Timestamp: Tue 2024-01-02 19:09:54 UTC (1h 58min ago)
  Command Line: /usr/libexec/netdata/plugins.d/logs-management.plugin 1
    Executable: /usr/libexec/netdata/plugins.d/logs-management.plugin
 Control Group: /system.slice/netdata.service
          Unit: netdata.service
         Slice: system.slice
       Boot ID: fa34680522df49e1a31789f385aed95d
    Machine ID: 22a3a6ef70f741b0b60db5afec90e682
      Hostname: netdata
       Storage: /var/lib/systemd/coredump/core.logs-management.998.fa34680522df49e1a31789f385aed95d.238230.1704222594000000.zst (present)
     Disk Size: 224.3K
       Message: Process 238230 (logs-management) of user 998 dumped core.
                
                Found module /usr/libexec/netdata/plugins.d/logs-management.plugin with build-id: 6dfc00592da08e8fd6831aeee2328618c8b6f6b4
                Found module linux-vdso.so.1 with build-id: ea99e5d980dd1a4d23af20aa35a7d823b5c92f97
                Found module libssl.so.3 with build-id: ce838f6c51f037b73ade040b4abd647d7ae7d62d
                Found module libgpg-error.so.0 with build-id: 3fbec71c67bee60d8aef00697ee187079b0fb307
                Found module ld-linux-x86-64.so.2 with build-id: cccdd41e22e25f77a8cda3d045c57ffdb01a9793
                Found module libgcrypt.so.20 with build-id: 60a5e524de0ed8323edf33e9eb9127a9eee02359
                Found module libcap.so.2 with build-id: b4bf900abf14aabe12d90988ceb30888acb2bcb0
                Found module libzstd.so.1 with build-id: 5d9d0d946a3154a748e87e17af9d14764519237b
                Found module liblzma.so.5 with build-id: b85da6c48eb60a646615392559483b93617ef265
                Found module libc.so.6 with build-id: 203de0ae33b53fee1578b117cb4123e85d0534f0
                Found module libgcc_s.so.1 with build-id: e3a44e0da9c6e835d293ed8fd2882b4c4a87130c
                Found module libm.so.6 with build-id: 9f3c01b284b7fd2427aa8ae047f2720e12a4d396
                Found module libcrypto.so.3 with build-id: 156e054fb88f59a4100ca7edc74a79e3908027a8
                Found module libuv.so.1 with build-id: ff2c8af1d41a623ee738cb5839fb10384ad1c65f
                Found module libuuid.so.1 with build-id: 64c0d0cb22fa2bdeca075a0c0418ba5ff314b220
                Found module liblz4.so.1 with build-id: a85971851cd059f1af80d553c8e7170d42ec59a1
                Found module libsystemd.so.0 with build-id: e45f7492c0f62251620378d7224ad0371a8d1f98
                Stack trace of thread 242729:
                #0  0x00007f604f43c49e uv_timer_stop (libuv.so.1 + 0xa49e)
                #1  0x00005567dbbf5800 n/a (/usr/libexec/netdata/plugins.d/logs-management.plugin + 0x20800)

GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/libexec/netdata/plugins.d/logs-management.plugin...
[New LWP 242729]
[New LWP 238230]
[New LWP 238298]
[New LWP 238299]
[New LWP 242728]
[New LWP 238302]
[New LWP 238303]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/libexec/netdata/plugins.d/logs-management.plugin 1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f604f43c49e in uv_timer_stop () from /lib/x86_64-linux-gnu/libuv.so.1
[Current thread is 1 (Thread 0x7f604e31d640 (LWP 242729))]
(gdb) bt full
#0  0x00007f604f43c49e in uv_timer_stop () from /lib/x86_64-linux-gnu/libuv.so.1
No symbol table info available.
#1  0x00005567dbbf5800 in p_file_info_destroy (arg=0x5567dd8148d0) at /home/max/netdata/logsmanagement/logsmanag_config.c:106
        p_file_info = <optimized out>
        __FUNCTION__ = "p_file_info_destroy"
        chartname = "logs_manag_systemd_logs\000S4\326N`\177\000\000\340\b\000H`\177\000\000\200|\355N`\177\000\000\340\b\000H`\177\000\000\030\361\377\377\377\377\377\377P\361\377\377\377\377\377\377\343\065\326N`\177\000\000@\326\061N`\177\000\000`\v\000H`\177\000\000\340\261\002"
        output_next = <optimized out>
#2  0x00007f604ed52ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140731648024880, 8131616884517824162, 140051605476928, 9, 140051616180176, 140731648025232, -8207297833926072670, -8207295940654864734}, mask_was_saved = 0}}, priv = {pad = {0x0, 
              0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#3  0x00007f604ede4660 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.
(gdb) where
#0  0x00007f604f43c49e in uv_timer_stop () from /lib/x86_64-linux-gnu/libuv.so.1
#1  0x00005567dbbf5800 in p_file_info_destroy (arg=0x5567dd8148d0) at /home/max/netdata/logsmanagement/logsmanag_config.c:106
#2  0x00007f604ed52ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#3  0x00007f604ede4660 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

CreeperFace00, Jan 02 '24 21:01

Closing this issue because we don't plan to fix it. As of today, logsmanagement.plugin is unmaintained and not supported.

ilyam8, Jun 21 '24 18:06