
Setting threaded to true in prometheus_scrape input causes SIGSEGV

Open anderssynstad opened this issue 1 year ago • 3 comments

Bug Report

Describe the bug Setting the threaded key to true for the prometheus_scrape input causes Fluent Bit to crash with SIGSEGV.

To Reproduce

---
service:
  storage.path: /var/spool/fluent-bit
pipeline:
  inputs:
    - name: prometheus_scrape
      host: 127.0.0.1
      port: 9100
      tag: metrics.node
      metrics_path: /metrics
      scrape_interval: 10s
      threaded: false
  outputs:
    - name: null
      match: '*'

# /opt/fluent-bit/bin/fluent-bit -c test.yaml -D
Fluent Bit v3.1.9
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

configuration test is successful
#

Then changing threaded from false to true:

---
service:
  storage.path: /var/spool/fluent-bit
pipeline:
  inputs:
    - name: prometheus_scrape
      host: 127.0.0.1
      port: 9100
      tag: metrics.node
      metrics_path: /metrics
      scrape_interval: 10s
      threaded: true
  outputs:
    - name: null
      match: '*'

# /opt/fluent-bit/bin/fluent-bit -c test.yaml -D
Fluent Bit v3.1.9
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

configuration test is successful
[2024/10/25 07:55:51] [engine] caught signal (SIGSEGV)
#0  0x5639131a8e7e      in  flb_input_exit_all() at src/flb_input.c:1341
#1  0x5639131c3138      in  flb_engine_shutdown() at src/flb_engine.c:1121
#2  0x56391319d264      in  flb_destroy() at src/flb_lib.c:240
#3  0x56391310dc1b      in  flb_main() at src/fluent-bit.c:1360
#4  0x7f707cc46249      in  ???() at ???:0
#5  0x7f707cc46304      in  ???() at ???:0
#6  0x56391310b800      in  ???() at ???:0
#7  0xffffffffffffffff  in  ???() at ???:0
Aborted
#
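
The two runs above could be automated roughly as follows. This is only a sketch: the binary path and the -c/-D invocation are copied from the report, while the helper names crashed and dry_run are hypothetical and not part of Fluent Bit. It assumes a crash shows up either as death by signal (a negative return code in Python's subprocess) or as the "caught signal (SIGSEGV)" marker seen in the log.

```python
import subprocess

# Path taken from the report; adjust for other installations.
FLUENT_BIT = "/opt/fluent-bit/bin/fluent-bit"


def crashed(returncode, output):
    """Heuristic crash check (hypothetical helper).

    subprocess reports a negative return code when the child was
    terminated by a signal; the log marker is checked as a fallback.
    """
    return returncode < 0 or "caught signal (SIGSEGV)" in output


def dry_run(config_path, binary=FLUENT_BIT):
    """Run the same -D invocation as in the report; return True on a crash."""
    proc = subprocess.run(
        [binary, "-c", config_path, "-D"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the backtrace may go to stderr
        text=True,
    )
    return crashed(proc.returncode, proc.stdout)
```

Calling dry_run("test.yaml") once with threaded: false and once with threaded: true should then distinguish the two cases without reading the output by hand.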

Expected behavior A supported configuration key should not raise an error and crash the program.

Your Environment

  • Version used: 3.1.9
  • Configuration: See above.
  • Environment name and version (e.g. Kubernetes? What version?):
  • Server type and version: Virtual
  • Operating System and version: Debian 12
  • Filters and plugins: None.

Additional context This generates a significant amount of noise because the fluent-bit service keeps restarting. On one randomly chosen server, the fluent-bit service has restarted 184 times in the last 8 hours alone ...

anderssynstad avatar Oct 25 '24 06:10 anderssynstad

@anderssynstad Is this running from a package or a custom build? Any more insight into the system might help us.

edsiper avatar Oct 25 '24 21:10 edsiper

@edsiper Verified it on a fully patched Debian 12 VPS on DigitalOcean:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

$ cat /etc/apt/sources.list.d/fluentbit.list
deb [arch=amd64 signed-by=/usr/share/keyrings/fluentbit.asc] https://packages.fluentbit.io/debian/bookworm bookworm main

$ dpkg -l fluent-bit
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  fluent-bit     3.1.9        amd64        Fast data collector for Linux

$ uname -srv
Linux 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30)

anderssynstad avatar Oct 26 '24 08:10 anderssynstad

Not sure if it is related, but the yaml example on https://docs.fluentbit.io/manual/pipeline/inputs/node-exporter-metrics has the "same" result.

$ cat test.yaml
# Node Exporter Metrics + Prometheus Exporter
# -------------------------------------------
# The following example collect host metrics on Linux and expose
# them through a Prometheus HTTP end-point.
#
# After starting the service try it with:
#
# $ curl http://127.0.0.1:2021/metrics
#
service:
    flush: 1
    log_level: info
pipeline:
    inputs:
        - name: node_exporter_metrics
          tag:  node_metrics
          scrape_interval: 2
    outputs:
        - name: prometheus_exporter
          match: node_metrics
          host: 0.0.0.0
          port: 2021

$ /opt/fluent-bit/bin/fluent-bit -c test.yaml -D
Fluent Bit v3.1.9
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

configuration test is successful
[2024/10/26 18:50:56] [engine] caught signal (SIGSEGV)
#0  0x55ddedd2ee7e      in  flb_input_exit_all() at src/flb_input.c:1341
#1  0x55ddedd49138      in  flb_engine_shutdown() at src/flb_engine.c:1121
#2  0x55ddedd23264      in  flb_destroy() at src/flb_lib.c:240
#3  0x55ddedc93c1b      in  flb_main() at src/fluent-bit.c:1360
#4  0x7fdd0e246249      in  ???() at ???:0
#5  0x7fdd0e246304      in  ???() at ???:0
#6  0x55ddedc91800      in  ???() at ???:0
#7  0xffffffffffffffff  in  ???() at ???:0
Aborted
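
To check whether the two crashes are related, the backtraces can be compared with the load addresses stripped, since those differ between runs. A minimal sketch follows; the function name symbolized_frames and the regular expression are my own, written against the frame layout shown in the logs above, and only the symbolized frames (#0 to #3) from each trace are pasted in.

```python
import re

# Matches frames such as:
#   #0  0x5639131a8e7e      in  flb_input_exit_all() at src/flb_input.c:1341
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+\s+in\s+(\S+)\s+at\s+(\S+)$")


def symbolized_frames(log_text):
    """Return (function, location) pairs, ignoring addresses and unresolved frames."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a backtrace line
        _, function, location = match.groups()
        if function.startswith("???"):
            continue  # no symbol information to compare
        frames.append((function, location))
    return frames


TRACE_PROMETHEUS_SCRAPE = """\
#0  0x5639131a8e7e      in  flb_input_exit_all() at src/flb_input.c:1341
#1  0x5639131c3138      in  flb_engine_shutdown() at src/flb_engine.c:1121
#2  0x56391319d264      in  flb_destroy() at src/flb_lib.c:240
#3  0x56391310dc1b      in  flb_main() at src/fluent-bit.c:1360
"""

TRACE_NODE_EXPORTER = """\
#0  0x55ddedd2ee7e      in  flb_input_exit_all() at src/flb_input.c:1341
#1  0x55ddedd49138      in  flb_engine_shutdown() at src/flb_engine.c:1121
#2  0x55ddedd23264      in  flb_destroy() at src/flb_lib.c:240
#3  0x55ddedc93c1b      in  flb_main() at src/fluent-bit.c:1360
"""

same_crash_site = (
    symbolized_frames(TRACE_PROMETHEUS_SCRAPE)
    == symbolized_frames(TRACE_NODE_EXPORTER)
)
print(same_crash_site)
```

Both traces resolve to the same four frames, ending in flb_input_exit_all() at src/flb_input.c:1341, which suggests (but does not prove) a common cause in the input shutdown path.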

anderssynstad avatar Oct 26 '24 18:10 anderssynstad

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Mar 23 '25 02:03 github-actions[bot]

This appears to have been resolved: I am no longer able to reproduce it on the same server with Fluent Bit version 3.2.9.

anderssynstad avatar Mar 23 '25 09:03 anderssynstad