Setting threaded to true in prometheus_scrape input causes SIGSEGV
Bug Report
Describe the bug
Setting the threaded key to true for prometheus_scrape input causes Fluent Bit to throw SIGSEGV errors.
To Reproduce
---
service:
  storage.path: /var/spool/fluent-bit
pipeline:
  inputs:
    - name: prometheus_scrape
      host: 127.0.0.1
      port: 9100
      tag: metrics.node
      metrics_path: /metrics
      scrape_interval: 10s
      threaded: false
  outputs:
    - name: null
      match: '*'
# /opt/fluent-bit/bin/fluent-bit -c test.yaml -D
Fluent Bit v3.1.9
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ _____ __
| ___| | | | | ___ (_) | |____ |/ |
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __ / /`| |
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / \ \ | |
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /.___/ /_| |_
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ \____(_)___/
configuration test is successful
#
Then changing threaded from false to true:
---
service:
  storage.path: /var/spool/fluent-bit
pipeline:
  inputs:
    - name: prometheus_scrape
      host: 127.0.0.1
      port: 9100
      tag: metrics.node
      metrics_path: /metrics
      scrape_interval: 10s
      threaded: true
  outputs:
    - name: null
      match: '*'
# /opt/fluent-bit/bin/fluent-bit -c test.yaml -D
Fluent Bit v3.1.9
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ _____ __
| ___| | | | | ___ (_) | |____ |/ |
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __ / /`| |
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / \ \ | |
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /.___/ /_| |_
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ \____(_)___/
configuration test is successful
[2024/10/25 07:55:51] [engine] caught signal (SIGSEGV)
#0 0x5639131a8e7e in flb_input_exit_all() at src/flb_input.c:1341
#1 0x5639131c3138 in flb_engine_shutdown() at src/flb_engine.c:1121
#2 0x56391319d264 in flb_destroy() at src/flb_lib.c:240
#3 0x56391310dc1b in flb_main() at src/fluent-bit.c:1360
#4 0x7f707cc46249 in ???() at ???:0
#5 0x7f707cc46304 in ???() at ???:0
#6 0x56391310b800 in ???() at ???:0
#7 0xffffffffffffffff in ???() at ???:0
Aborted
#
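For anyone who wants to script the comparison between the two runs above, here is a minimal Python sketch. It runs the same dry-run command shown in this report and classifies the exit status. The helper names `classify_returncode` and `dry_run` are hypothetical, and the default binary path is simply the one used above. The trailing "Aborted" line in the output suggests the process may end via SIGABRT after the stack trace is printed, so the reported signal may differ from SIGSEGV; that is an assumption, not something verified here.

```python
import signal
import subprocess


def classify_returncode(returncode: int) -> str:
    """Describe a subprocess return code in POSIX terms."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def dry_run(config_path: str,
            binary: str = "/opt/fluent-bit/bin/fluent-bit") -> str:
    """Run `fluent-bit -c <config> -D` as in the report and classify the result."""
    proc = subprocess.run(
        [binary, "-c", config_path, "-D"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return classify_returncode(proc.returncode)
```

Calling `dry_run("test.yaml")` once with `threaded: false` and once with `threaded: true` should make the difference between the two configurations visible without reading the full output.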
Expected behavior
I expect a supported configuration key not to throw an error and kill the program.
Your Environment
- Version used: 3.1.9
- Configuration: See above.
- Environment name and version (e.g. Kubernetes? What version?):
- Server type and version: Virtual
- Operating System and version: Debian 12
- Filters and plugins: None.
Additional context
This generates a significant amount of noise because the fluent-bit service keeps restarting. On one random server, the fluent-bit service has restarted 184 times in the last 8 hours alone ...
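A restart count like the one above can be read from systemd. The sketch below is one possible way to do it, assuming the service runs as a systemd unit named `fluent-bit.service` (the unit name is an assumption). It relies on the `NRestarts` property exposed by `systemctl show`, which counts automatic restarts since the counter was last reset rather than within a fixed time window, so it will not necessarily match an "8 hours" figure. The function names are hypothetical.

```python
import subprocess


def parse_nrestarts(output: str) -> int:
    """Extract the counter from `systemctl show` output such as 'NRestarts=184'."""
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "NRestarts":
            return int(value.strip())
    raise ValueError("NRestarts not found in systemctl output")


def restart_count(unit: str = "fluent-bit.service") -> int:
    """Ask systemd how many times it has automatically restarted the unit."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=NRestarts"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_nrestarts(out)
```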
@anderssynstad Is this running from a package or a custom build? Any more insight into the system might help us.
@edsiper Verified it on a Debian 12 (fully patched) VPS on DigitalOcean:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
Release: 12
Codename: bookworm
$ cat /etc/apt/sources.list.d/fluentbit.list
deb [arch=amd64 signed-by=/usr/share/keyrings/fluentbit.asc] https://packages.fluentbit.io/debian/bookworm bookworm main
$ dpkg -l fluent-bit
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  fluent-bit     3.1.9        amd64        Fast data collector for Linux
$ uname -srv
Linux 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30)
Not sure if it is related, but the yaml example on https://docs.fluentbit.io/manual/pipeline/inputs/node-exporter-metrics has the "same" result.
$ cat test.yaml
# Node Exporter Metrics + Prometheus Exporter
# -------------------------------------------
# The following example collect host metrics on Linux and expose
# them through a Prometheus HTTP end-point.
#
# After starting the service try it with:
#
# $ curl http://127.0.0.1:2021/metrics
#
service:
  flush: 1
  log_level: info
pipeline:
  inputs:
    - name: node_exporter_metrics
      tag: node_metrics
      scrape_interval: 2
  outputs:
    - name: prometheus_exporter
      match: node_metrics
      host: 0.0.0.0
      port: 2021
$ /opt/fluent-bit/bin/fluent-bit -c test.yaml -D
Fluent Bit v3.1.9
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
______ _ _ ______ _ _ _____ __
| ___| | | | | ___ (_) | |____ |/ |
| |_ | |_ _ ___ _ __ | |_ | |_/ /_| |_ __ __ / /`| |
| _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / \ \ | |
| | | | |_| | __/ | | | |_ | |_/ / | |_ \ V /.___/ /_| |_
\_| |_|\__,_|\___|_| |_|\__| \____/|_|\__| \_/ \____(_)___/
configuration test is successful
[2024/10/26 18:50:56] [engine] caught signal (SIGSEGV)
#0 0x55ddedd2ee7e in flb_input_exit_all() at src/flb_input.c:1341
#1 0x55ddedd49138 in flb_engine_shutdown() at src/flb_engine.c:1121
#2 0x55ddedd23264 in flb_destroy() at src/flb_lib.c:240
#3 0x55ddedc93c1b in flb_main() at src/fluent-bit.c:1360
#4 0x7fdd0e246249 in ???() at ???:0
#5 0x7fdd0e246304 in ???() at ???:0
#6 0x55ddedc91800 in ???() at ???:0
#7 0xffffffffffffffff in ???() at ???:0
Aborted
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
This may have been resolved? I am not able to reproduce it on the same server with Fluent Bit version 3.2.9.
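If a deployment script needs to decide whether to set `threaded: true`, a rough check against the versions mentioned in this thread could look like the sketch below. It only encodes what was reported here (a crash on 3.1.9, not reproducible on 3.2.9); which release actually contains a fix is not known from this thread, so every other version is treated as unknown. The function names are hypothetical, and the banner format is the `Fluent Bit v3.1.9` line shown in the output above.

```python
import re

# Versions taken from this thread only; not an authoritative list.
REPORTED_CRASHING = {(3, 1, 9)}
REPORTED_NOT_REPRODUCIBLE = {(3, 2, 9)}


def parse_version(banner: str) -> tuple:
    """Extract (major, minor, patch) from a banner line like 'Fluent Bit v3.1.9'."""
    match = re.search(r"Fluent Bit v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no Fluent Bit version found in banner")
    return tuple(int(part) for part in match.groups())


def reported_status(version: tuple) -> str:
    """Map a version to what this thread reports about it."""
    if version in REPORTED_CRASHING:
        return "crash reported"
    if version in REPORTED_NOT_REPRODUCIBLE:
        return "not reproducible"
    return "unknown"
```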