Wildcard routing not working for Loki
Bug Report
Describe the bug
I use the following configuration on Linux hosts (Amazon Linux 2 and Ubuntu 22.04) to send metrics/logs scraped by Fluent Bit to the Loki logging service. I'm currently running 1.9.7.
[SERVICE]
flush 5
daemon Off
log_level debug
parsers_file parsers.conf
plugins_file plugins.conf
http_server on
http_listen 0.0.0.0
http_port 2020
storage.metrics on
[INPUT]
name cpu
tag local.cpu
interval_sec 5
[INPUT]
name mem
tag local.mem
interval_sec 5
[INPUT]
Name disk
Tag local.disk
interval_sec 5
[INPUT]
Name tail
Path /var/log/messages
Parser syslog-rfc5424
Tag local.varlogmsg
Refresh_Interval 5
[INPUT]
name fluentbit_metrics
tag internal_metrics
scrape_interval 2
[OUTPUT]
Name loki
Match *
Host logs-prod3.grafana.net
port 443
tls on
tls.verify on
http_user XXXXXX
http_passwd XXXXXX
labels job=fluentbit,service=app-test,env=test
label_keys $sub['stream']
[OUTPUT]
Name file
Match *
Path /tmp
File fluentbit_output.log
When I set it up like this, only CPU metrics are sent to Loki; all other inputs are ignored. But I can see that all the other inputs are correctly written to /tmp/fluentbit_output.log.
However, if I duplicate the Loki OUTPUT block multiple times and map each input tag to its own block, then the data does show up in Loki (see the config file below for an example).
I expected the wildcard (*) match on the Loki output to capture all inputs and send them to Loki. (Ref: Routing with Wildcard) Please let me know if I am doing this wrong.
[SERVICE]
flush 5
daemon Off
log_level debug
parsers_file parsers.conf
plugins_file plugins.conf
http_server on
http_listen 0.0.0.0
http_port 2020
storage.metrics on
[INPUT]
name cpu
tag local.cpu
interval_sec 5
[INPUT]
name mem
tag local.mem
interval_sec 5
[INPUT]
Name disk
Tag local.disk
interval_sec 5
[INPUT]
Name tail
Path /var/log/messages
Parser syslog-rfc5424
Tag local.varlogmsg
Refresh_Interval 5
[INPUT]
Name tail
Path /var/log/secure
Parser syslog-rfc5424
Tag local.sshlog
Refresh_Interval 5
[INPUT]
name fluentbit_metrics
tag internal_metrics
scrape_interval 2
[OUTPUT]
Name loki
Match local.cpu
Host logs-prod3.grafana.net
port 443
tls on
tls.verify on
http_user XXXXXX
http_passwd XXXXXX
labels job=fluentbit,service=app-test,env=test
label_keys $sub['stream']
[OUTPUT]
Name loki
Match local.mem
Host logs-prod3.grafana.net
port 443
tls on
tls.verify on
http_user XXXXXX
http_passwd XXXXXX
labels job=fluentbit,service=app-test,env=test
label_keys $sub['stream']
[OUTPUT]
Name loki
Match local.varlogmsg
Host logs-prod3.grafana.net
port 443
tls on
tls.verify on
http_user XXXXXX
http_passwd XXXXXX
labels job=fluentbit,service=app-test,env=test
label_keys $sub['stream']
[OUTPUT]
Name file
Match *
Path /tmp
File fluentbit_output.log
Wildcard routing has worked fine for me exactly like that with Loki. Can you check there are no errors/warnings in the Fluent Bit logs?
I'm not sure what happens if a key is missing as well, so maybe try removing that:
label_keys $sub['stream']
I was doing this for a blog post a while back, so I have some Grafana Cloud examples as well: https://github.com/calyptia/openshift-fluent-bit-examples
I use wildcard routing there and it was fine: https://github.com/calyptia/openshift-fluent-bit-examples/blob/427e1adb0e89bd5992d2df222af4a9ecf15d6a38/grafana-cloud/values-grafana-cloud.yaml#L16-L25
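For clarity, the suggested test is the original Loki output block with the label_keys line dropped; a minimal sketch (credentials redacted as in the original config):
[OUTPUT]
Name loki
Match *
Host logs-prod3.grafana.net
port 443
tls on
tls.verify on
http_user XXXXXX
http_passwd XXXXXX
labels job=fluentbit,service=app-test,env=test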
@patrick-stephens Yes, I did try removing label_keys $sub['stream']
from the config a moment ago as well. It behaves the same. Again, the issue is not that the output doesn't work at all; rather, the Loki output only picks up one INPUT source. Is there any other data I can provide to help with the investigation?
Hello @patrick-stephens, I can reproduce this issue using:
- Fluent-bit 1.9.7
- Latest Loki and Grafana docker image
- GCP e2-micro Instance running Ubuntu 20.04
- Fluent-bit config:
[SERVICE]
flush 1
daemon Off
parsers_file ../conf/parsers.conf
log_level debug
[INPUT]
Name docker
Include d9f819b89974 1479cbb42d71
Tag my_tag2
Interval_Sec 10
[INPUT]
name cpu
Tag my_tag3
interval_sec 10
[INPUT]
name mem
Tag my_tag4
interval_sec 10
[INPUT]
Name disk
Tag my_tag5
interval_sec 10
[Output]
Name loki
Match *
Host 127.0.0.1
port 3100
Labels job=fluent
When running this config with interval_sec set to 10 or 5 there are no issues on my end: the data for all the configured plugins is shown in Grafana. But as soon as I change this setting to anything closer to 1, the output for these plugins is not sent to Loki, and after terminating Fluent Bit you will see the count of pending tasks for all the plugins that didn't reach Loki. With interval_sec=1, Fluent Bit only sends data for the first plugin configured in the config file, which in my case is the docker input plugin.
As @vishwa-trulioo mentioned, if you add a Loki output targeting each input tag, all the data is received in Loki and shown in Grafana.
This is also happening in Fluent Bit 2.0: the data is not making it to Loki. As I mentioned, if you send the data from CPU, mem, docker, and disk to the standard output, you'll see the data from all these plugins, but it is not reaching Loki.
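For reference, a minimal sketch of the extra output used for that check, assuming the standard stdout plugin, which prints every matched record to the console alongside the Loki output:
[OUTPUT]
Name stdout
Match *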
[2022/09/27 11:09:28] [debug] [upstream] KA connection #38 to 127.0.0.1:3100 is now available
[2022/09/27 11:09:28] [debug] [out flush] cb_destroy coro_id=42
[2022/09/27 11:09:28] [debug] [task] destroy task=0x7fd3d000ed80 (task_id=0)
^C[2022/09/27 11:09:28] [engine] caught signal (SIGINT)
[2022/09/27 11:09:28] [ info] [input] pausing docker.0
[2022/09/27 11:09:28] [ info] [input] pausing cpu.1
[2022/09/27 11:09:28] [debug] [task] created task=0x7fd3d000ed80 id=0 OK
[2022/09/27 11:09:28] [debug] [task] created task=0x7fd3d18ae0b0 id=129 OK
[2022/09/27 11:09:28] [debug] [task] created task=0x7fd3d18aca70 id=130 OK
...
[2022/09/27 11:09:32] [debug] [output:loki:loki.0] 127.0.0.1:3100, HTTP status=204
[2022/09/27 11:09:32] [debug] [upstream] KA connection #38 to 127.0.0.1:3100 is now available
[2022/09/27 11:09:32] [debug] [out flush] cb_destroy coro_id=49
[2022/09/27 11:09:32] [debug] [task] destroy task=0x7fd3d0010a50 (task_id=15)
[2022/09/27 11:09:33] [ info] [task] docker/docker.0 has 0 pending task(s):
[2022/09/27 11:09:33] [ info] [task] cpu/cpu.1 has 38 pending task(s):
[2022/09/27 11:09:33] [ info] [task] task_id=18 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=21 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=24 still running on route(s): loki/loki.0
...
[2022/09/27 11:09:33] [ info] [task] task_id=123 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=126 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=129 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] mem/mem.2 has 44 pending task(s):
[2022/09/27 11:09:33] [ info] [task] task_id=2 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=4 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=7 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=10 still running on route(s): loki/loki.0
...
[2022/09/27 11:09:33] [ info] [task] task_id=124 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=127 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=130 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] disk/disk.3 has 43 pending task(s):
[2022/09/27 11:09:33] [ info] [task] task_id=5 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=8 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=11 still running on route(s): loki/loki.0
[2022/09/27 11:09:33] [ info] [task] task_id=14 still running on route(s): loki/loki.0
Full debug log attached: FB-debug.odt
Hello @vishwa-trulioo
The Loki output plugin disabled processing of multiple tasks per flush because Loki historically did not support out-of-order writes. Loki now supports them, and removing the FLB_OUTPUT_NO_MULTIPLEX flag from the Loki output plugin, as done in the PR by @sflanker (https://github.com/fluent/fluent-bit/pull/6136), solves the problem you have described in this issue.
This was recently merged into the master branch and tested using an almost exact copy of the configuration you provided when this issue was opened.
Config file:
[SERVICE]
flush 5
daemon Off
log_level debug
parsers_file ../../conf/parsers.conf
plugins_file plugins.conf
http_server on
http_listen 0.0.0.0
http_port 2020
storage.metrics on
[INPUT]
name cpu
tag local.cpu
interval_sec 5
[INPUT]
name mem
tag local.mem
interval_sec 5
[INPUT]
Name disk
Tag local.disk
interval_sec 5
[INPUT]
Name tail
Path /var/log/syslog
Parser syslog-rfc5424
Tag local.syslog
Refresh_Interval 5
[INPUT]
name fluentbit_metrics
tag internal_metrics
scrape_interval 2
[Output]
Name loki
Match *
Host 127.0.0.1
port 3100
Labels job=fluentbit,service=app,env=test
label_keys $sub['stream']
[OUTPUT]
Name file
Match *
Path /tmp
File fluentbit_output.log
You can check these articles to build and test Fluent Bit v2.0, which includes this change, but as this is not an official release yet it is not intended for production environments: https://docs.fluentbit.io/manual/installation/sources/download-source-code https://docs.fluentbit.io/manual/v/2.0-pre/installation/sources/build-and-install
Please note: the master branch will be our next release, 2.0. You can also test it with an unofficial image: https://github.com/fluent/fluent-bit/tree/master/dockerfiles#ghcrio-topology
@RicardoAAD Thank you very much for working it out. I'm looking forward to testing it on my side as soon as Fluent Bit 2.0 is released. For the time being I will be setting interval_sec to 10. Thanks once again; I appreciate the assistance and clarifications.
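For reference, the interim workaround is simply the original metric inputs with a longer collection interval; a minimal sketch for the cpu input:
[INPUT]
name cpu
tag local.cpu
interval_sec 10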
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
@vishwa-trulioo is this resolved now?
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.