fluent-bit icon indicating copy to clipboard operation
fluent-bit copied to clipboard

2 connections are set up while sending logs to Loki

Open duj4 opened this issue 1 year ago • 7 comments

Bug Report

Describe the bug There are 2 connections set up while sending logs to Loki:

Turn on net.keepalive:

[2024/07/19 21:37:15] [debug] [router] match rule tail.0:loki.4 [2024/07/19 21:37:15] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020 [2024/07/19 21:37:15] [ info] [sp] stream processor started [2024/07/19 21:37:15] [debug] [input:tail:tail.0] [static files] processed 1.2K [2024/07/19 21:37:15] [debug] [input:tail:tail.0] inode=206007805 file= promote to TAIL_EVENT [2024/07/19 21:37:15] [ info] [input:tail:tail.0] inotify_fs_add(): inode=206007805 watch_fd=1 name=/home/dujas/fluentbit/samples/concierge/concierge-snippet-2.log [2024/07/19 21:37:15] [debug] [input:tail:tail.0] [static files] processed 0b, done [2024/07/19 21:37:15] [debug] [task] created task=0x7fe953511ac0 id=0 OK > [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: before SSL initialization [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client hello [2024/07/19 21:37:15] [debug] [tls] connection #64 WANT_READ [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client hello [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server hello [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: TLSv1.3 read encrypted extensions [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server certificate request [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server certificate [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: TLSv1.3 read server certificate verify [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read finished [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write change cipher spec [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client certificate [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write certificate verify [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write finished [2024/07/19 21:37:15] [debug] [upstream] KA connection #64 to :443 is connected [2024/07/19 21:37:15] [debug] [http_client] not using http_proxy for header [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSL negotiation finished successfully [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSL negotiation finished successfully [2024/07/19 21:37:15] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server session ticket [2024/07/19 21:37:15] [debug] [output:loki:loki.4] :443, HTTP status=204 [2024/07/19 21:37:15] [debug] [upstream] KA connection #64 to :443 is now available [2024/07/19 21:37:15] [debug] [out flush] cb_destroy coro_id=0 [2024/07/19 21:37:15] [debug] [task] destroy task=0x7fe953511ac0 (task_id=0) [2024/07/19 21:37:16] [debug] [task] created task=0x7fe953511a20 id=0 OK [2024/07/19 21:37:16] [debug] [upstream] KA connection #64 to :443 has been assigned (recycled) [2024/07/19 21:37:16] [debug] [http_client] not using http_proxy for header [2024/07/19 21:37:16] [debug] [output:loki:loki.4] :443, HTTP status=204 [2024/07/19 21:37:16] [debug] [upstream] KA connection #64 to :443 is now available [2024/07/19 21:37:16] [debug] [out flush] cb_destroy coro_id=1 [2024/07/19 21:37:16] [debug] [task] destroy task=0x7fe953511a20 (task_id=0) [2024/07/19 21:37:47] [debug] [tls] connection #64 SSL3 alert write:warning:close notify [2024/07/19 21:37:47] [debug] [upstream] drop keepalive connection #-1 to :443 (keepalive idle timeout) ^C[2024/07/19 21:37:49] [engine] caught signal (SIGINT)

Turned off net.keepalive:

[2024/07/19 21:38:09] [debug] [router] match rule tail.0:loki.4 [2024/07/19 21:38:09] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020 [2024/07/19 21:38:09] [ info] [sp] stream processor started [2024/07/19 21:38:09] [debug] [input:tail:tail.0] [static files] processed 1.2K [2024/07/19 21:38:09] [debug] [input:tail:tail.0] inode=206007805 file= promote to TAIL_EVENT [2024/07/19 21:38:09] [ info] [input:tail:tail.0] inotify_fs_add(): inode=206007805 watch_fd=1 name= [2024/07/19 21:38:09] [debug] [input:tail:tail.0] [static files] processed 0b, done [2024/07/19 21:38:09] [debug] [task] created task=0x7f30dfd11ac0 id=0 OK > [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: before SSL initialization [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client hello [2024/07/19 21:38:09] [debug] [tls] connection #64 WANT_READ [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client hello [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server hello [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: TLSv1.3 read encrypted extensions [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server certificate request [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server certificate [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: TLSv1.3 read server certificate verify [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read finished [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write change cipher spec [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client certificate [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write certificate verify [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write finished [2024/07/19 21:38:09] [debug] [http_client] not using http_proxy for header [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSL negotiation finished successfully [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSL negotiation finished successfully [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server session ticket [2024/07/19 21:38:09] [debug] [output:loki:loki.4] :443, HTTP status=204 [2024/07/19 21:38:09] [debug] [tls] connection #64 SSL3 alert write:warning:close notify [2024/07/19 21:38:09] [debug] [out flush] cb_destroy coro_id=0 [2024/07/19 21:38:09] [debug] [task] destroy task=0x7f30dfd11ac0 (task_id=0) [2024/07/19 21:38:10] [debug] [task] created task=0x7f30dfd11a20 id=0 OK [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: before SSL initialization [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client hello [2024/07/19 21:38:10] [debug] [tls] connection #64 WANT_READ [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client hello [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server hello [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: TLSv1.3 read encrypted extensions [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server certificate request [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server certificate [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: TLSv1.3 read server certificate verify [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read finished [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write change cipher spec [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write client certificate [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write certificate verify [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS write finished [2024/07/19 21:38:10] [debug] [http_client] not using http_proxy for header [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSL negotiation finished successfully [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSL negotiation finished successfully [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL_connect: SSLv3/TLS read server session ticket [2024/07/19 21:38:10] [debug] [output:loki:loki.4] :443, HTTP status=204 [2024/07/19 21:38:10] [debug] [tls] connection #64 SSL3 alert write:warning:close notify [2024/07/19 21:38:10] [debug] [out flush] cb_destroy coro_id=1 [2024/07/19 21:38:10] [debug] [task] destroy task=0x7f30dfd11a20 (task_id=0) ^C[2024/07/19 21:38:38] [engine] caught signal (SIGINT) [2024/07/19 21:38:38] [ warn] [engine] service will shutdown in max 5 seconds [2024/07/19 21:38:38] [ info] [input] pausing tail.0 [2024/07/19 21:38:38] [ info] [engine] service has stopped (0 pending tasks) [2024/07/19 21:38:38] [ info] [input] pausing tail.0 [2024/07/19 21:38:38] [debug] [input:tail:tail.0] inode=206007805 removing file name [2024/07/19 21:38:38] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=206007805 watch_fd=1

To Reproduce

  • Set up a connection to Loki with tls enabled (mtls is not necessary)

Expected behavior There should be only one connection to Loki.

Screenshots Config:

flb

Your Environment

  • Version used: 3.1.2
  • Configuration: as shown above
  • Environment name and version (e.g. Kubernetes? What version?): Fluent-bit is initiated on binary file, Loki is on CKS
  • Server type and version: Virtual Machine
  • Operating System and version: RHEL8
  • Filters and plugins:

Additional context

duj4 avatar Jul 19 '24 14:07 duj4

hi @jtackaberry , I noticed you were using Loki as output in another case, could you please help have a check on this at your convenience? Thanks in advance.

duj4 avatar Jul 19 '24 14:07 duj4

I tried another deployment of Loki, the pure local config (https://grafana.com/docs/loki/latest/setup/install/local/), no tls, no gateway but just one node.

Here is the config of fluentbit(3.1.3):

[service]
    flush      1
    daemon     off
    # log_file   /home/dujas/fluent-bit/log/fluent-bit.log
    log_level  debug
    http_server on
    http_listen ${HOSTNAME}
    http_port  2020
    health_check on
    hot_reload on
    parsers_file parsers/parsers_multiline.conf
    parsers_file parsers/parsers.conf

[INPUT]
    name              tail
    tag               jsm
    read_from_head    true
    path              /home/dujas/fluent-bit/jira5.log
    path_key          filename
    multiline.parser  jsm
    buffer_chunk_size 32K
    buffer_max_size   32K

[FILTER]
    name         parser
    match        *
    key_name     log
    parser       jsm
    reserve_data true

[OUTPUT]
    name                   loki
    match                  *
    host                   192.168.126.128
    port                   3100
    labels                 app=jsm, hostname=${HOSTNAME}, filename=$filename
    remove_keys            filename
    drop_single_key        on
    line_format            key_value

Here is the log files:

Fluent Bit v3.1.3

  • Copyright (C) 2015-2024 The Fluent Bit Authors
  • Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
  • https://fluentbit.io

| | | | | | ___ () | |__ |/ | | |_ | |_ _ ___ _ __ | |_ | |/ /| |_ __ __ / /`| | | | | | | | |/ _ \ ' | | | ___ \ | | \ \ / / \ \ | | | | | | || | _/ | | | | | |/ / | | \ V /./ /| | _| ||_,|___|| ||_| _/||_| _/ _()__/

[2024/07/20 09:41:03] [ info] Configuration: [2024/07/20 09:41:03] [ info] flush time | 1.000000 seconds [2024/07/20 09:41:03] [ info] grace | 5 seconds [2024/07/20 09:41:03] [ info] daemon | 0 [2024/07/20 09:41:03] [ info] ___________ [2024/07/20 09:41:03] [ info] inputs: [2024/07/20 09:41:03] [ info] tail [2024/07/20 09:41:03] [ info] ___________ [2024/07/20 09:41:03] [ info] filters: [2024/07/20 09:41:03] [ info] parser.0 [2024/07/20 09:41:03] [ info] ___________ [2024/07/20 09:41:03] [ info] outputs: [2024/07/20 09:41:03] [ info] loki.0 [2024/07/20 09:41:03] [ info] ___________ [2024/07/20 09:41:03] [ info] collectors: [2024/07/20 09:41:03] [ info] [fluent bit] version=3.1.3, commit=, pid=197839 [2024/07/20 09:41:03] [debug] [engine] coroutine stack size: 24576 bytes (24.0K) [2024/07/20 09:41:03] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128 [2024/07/20 09:41:03] [ info] [cmetrics] version=0.9.1 [2024/07/20 09:41:03] [ info] [ctraces ] version=0.5.2 [2024/07/20 09:41:03] [ info] [input:tail:tail.0] initializing [2024/07/20 09:41:03] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only) [2024/07/20 09:41:03] [debug] [tail:tail.0] created event channels: read=21 write=22 [2024/07/20 09:41:03] [ info] [input:tail:tail.0] multiline core started [2024/07/20 09:41:03] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input [2024/07/20 09:41:03] [debug] [input:tail:tail.0] inotify watch fd=28 [2024/07/20 09:41:03] [debug] [input:tail:tail.0] scanning path /home/dujas/fluent-bit/jira5.log [2024/07/20 09:41:03] [debug] [input:tail:tail.0] file will be read in POSIX_FADV_DONTNEED mode /home/dujas/fluent-bit/jira5.log [2024/07/20 09:41:03] [debug] [input:tail:tail.0] inode=16804876 with offset=0 appended as /home/dujas/fluent-bit/jira5.log [2024/07/20 09:41:03] [debug] [input:tail:tail.0] scan_glob add(): /home/dujas/fluent-bit/jira5.log, inode 16804876 [2024/07/20 09:41:03] [debug] [input:tail:tail.0] 1 new files found on path '/home/dujas/fluent-bit/jira5.log' [2024/07/20 09:41:03] [debug] [loki:loki.0] created event channels: read=30 write=31 [2024/07/20 09:41:03] [debug] [output:loki:loki.0] remove_mpa size: 1 [2024/07/20 09:41:03] [ info] [output:loki:loki.0] configured, hostname=192.168.126.128:3100 [2024/07/20 09:41:03] [ info] [http_server] listen iface=centos8-1 tcp_port=2020 [2024/07/20 09:41:03] [ info] [sp] stream processor started [2024/07/20 09:41:03] [debug] [input:tail:tail.0] [static files] processed 5.8K [2024/07/20 09:41:03] [debug] [input:tail:tail.0] inode=16804876 file=/home/dujas/fluent-bit/jira5.log promote to TAIL_EVENT [2024/07/20 09:41:03] [ info] [input:tail:tail.0] inotify_fs_add(): inode=16804876 watch_fd=1 name=/home/dujas/fluent-bit/jira5.log [2024/07/20 09:41:03] [debug] [input:tail:tail.0] [static files] processed 0b, done [2024/07/20 09:41:03] [debug] [task] created task=0x7f77fa236be0 id=0 OK > [2024/07/20 09:41:03] [debug] [upstream] KA connection #53 to 192.168.126.128:3100 is connected [2024/07/20 09:41:03] [debug] [http_client] not using http_proxy for header [2024/07/20 09:41:03] [debug] [output:loki:loki.0] 192.168.126.128:3100, HTTP status=204 [2024/07/20 09:41:03] [debug] [upstream] KA connection #53 to 192.168.126.128:3100 is now available [2024/07/20 09:41:03] [debug] [out flush] cb_destroy coro_id=0 [2024/07/20 09:41:03] [debug] [task] destroy task=0x7f77fa236be0 (task_id=0) [2024/07/20 09:41:04] [debug] [task] created task=0x7f77fa236b40 id=0 OK [2024/07/20 09:41:04] [debug] [upstream] KA connection #53 to 192.168.126.128:3100 has been assigned (recycled) [2024/07/20 09:41:04] [debug] [http_client] not using http_proxy for header [2024/07/20 09:41:04] [debug] [output:loki:loki.0] 192.168.126.128:3100, HTTP status=204 [2024/07/20 09:41:04] [debug] [upstream] KA connection #53 to 192.168.126.128:3100 is now available [2024/07/20 09:41:04] [debug] [out flush] cb_destroy coro_id=1 [2024/07/20 09:41:04] [debug] [task] destroy task=0x7f77fa236b40 (task_id=0) [2024/07/20 09:41:35] [debug] [upstream] drop keepalive connection #-1 to 192.168.126.128:3100 (keepalive idle timeout)

As you can see from the bold lines, the connection was recycled, is this expected?

duj4 avatar Jul 20 '24 01:07 duj4

Same situation goes to 3.1.7, may I have any update?

duj4 avatar Sep 20 '24 01:09 duj4

we're actually seeing way more than 2 outgoing connections towards loki, going into the dozen, with a lot of recycling between them, on bare metal nodes, on fbit 3.2.2

It also seems to make use of that many as when we try to limit it to 1 worker and 5 max TCP connections per worker it starts logging

[2024/12/02 14:25:19] [error] [output:loki:loki.1] no upstream connections available
[2024/12/02 14:25:19] [ warn] [engine] failed to flush chunk '1-1733149519.186186989.flb', retry in 6 seconds: task_id=5, input=tail.0 > output=loki.1 (out_id=1)
[2024/12/02 14:25:25] [ warn] [engine] failed to flush chunk '1-1733149525.248401571.flb', retry in 8 seconds: task_id=4, input=tail.0 > output=loki.1 (out_id=1)
[2024/12/02 14:25:25] [ info] [engine] flush chunk '1-1733149519.186186989.flb' succeeded at retry 1: task_id=5, input=tail.0 > output=loki.1 (out_id=1)
[2024/12/02 14:25:33] [ info] [engine] flush chunk '1-1733149525.248401571.flb' succeeded at retry 1: task_id=4, input=tail.0 > output=loki.1 (out_id=1)

@edsiper , could you please weigh in ? can provide logs/metrics as requested.

vladst3f avatar Dec 02 '24 20:12 vladst3f

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Mar 03 '25 02:03 github-actions[bot]

Still valid.

vladst3f avatar Mar 03 '25 05:03 vladst3f

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Jun 14 '25 02:06 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Jun 19 '25 02:06 github-actions[bot]