
log missing rate is about 60% when setting buffer chunk size to 32k

Open hsingli20 opened this issue 1 year ago • 9 comments

Bug Report

Describe the bug

Fluent Bit is used to upload logs to external servers. With Fluent Bit 2.0.8, about 60% of log records are lost with the following settings:

    buffer_chunk_size   32k
    buffer_max_size     32k

Solution #1: With the settings below, the loss rate is about 0%:

    buffer_chunk_size   1M
    buffer_max_size     1M

Solution #2: With Fluent Bit 1.9.5/1.9.6, the loss rate is about 0% without changing the parameters:

    buffer_chunk_size   32k
    buffer_max_size     32k

The documentation at https://docs.fluentbit.io/manual/pipeline/inputs/tail does not make clear how to set parameters such as Buffer_Chunk_Size and Buffer_Max_Size. Is Solution #1 (increasing buffer_chunk_size and buffer_max_size) the correct fix? If so, could you explain the technical details behind it?

To Reproduce

  • Example Fluent Bit configuration when the issue occurred:
[INPUT]
    name                tail
    tag                 event.kafka.ingress
    alias               kafka.ingress
    buffer_chunk_size   32k
    buffer_max_size     32k
    read_from_head      true
    refresh_interval    5
    rotate_wait         10
    skip_empty_lines    off
    skip_long_lines     true
    key                 message
    db                  /var/log/logshipper/kafka.ingress.db
    db.sync             normal
    db.locking          true
    db.journal_mode     off
    path                /var/log/aaa/ingress/*/*/*/*,/var/log/aaa/ingress/*/*/*/*/*,/var/log/aaa/ingress/*/*/*/*/*/*
    exclude_path        /var/log/logshipper/logshipper.log,/var/log/aaa/ingress/*.gz,/var/log/aaa/ingress/*.tgz
    mem_buf_limit       20MB
    parser              json
    ignore_older        11m

  • Steps to reproduce the problem:
  1. Generate a large volume of logs.
  2. Let Fluent Bit upload the logs to the external log server.
  3. Count the log records on both sides to determine how many are missing.
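
Step 3 comes down to comparing two counts. A minimal sketch, assuming the number of records sent and the number received are already known; the function name `loss_rate` and the sample counts are hypothetical and chosen only to illustrate a 60% loss:

```python
def loss_rate(sent: int, received: int) -> float:
    """Return the fraction of records that never arrived: (sent - received) / sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if received < 0:
        raise ValueError("received must be non-negative")
    return (sent - received) / sent


# Hypothetical counts, not measurements from this issue.
print(f"loss rate: {loss_rate(10_000, 4_000):.0%}")
```

A result above zero only shows that records are missing somewhere between the producer and the receiver; it does not show where they were dropped.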

Expected behavior

No missing logs, or a loss rate close to 0.

Your Environment

  • Version used: 2.0.8, 1.9.7, 1.9.9, etc
  • Configuration: above
  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes

hsingli20 • Oct 11 '23 09:10

Hello @hsingli20, is this reproducible with 2.1.10? How are you measuring the loss? What's in the Fluent Bit log file? What's the load? What's the record size?

lecaros • Oct 11 '23 11:10

Thanks a lot, @lecaros.

Is this reproducible with 2.1.10?

Only 2.0.8 has been tested; 2.1.10 has not been verified.

How are you measuring the loss?

The loss can be measured precisely as (a - b) / a, where a is the number of requests sent by JMeter (taken from jmeter.log) and b is the number of requests that reach the backend simulator.

What's the load?

About 9000 logs per second in total.

What's the record size?

4k to 5k bytes per log.

What's in the Fluent Bit log file?

No error logs were found.

{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.631+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.tel] [static files] processed 0b"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.631+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [task] destroy task=0x7fb71f6875b0 (task_id=0)"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.636+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=33780"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.637+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.ingress] [static files] processed 30.9K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.641+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=31705"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.642+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.egress] [static files] processed 29.8K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.643+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=79304"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.644+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=94775"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.645+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.tel] [static files] processed 169.5K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.650+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=33798"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.651+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.ingress] [static files] processed 31.0K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.656+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=32949"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.657+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.egress] [static files] processed 30.9K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.657+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.tel] inode=6815791 file=/var/log/xxxxxx/tel/tenant_xxxxxx-T/ggg-ppsf-xxxxxx-telserver/2023092809/group2/n170ggg_TrafficEventLog_xxxxxx_zzzngmsender_233_20230928093356472_113098.xml promote to TAIL_EVENT"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.657+00:00", "severity": "info", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [ info] [input:tail:http.tel] inotify_fs_add(): inode=6815791 watch_fd=23 name=/var/log/xxxxxx/tel/tenant_xxxxxx-T/ggg-ppsf-xxxxxx-telserver/2023092809/group2/n170ggg_TrafficEventLog_xxxxxx_zzzngmsender_233_20230928093356472_113098.xml"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.657+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.tel] inode=6815794 file=/var/log/xxxxxx/tel/tenant_xxxxxx-T/ggg-ppsf-xxxxxx-telserver/2023092809/group2/n170ggg_TrafficEventLog_xxxxxx_zzzngmsender_233_20230928093359479_113099.xml promote to TAIL_EVENT"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.657+00:00", "severity": "info", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [ info] [input:tail:http.tel] inotify_fs_add(): inode=6815794 watch_fd=24 name=/var/log/xxxxxx/tel/tenant_xxxxxx-T/ggg-ppsf-xxxxxx-telserver/2023092809/group2/n170ggg_TrafficEventLog_xxxxxx_zzzngmsender_233_20230928093359479_113099.xml"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.657+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.tel] [static files] processed 0b, done"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.661+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=31991"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.662+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.egress] [static files] processed 30.1K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.667+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=33780"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.669+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.ingress] [static files] processed 30.9K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.673+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=32408"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.675+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.egress] [static files] processed 30.6K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.680+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=33816"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.681+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.ingress] [static files] processed 31.0K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.686+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=32639"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.687+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.egress] [static files] processed 30.5K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.691+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=30861"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.694+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.ingress] [static files] processed 28.3K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.695+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input chunk] update output instances with new chunk size diff=6354"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.696+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.egress] [static files] processed 6.1K"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.696+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.ingress] inode=6815796 file=/var/log/xxxxxx/ingress/tenant_xxxxxx-T/ggg-ppsf-xxxxxx-zzz-patm/2023092809/PushNotification/n170ggg_pushapplicationtrafficmgmt.log_202309280900 promote to TAIL_EVENT"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.696+00:00", "severity": "info", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [ info] [input:tail:http.ingress] inotify_fs_add(): inode=6815796 watch_fd=8 name=/var/log/xxxxxx/ingress/tenant_xxxxxx-T/ggg-ppsf-xxxxxx-zzz-patm/2023092809/PushNotification/n170ggg_pushapplicationtrafficmgmt.log_202309280900"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.696+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.ingress] [static files] processed 0b, done"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.696+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.egress] inode=6815797 file=/var/log/xxxxxx/egress/tenant_xxxxxx-T/ggg-ppsf-xxxxxx-zzz-ngmsender/2023092809/n170ggg_zzzngmsender.log_202309280900 promote to TAIL_EVENT"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.696+00:00", "severity": "info", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [ info] [input:tail:http.egress] inotify_fs_add(): inode=6815797 watch_fd=4 name=/var/log/xxxxxx/egress/tenant_xxxxxx-T/ggg-ppsf-xxxxxx-zzz-ngmsender/2023092809/n170ggg_zzzngmsender.log_202309280900"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:02.696+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:02] [debug] [input:tail:http.egress] [static files] processed 0b, done"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.044+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.ingress] inode=6815795 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.046+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=9436"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.047+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.ingress] inode=6815795 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.231+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.ingress] inode=6815795 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.233+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=10492"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.234+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.ingress] inode=6815795 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.234+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.ingress] inode=6815796 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.239+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=33807"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.243+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=19095"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.360+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.egress] inode=6815797 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.362+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=17582"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.364+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.egress] inode=6815797 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.368+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=32977"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.372+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=12879"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.574+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [task] created task=0x7fb71f687540 id=0 OK"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.574+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [output:http:http.ingress] task_id=0 assigned to thread #1"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.574+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [task] created task=0x7fb71f6875b0 id=1 OK"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.574+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [output:http:http.egress] task_id=1 assigned to thread #0"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.574+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [task] created task=0x7fb71f687620 id=2 OK"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.574+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [output:http:http.tel] task_id=2 assigned to thread #1"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.580+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [http_client] not using http_proxy for header"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.587+00:00", "severity": "warn", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.587+00:00", "severity": "info", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [ info] [output:http:http.tel] ggg-test-helm-xxxxxx-backend-simulator.stc-n170-ggg:2222, HTTP status=200"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.587+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [out flush] cb_destroy coro_id=0"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.587+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [task] destroy task=0x7fb71f687620 (task_id=2)"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.588+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [http_client] not using http_proxy for header"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.589+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [http_client] not using http_proxy for header"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.596+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.egress] inode=6815797 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.597+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=8636"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.601+00:00", "severity": "warn", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.601+00:00", "severity": "info", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [ info] [output:http:http.egress] ggg-test-helm-xxxxxx-backend-simulator.stc-n170-ggg:2222, HTTP status=200"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.601+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [out flush] cb_destroy coro_id=2"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.602+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.egress] inode=6815797 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.607+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=32032"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.608+00:00", "severity": "warn", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.608+00:00", "severity": "info", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [ info] [output:http:http.ingress] ggg-test-helm-xxxxxx-backend-simulator.stc-n170-ggg:2222, HTTP status=200"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.608+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [out flush] cb_destroy coro_id=2"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.608+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [task] destroy task=0x7fb71f6875b0 (task_id=1)"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.610+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=11908"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.611+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [task] destroy task=0x7fb71f687540 (task_id=0)"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.671+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.ingress] inode=6815796 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.673+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=8829"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.675+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.ingress] inode=6815796 events: IN_MODIFY "}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.680+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=33789"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.684+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input chunk] update output instances with new chunk size diff=19104"}
{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.913+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [input:tail:http.ingress] inode=6815796 events: IN_MODIFY "}
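
The dump above has no error-level entries, but it does contain warn-level entries from [http_client]. To triage a dump like this, it can help to tally records by severity and group the warning texts. A minimal sketch, assuming each record is available as one JSON object per string; `summarize` is a hypothetical helper, and the three sample records are copied from the log above:

```python
import json
from collections import Counter

# Three records copied verbatim from the log dump in this comment.
SAMPLE = [
    '{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.580+00:00", "severity": "debug", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [debug] [http_client] not using http_proxy for header"}',
    '{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.587+00:00", "severity": "warn", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [ warn] [http_client] cannot increase buffer: current=4096 requested=36864 max=4096"}',
    '{"version": "1.0.0", "timestamp": "2023-09-28T09:34:03.587+00:00", "severity": "info", "service_id": "xxxxxx-logtransformer", "message": "[2023/09/28 09:34:03] [ info] [output:http:http.tel] ggg-test-helm-xxxxxx-backend-simulator.stc-n170-ggg:2222, HTTP status=200"}',
]


def summarize(records):
    """Count records per severity and group identical warning texts."""
    severities = Counter()
    warnings = Counter()
    for raw in records:
        entry = json.loads(raw)
        severities[entry["severity"]] += 1
        if entry["severity"] == "warn":
            # Drop the leading "[date time] " prefix so identical warnings group together.
            text = entry["message"].split("] ", 1)[1]
            warnings[text] += 1
    return severities, warnings


severities, warnings = summarize(SAMPLE)
print(dict(severities))
for text, count in warnings.items():
    print(count, text)
```

This only surfaces what the log already says; it does not by itself explain whether the warnings are related to the missing records.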

hsingli20 • Oct 12 '23 08:10

Can anyone help explain how buffer_chunk_size and buffer_max_size are used? Why does increasing both of them fix the problem when a single log line is quite large (about 4k or 5k)?
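
For scale, the figures in this thread (lines of about 4k to 5k bytes, buffers of 32k or 1M) imply the line counts per buffer computed below. This is a rough sketch, assuming k and M are binary multiples (1k = 1024 bytes) and ignoring any per-line overhead; `lines_per_buffer` is a hypothetical helper and the calculation does not model Fluent Bit's internal behavior:

```python
KIB = 1024


def lines_per_buffer(buffer_bytes: int, line_bytes: int) -> int:
    """Number of whole lines of line_bytes that fit in buffer_bytes."""
    if line_bytes <= 0:
        raise ValueError("line_bytes must be positive")
    return buffer_bytes // line_bytes


for label, buffer_bytes in (("32k", 32 * KIB), ("1M", 1024 * KIB)):
    for line_kib in (4, 5):
        count = lines_per_buffer(buffer_bytes, line_kib * KIB)
        print(f"buffer {label}, {line_kib}k lines: {count} whole lines fit")
```

Under these assumptions a 32k buffer holds only 6 to 8 such lines, against 204 to 256 for a 1M buffer. That shows how little headroom the smaller setting leaves, but not why 2.0.8 and 1.9.x behave differently with it.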

hsingli20 • Oct 20 '23 02:10

We are facing the same issue. Can anyone explain which values of the properties below would help us?

We can upload the following:

  1. Fluent Bit's log file with trace mode enabled
  2. Backup of the buffer files (*.flb) for the given tail input
  3. Files from the 'Output File' plugin

More details about my environment:

  1. Fluent Bit version: v1.9.10
  2. How are you measuring the loss? --> We use Loki; because log_path is unique, this is easy.
  3. What's the load? --> Average load for the last 24h: 40
  4. What's the record size? --> Up to 5000 lines per file; the longest line in a file is up to 3000 characters.
  5. Typical file size is between 2 and 3.5MB.

Help: https://docs.fluentbit.io/manual/pipeline/inputs/tail

Current values we use:

[INPUT]
    Name                tail
    Path                /u02/app/*_DBA_HIST_*.out
    Tag                 dbperf_stats
    Alias               tail.dbperf_stats
    Path_Key            fa.log_path
    DB                  /var/lib/storage/dbperf_stats.db
    DB.sync             normal
    DB.locking          true
    DB.journal_mode     WAL
    storage.type        filesystem
    Skip_Long_Lines     On
    Refresh_Interval    30
    Rotate_Wait         30
    Ignore_Older        15m
    Read_from_Head      False
    Inotify_Watcher     false
    Mem_Buf_Limit       8MB
    Buffer_Max_Size     2MB
    Buffer_Chunk_Size   32K

Someone advised simply increasing Mem_Buf_Limit to 64MB:

[INPUT]
    Name                tail
    Path                /u02/app/*_DBA_HIST_*.out
    Tag                 dbperf_stats
    Alias               tail.dbperf_stats
    Path_Key            fa.log_path
    DB                  /var/lib/storage/dbperf_stats.db
    DB.sync             normal
    DB.locking          true
    DB.journal_mode     WAL
    storage.type        filesystem
    Skip_Long_Lines     On
    Refresh_Interval    30
    Rotate_Wait         30
    Ignore_Older        15m
    Read_from_Head      False
    Inotify_Watcher     false
    Mem_Buf_Limit       64MB
    Buffer_Max_Size     2MB
    Buffer_Chunk_Size   32K

The reporter of this issue recommends setting Buffer_Chunk_Size and Buffer_Max_Size to 1MB. Does it make sense to set both properties to 2MB?

[INPUT]
    Name                tail
    Path                /u02/app/*_DBA_HIST_*.out
    Tag                 dbperf_stats
    Alias               tail.dbperf_stats
    Path_Key            fa.log_path
    DB                  /var/lib/storage/dbperf_stats.db
    DB.sync             normal
    DB.locking          true
    DB.journal_mode     WAL
    storage.type        filesystem
    Skip_Long_Lines     On
    Refresh_Interval    30
    Rotate_Wait         30
    Ignore_Older        15m
    Read_from_Head      False
    Inotify_Watcher     false
    Mem_Buf_Limit       8MB
    Buffer_Max_Size     2MB
    Buffer_Chunk_Size   2MB

It would be great if someone could explain how the three Tail plugin properties varied above (Mem_Buf_Limit, Buffer_Max_Size, Buffer_Chunk_Size) work and how to debug their behavior in the Fluent Bit log.

Is there any general recommendation for avoiding data loss?

emmacz • Dec 14 '23 23:12

Hello, is this reproducible on a currently supported version? (either 2.1.x or 2.2.x) In general, you shouldn't modify buffer_max_size or buffer_chunk_size. Why do you need to modify them? If you are still able to reproduce, provide steps to do it so we can take a look at it.

lecaros • Dec 15 '23 20:12

Hello @lecaros ,

While I prepare everything you need to reproduce the issue, could you please compare the version we use (1.9.10) with 2.1.x or 2.2.x from a memory/buffer/chunk point of view?

Following the reporter of this bug, I set buffer_chunk_size to the same value as buffer_max_size (the reporter set both to 1MB; see the top of this issue). Could you please explain why it is not a good idea to change them? After increasing buffer_chunk_size from 32KB (default) to 2MB, I have seen much less data loss (1 file out of 10) than before.

Could you also explain the note below from the docs: https://docs.fluentbit.io/manual/administration/backpressure#storage.max_chunks_up

storage.max_chunks_up : 
   Please note that when storage.type filesystem is set, the Mem_Buf_Limit setting no longer has any effect,
 instead, the [SERVICE] level storage.max_chunks_up setting controls the size of the memory buffer.

In my case we have set 'Mem_Buf_Limit 8MB' and 'storage.type filesystem' for the Tail plugin,
and 'storage.max_chunks_up 32' at the [SERVICE] level.

Does this mean that setting Mem_Buf_Limit makes no sense at all when 'storage.type filesystem' and storage.max_chunks_up are set?

Thank you also for pointing out that our version, 1.9.10, is no longer supported. I will try the latest version. Ref. https://github.com/fluent/fluent-bit/security

emmacz • Dec 15 '23 23:12

@lecaros could you also update https://github.com/fluent/fluent-bit/discussions/5719 ?

emmacz • Dec 19 '23 01:12

Hello, were you able to test the latest version? These questions are still relevant if we want to troubleshoot this. Is this reproducible with 2.2.2? How are you measuring the loss? What's in the Fluent Bit log file? What's the load? What's the record size?

lecaros • Feb 13 '24 14:02

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] • May 15 '24 01:05

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] • May 21 '24 01:05