
Do not truncate elasticsearch errors when Buffer_Size is set to False

AlessioCasco opened this issue 3 years ago · 5 comments

Bug Report

Describe the bug

It seems that even if Buffer_Size is set to False, or to a very large value such as 10m, responses from the Elasticsearch HTTP service are still truncated.

To Reproduce

  • Example log message if applicable:
[2022/04/15 09:14:17] [error] [output:es:es.1] error: Output
{"took":34,"errors":true,"items":[{"create":{"_index":"test-2022.04.14.22-000086","_id":"uxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":97650842,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"vBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":98935675,"_primary_term":3,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"vRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":104317246,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"vhWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":98935676,"_primary_term":3,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"vxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":97650843,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"wBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":98935678,"_primary_term":3,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"wRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":88055529,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"whWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":88055530,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"wxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":88055531,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"xBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":99960371,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"xRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":104317247,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"xhWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":97650844,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"xxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":99960372,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"yBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":104317248,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"yRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":88055532,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"yhWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":104317249,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"yxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":976508
45,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"zBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":99960373,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"zRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"fa
[2022/04/15 09:14:17] [ warn] [engine] failed to flush chunk '1-1649978824.248946957.flb', retry in 20 seconds: task_id=13, input=storage_backlog.1 > output=es.1 (out_id=1)

  • Steps to reproduce the problem: This is not easy to reproduce, but it seems that every time Elasticsearch returns a long error, as in the case of a 400, fluent-bit truncates it.

Expected behavior

Fluent-bit should not truncate the HTTP error.

Your Environment

  • Version used: 1.9.1
  • Environment name and version (e.g. Kubernetes? What version?): k8s v1.21.8
  • Server type and version: AWS EC2
  • Elasticsearch version: 8.1.2
  • Operating System and version: Flatcar Container Linux by Kinvolk 2905.2.6 (Oklo)
  • Configuration, filters and plugins:

    [SERVICE]
        Flush 5
        Grace 30
        Daemon Off
        Log_Level info
        Parsers_File parsers.conf
        Parsers_File custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        storage.path /var/log/fluent-bit-test//storage
        storage.sync normal
        storage.checksum off
        storage.backlog.mem_limit 20M
        storage.metrics on

    [INPUT]
        Name            tail
        Path            /var/log/containers/*_test_*.log
        Parser          docker
        Tag             kube.*
        Path_Key        log.file
        mem_buf_limit   10MB
        storage.type    filesystem
        Rotate_Wait     60
        DB              /var/log/fluent-bit-test/log.db
        DB.sync         normal
        DB.locking      true
        DB.journal_mode WAL
        Read_from_Head  true

    [FILTER]
        Buffer_Size 128k
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
        Labels On
        Annotations On
        Kube_Tag_Prefix kube.var.logs.container.

    [FILTER]
        Name modify
        Match kube.*
        Rename log msg

    [FILTER]
        Name modify
        Match kube.*
        Remove date
        Remove @timestamp

    [FILTER]
        Name modify
        Match kube.*
        Set cluster development

    [FILTER]
        Name lua
        Match kube.*
        script /fluent-bit/scripts/fixKubernetesInfo.lua
        call transform

    [FILTER]
        Name lua
        Match kube.*
        script /fluent-bit/scripts/fixErrorObject.lua
        call transform

    [OUTPUT]
        Name s3
        Match *
        bucket mybucket
        region eu-west-1
        store_dir /var/log/fluent-bit-test//s3
        canned_acl bucket-owner-full-control
        s3_key_format /year=%Y/month=%m/day=%d/hour=%H/namespace=test/$TAG[4]_$UUID.json.gz
        s3_key_format_tag_delimiters ._
        compression gzip
        use_put_object True
        total_file_size 50M
        upload_timeout 5m
        auto_retry_requests True
        Retry_Limit False
        workers 1

    [OUTPUT]
        Name es
        Match *
        Host redacted.com
        Port 1234
        Buffer_Size False
        HTTP_User myuser
        HTTP_Passwd verysecret
        Index app-logs
        Include_Tag_Key On
        Tag_Key log.tags
        Time_Key log.read_at_time
        tls On
        tls.verify On
        Retry_Limit False
        Trace_Error On

AlessioCasco · Apr 15, 2022

Facing the same issue. @edsiper, any updates on this?

gargshubham49 · Jul 05, 2022

I found the issue; here is the code that prints the response: https://github.com/fluent/fluent-bit/blob/master/plugins/out_es/es.c#L893

The problem is that these flb_log functions truncate their output at roughly 4KB.
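For illustration, here is a minimal standalone C sketch (not Fluent Bit code; the exact 4 KB buffer size is an assumption) of why formatting a long response into a fixed-size logger buffer silently drops everything past the limit:

    /* Standalone sketch: a logger with a fixed internal buffer cuts off
     * anything longer than the buffer, the way a ~4 KB flb_log buffer would. */
    #include <stdio.h>
    #include <string.h>

    #define LOG_BUF_SIZE 4096  /* hypothetical internal log buffer size */

    int main(void)
    {
        char response[8192];               /* simulate a long ES bulk response */
        char logbuf[LOG_BUF_SIZE];

        memset(response, 'x', sizeof(response) - 1);
        response[sizeof(response) - 1] = '\0';

        /* snprintf never writes past logbuf; the rest of the message is lost */
        snprintf(logbuf, sizeof(logbuf), "error: Output\n%s", response);

        printf("response length: %zu, logged length: %zu\n",
               strlen(response), strlen(logbuf));
        return 0;
    }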

By contrast, the code for trace_output is different: it uses good old-fashioned C standard library functions, which do not truncate the data: https://github.com/fluent/fluent-bit/blob/master/plugins/out_es/es.c#L602
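A minimal standalone sketch of that alternative (again, not Fluent Bit source): writing the buffer with plain stdio imposes no logger limit, so the payload comes out in full.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        static char response[64 * 1024];   /* well beyond a 4 KB log buffer */

        memset(response, 'x', sizeof(response) - 1);
        response[sizeof(response) - 1] = '\0';

        /* fwrite emits exactly this many bytes; nothing is cut off */
        fwrite(response, 1, strlen(response), stdout);
        fputc('\n', stdout);
        return 0;
    }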

This is very easy to fix... I will put up a PR for it...

PettitWesley · Jul 21, 2022

https://github.com/fluent/fluent-bit/pull/5761

PettitWesley · Jul 21, 2022

@PettitWesley I just found a buffer read overrun in this area: https://github.com/fluent/fluent-bit/issues/5825; could you please fix that too in your pull request? 🙏

To fix this, the printf format specifier should be tweaked to pass the pack size as follows:

flb_plg_debug(ctx->ins, "error caused by: Input\n%.*s\n",
              (int)pack_size, pack);
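For context, a minimal standalone sketch of the %.*s idea (not Fluent Bit code; the buffer contents are made up): the precision argument bounds how many bytes printf may read, so a buffer that is not NUL-terminated is never read past its end.

    #include <stdio.h>

    int main(void)
    {
        /* Pretend this 11-byte region is followed by unrelated memory,
         * i.e. there is no terminating '\0' we can rely on. */
        char pack[] = { 'h','e','l','l','o',' ','w','o','r','l','d' };
        size_t pack_size = sizeof(pack);

        /* %.*s prints at most pack_size bytes, avoiding a read overrun */
        printf("error caused by: Input\n%.*s\n", (int)pack_size, pack);
        return 0;
    }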

gitfool · Aug 03, 2022

@gitfool Sure

PettitWesley · Aug 03, 2022