Do not truncate Elasticsearch errors when Buffer_Size is set to False
Bug Report
Describe the bug
It seems that even if Buffer_Size is set to False or to a very big number like 10m, responses from the Elasticsearch HTTP service are still truncated.
To Reproduce
- Example log message if applicable:
[2022/04/15 09:14:17] [error] [output:es:es.1] error: Output
{"took":34,"errors":true,"items":[{"create":{"_index":"test-2022.04.14.22-000086","_id":"uxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":97650842,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"vBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":98935675,"_primary_term":3,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"vRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":104317246,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"vhWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":98935676,"_primary_term":3,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"vxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":97650843,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"wBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":98935678,"_primary_term":3,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"wRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":88055529,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"whWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":88055530,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"wxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":88055531,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"xBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":99960371,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"xRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":104317247,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"xhWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":97650844,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"xxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":99960372,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"yBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":104317248,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"yRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":88055532,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"yhWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":104317249,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"yxWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":976508
45,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"zBWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":99960373,"_primary_term":4,"status":201}},{"create":{"_index":"test-2022.04.14.22-000086","_id":"zRWALIABaLfE49Atc-4a","_version":1,"result":"created","_shards":{"total":2,"successful":2,"fa
[2022/04/15 09:14:17] [ warn] [engine] failed to flush chunk '1-1649978824.248946957.flb', retry in 20 seconds: task_id=13, input=storage_backlog.1 > output=es.1 (out_id=1)
- Steps to reproduce the problem:
This is not very easy to reproduce, but it seems that every time Elasticsearch returns a long error response, such as in the case of a 400, fluent-bit truncates it.
Expected behavior
Fluent-bit should not truncate the HTTP error.
Your Environment
- Version used: 1.9.1
- Environment name and version (e.g. Kubernetes? What version?): k8s v1.21.8
- Server type and version: AWS EC2
- Elasticsearch version: 8.1.2
- Operating System and version: Flatcar Container Linux by Kinvolk 2905.2.6 (Oklo)
- Configuration, filters and plugins:
[SERVICE]
Flush 5
Grace 30
Daemon Off
Log_Level info
Parsers_File parsers.conf
Parsers_File custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
storage.path /var/log/fluent-bit-test//storage
storage.sync normal
storage.checksum off
storage.backlog.mem_limit 20M
storage.metrics on
[INPUT]
Name tail
Path /var/log/containers/*_test_*.log
Parser docker
Tag kube.*
Path_Key log.file
mem_buf_limit 10MB
storage.type filesystem
Rotate_Wait 60
DB /var/log/fluent-bit-test/log.db
DB.sync normal
DB.locking true
DB.journal_mode WAL
Read_from_Head true
[FILTER]
Buffer_Size 128k
Name kubernetes
Match kube.*
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
Labels On
Annotations On
Kube_Tag_Prefix kube.var.logs.container.
[FILTER]
Name modify
Match kube.*
Rename log msg
[FILTER]
Name modify
Match kube.*
Remove date
Remove @timestamp
[FILTER]
Name modify
Match kube.*
Set cluster development
[FILTER]
Name lua
Match kube.*
script /fluent-bit/scripts/fixKubernetesInfo.lua
call transform
[FILTER]
Name lua
Match kube.*
script /fluent-bit/scripts/fixErrorObject.lua
call transform
[OUTPUT]
Name s3
Match *
bucket mybucket
region eu-west-1
store_dir /var/log/fluent-bit-test//s3
canned_acl bucket-owner-full-control
s3_key_format /year=%Y/month=%m/day=%d/hour=%H/namespace=test/$TAG[4]_$UUID.json.gz
s3_key_format_tag_delimiters ._
compression gzip
use_put_object True
total_file_size 50M
upload_timeout 5m
auto_retry_requests True
Retry_Limit False
workers 1
[OUTPUT]
Name es
Match *
Host redacted.com
Port 1234
Buffer_Size False
HTTP_User myuser
HTTP_Passwd verysecret
Index app-logs
Include_Tag_Key On
Tag_Key log.tags
Time_Key log.read_at_time
tls On
tls.verify On
Retry_Limit False
Trace_Error On
Facing the same issue. @edsiper, any updates on this?
I found the issue; here is the code that prints the response: https://github.com/fluent/fluent-bit/blob/master/plugins/out_es/es.c#L893
The problem is that these flb_log functions truncate their output at roughly 4 KB.
If you look at the code for trace_output here, it is different: it uses plain C standard library functions, which should not truncate the data: https://github.com/fluent/fluent-bit/blob/master/plugins/out_es/es.c#L602
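Below is a minimal standalone sketch (not fluent-bit code; the 4 KB limit and the helper names are assumptions for illustration) of why formatting into a fixed-size log buffer truncates a long response, while writing the payload directly with the standard library does not:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LOG_BUF_SIZE 4096  /* assumed fixed-size log-message buffer */

/* Logger that formats the message into a fixed buffer first, as many
 * logging frameworks do; anything beyond the buffer size is lost. */
static void log_via_fixed_buffer(const char *payload)
{
    char buf[LOG_BUF_SIZE];

    /* snprintf never writes more than sizeof(buf) bytes, so a long
     * Elasticsearch response gets cut off here. */
    snprintf(buf, sizeof(buf), "error: Output\n%s", payload);
    fputs(buf, stderr);
    fputc('\n', stderr);
}

/* Writing the payload directly does not go through a fixed buffer,
 * so the full response is emitted. */
static void log_direct(const char *payload)
{
    fputs("error: Output\n", stderr);
    fputs(payload, stderr);
    fputc('\n', stderr);
}

int main(void)
{
    /* Build a fake 16 KB response standing in for a long ES bulk reply. */
    size_t len = 16 * 1024;
    char *payload = malloc(len + 1);
    if (!payload) {
        return 1;
    }
    memset(payload, 'x', len);
    payload[len] = '\0';

    log_via_fixed_buffer(payload);  /* prints ~4 KB, the rest is truncated */
    log_direct(payload);            /* prints the full 16 KB */

    free(payload);
    return 0;
}
```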
This is very easy to fix... I will put up a PR for it...
https://github.com/fluent/fluent-bit/pull/5761
@PettitWesley I just found a buffer read overrun in this area: https://github.com/fluent/fluent-bit/issues/5825; could you please fix that too in your pull request? 🙏
To fix this, tweak the printf format specifier and pass the pack size as follows:
flb_plg_debug(ctx->ins, "error caused by: Input\n%.*s\n", (int)pack_size, pack);
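For reference, a minimal standalone sketch (not fluent-bit code; the buffer contents are made up) showing how the %.*s precision specifier bounds the read to pack_size bytes, so printf never scans past the end of a payload that is not NUL-terminated at that point:

```c
#include <stdio.h>

int main(void)
{
    /* Stand-in for the request payload; only the first 26 bytes are the
     * data we want to log, and there is no '\0' right after them. */
    char pack[] = "{\"index\":{\"_id\":\"abc123\"}}GARBAGE-BEYOND-PACK-SIZE";
    size_t pack_size = 26;

    /* A plain %s would keep reading until it finds a '\0'; %.*s stops
     * after pack_size bytes, avoiding the read overrun. */
    printf("error caused by: Input\n%.*s\n", (int)pack_size, pack);
    return 0;
}
```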
@gitfool Sure