[out_es] buffer read overrun while formatting payload
Bug Report
Describe the bug
Following up from https://github.com/fluent/fluent-bit/issues/5680, while testing with a custom debug build which seems to have fixed the crash with es output, I found another issue after I enabled AppVerifier and attached WinDbg:
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ff6890b2055 (fluent_bit!__crt_simd_traits<1,unsigned char>::compare_equals)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 000001c80e35e000
Attempt to read from address 000001c80e35e000
Your Environment
- Version used: 1.9.6* (custom debug build)
- Configuration: See below
- Environment name and version: AWS EC2 instance
- Server type and version: Windows
- Operating System and version: Server 2016
- Filters and plugins: in_tail, filter_lua, output_es
fluent-bit.conf:
[SERVICE]
daemon false
flush 1
log_level debug
# log_file fluent-bit.log
parsers_file parsers.conf
http_server true
http_listen 0.0.0.0
http_port 2020
health_check true
[INPUT]
name tail
tag pharos.*
path C:\PharosSystems\Logs\Json\PharosMpsLog.txt
db C:\PharosSystems\Logs\Json\FluentBit.db
parser json
buffer_chunk_size 64kb
buffer_max_size 64kb
mem_buf_limit 16mb
skip_long_lines true
refresh_interval 10
[FILTER]
name lua
match *
script filters.lua
call transform
[OUTPUT]
name es
match *
host logs
port 443
logstash_format true
replace_dots true
retry_limit 5
trace_error true
buffer_size 64kb
aws_auth true
tls true
tls.verify false
# tls.debug 3
net.keepalive true
net.keepalive_idle_timeout 60
workers 2
filters.lua:
function transform(tag, timestamp, record)
if record["Payload"] then
record["Service"] = record["Payload"]["processName"]
end
if not record["Message"] and record["Payload"]["message"] then
record["Message"], record["Payload"]["message"] = record["Payload"]["message"], nil
end
if record["Properties"] and record["Properties"]["StatusCode"] then
record["Properties"]["StatusCode"] = tostring(record["Properties"]["StatusCode"])
end
return 1, timestamp, record
end
Additional context
Dump file captured with WinDbg:
Looking around with WinDbg Preview:
0:008> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
KEY_VALUES_STRING: 1
Key : AV.Fault
Value: Read
Key : Analysis.CPU.mSec
Value: 1530
Key : Analysis.DebugAnalysisManager
Value: Create
Key : Analysis.Elapsed.mSec
Value: 3133
Key : Analysis.Init.CPU.mSec
Value: 23295171
Key : Analysis.Init.Elapsed.mSec
Value: 928507690
Key : Analysis.Memory.CommitPeak.Mb
Value: 279
Key : Timeline.OS.Boot.DeltaSec
Value: 2309540
Key : Timeline.Process.Start.DeltaSec
Value: 86583
Key : WER.OS.Branch
Value: rs1_release
Key : WER.OS.Timestamp
Value: 2022-06-10T16:22:00Z
Key : WER.OS.Version
Value: 10.0.14393.5192
Key : WER.Process.Version
Value: 1.9.7.0
NTGLOBALFLAG: 2000000
APPLICATION_VERIFIER_FLAGS: 0
APPLICATION_VERIFIER_LOADED: 1
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ff6890b2055 (fluent_bit!__crt_simd_traits<1,unsigned char>::compare_equals)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 000001c80e35e000
Attempt to read from address 000001c80e35e000
FAULTING_THREAD: 00000fa4
PROCESS_NAME: fluent-bit.exe
READ_ADDRESS: 000001c80e35e000
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.
EXCEPTION_CODE_STR: c0000005
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 000001c80e35e000
STACK_TEXT:
000000af`82afe088 00007ff6`8909d453 : 00000000`00000000 00007ff6`8908b181 0000ae6c`dd0adb3b 00000000`00000ff7 : fluent_bit!strnlen+0x71
000000af`82afe090 00007ff6`8909752c : 000000af`82afe806 000000af`82afe1a0 00000000`fffffffc 00007ff6`8909474c : fluent_bit!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::standard_base<char,__crt_stdio_output::string_output_adapter<char> > >::type_case_s+0x8b
000000af`82afe0c0 00007ff6`89092f2f : 000001c8`0c000000 000000af`82afe1a0 000000af`82afe240 000000af`82afe806 : fluent_bit!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::standard_base<char,__crt_stdio_output::string_output_adapter<char> > >::state_case_type+0xb4
000000af`82afe110 00007ff6`8908b0f9 : 00007ff6`89292f98 00000000`00000fd8 00000000`00000000 000000af`82afe2b8 : fluent_bit!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::standard_base<char,__crt_stdio_output::string_output_adapter<char> > >::process+0xeb
000000af`82afe140 00007ff6`88c8f1af : 000001c8`09d8cff8 000000af`82afe6d8 cccccccc`0000001e cccccccc`cccccccc : fluent_bit!common_vsprintf<__crt_stdio_output::standard_base,char>+0x111
000000af`82afe670 00007ff6`88c8e958 : 000000af`82afe806 00000000`00000fd8 00007ff6`89292f98 000000af`82aff860 : fluent_bit!vsnprintf+0x4f
000000af`82afe6c0 00007ff6`88d394a5 : 000001c8`00000004 00000000`00000000 00000000`00000000 00007ff6`89292f98 : fluent_bit!flb_log_print+0x368
000000af`82aff840 00007ff6`88cd344b : 000001c8`0f8fbfd0 000001c8`0ebc7fc0 000001c8`07822dd0 000001c8`09466df0 : fluent_bit!cb_es_flush+0x725
000000af`82aff960 00007ff6`890742be : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : fluent_bit!output_pre_cb_flush+0xbb
000000af`82affa20 00007ffc`a024e951 : 00007ff6`88cd3390 00000000`00000000 00000000`00000000 00000000`00000000 : fluent_bit!co_thunk+0xe
000000af`82affa50 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!BaseFiberStart+0x21
STACK_COMMAND: ~8s ; .cxr ; kb
FAULTING_SOURCE_LINE: minkernel\crts\ucrt\src\appcrt\string\strnlen.cpp
FAULTING_SOURCE_FILE: minkernel\crts\ucrt\src\appcrt\string\strnlen.cpp
FAULTING_SOURCE_LINE_NUMBER: 203
SYMBOL_NAME: fluent_bit!strnlen+71
MODULE_NAME: fluent_bit
IMAGE_NAME: fluent-bit.exe
FAILURE_BUCKET_ID: INVALID_POINTER_READ_AVRF_c0000005_fluent-bit.exe!strnlen
OS_VERSION: 10.0.14393.5192
BUILDLAB_STR: rs1_release
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
IMAGE_VERSION: 1.9.7.0
FAILURE_ID_HASH: {55f956fd-272c-9f19-3373-9182be7ab601}
Followup: MachineOwner
---------
0:008> ~8s ; .cxr ; kb
fluent_bit!__crt_simd_traits<1,unsigned char>::compare_equals [inlined in fluent_bit!strnlen+0x71]:
00007ff6`890b2055 c5f57409 vpcmpeqb ymm1,ymm1,ymmword ptr [rcx] ds:000001c8`0e35e000=??
# RetAddr : Args to Child : Call Site
00 (Inline Function) : --------`-------- --------`-------- --------`-------- --------`-------- : fluent_bit!__crt_simd_traits<1,unsigned char>::compare_equals [minkernel\crts\ucrt\inc\corecrt_internal_simd.h @ 143]
01 (Inline Function) : --------`-------- --------`-------- --------`-------- --------`-------- : fluent_bit!common_strnlen_simd+0x5f [minkernel\crts\ucrt\src\appcrt\string\strnlen.cpp @ 152]
02 (Inline Function) : --------`-------- --------`-------- --------`-------- --------`-------- : fluent_bit!common_strnlen+0x71 [minkernel\crts\ucrt\src\appcrt\string\strnlen.cpp @ 185]
03 00007ff6`8909d453 : 00000000`00000000 00007ff6`8908b181 0000ae6c`dd0adb3b 00000000`00000ff7 : fluent_bit!strnlen+0x71 [minkernel\crts\ucrt\src\appcrt\string\strnlen.cpp @ 203]
04 (Inline Function) : --------`-------- --------`-------- --------`-------- --------`-------- : fluent_bit!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::standard_base<char,__crt_stdio_output::string_output_adapter<char> > >::type_case_s_compute_narrow_string_length+0x8 [minkernel\crts\ucrt\inc\corecrt_internal_stdio_output.h @ 2323]
05 00007ff6`8909752c : 000000af`82afe806 000000af`82afe1a0 00000000`fffffffc 00007ff6`8909474c : fluent_bit!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::standard_base<char,__crt_stdio_output::string_output_adapter<char> > >::type_case_s+0x8b [minkernel\crts\ucrt\inc\corecrt_internal_stdio_output.h @ 2313]
06 00007ff6`89092f2f : 000001c8`0c000000 000000af`82afe1a0 000000af`82afe240 000000af`82afe806 : fluent_bit!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::standard_base<char,__crt_stdio_output::string_output_adapter<char> > >::state_case_type+0xb4 [minkernel\crts\ucrt\inc\corecrt_internal_stdio_output.h @ 2054]
07 00007ff6`8908b0f9 : 00007ff6`89292f98 00000000`00000fd8 00000000`00000000 000000af`82afe2b8 : fluent_bit!__crt_stdio_output::output_processor<char,__crt_stdio_output::string_output_adapter<char>,__crt_stdio_output::standard_base<char,__crt_stdio_output::string_output_adapter<char> > >::process+0xeb [minkernel\crts\ucrt\inc\corecrt_internal_stdio_output.h @ 1699]
08 00007ff6`88c8f1af : 000001c8`09d8cff8 000000af`82afe6d8 cccccccc`0000001e cccccccc`cccccccc : fluent_bit!common_vsprintf<__crt_stdio_output::standard_base,char>+0x111 [minkernel\crts\ucrt\src\appcrt\stdio\output.cpp @ 167]
09 00007ff6`88c8e958 : 000000af`82afe806 00000000`00000fd8 00007ff6`89292f98 000000af`82aff860 : fluent_bit!vsnprintf+0x4f [C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt\stdio.h @ 1439]
0a 00007ff6`88d394a5 : 000001c8`00000004 00000000`00000000 00000000`00000000 00007ff6`89292f98 : fluent_bit!flb_log_print+0x368 [D:\Devel\gitfool\fluent-bit\src\flb_log.c @ 411]
0b 00007ff6`88cd344b : 000001c8`0f8fbfd0 000001c8`0ebc7fc0 000001c8`07822dd0 000001c8`09466df0 : fluent_bit!cb_es_flush+0x725 [D:\Devel\gitfool\fluent-bit\plugins\out_es\es.c @ 910]
0c 00007ff6`890742be : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : fluent_bit!output_pre_cb_flush+0xbb [D:\Devel\gitfool\fluent-bit\include\fluent-bit\flb_output.h @ 522]
0d 00007ffc`a024e951 : 00007ff6`88cd3390 00000000`00000000 00000000`00000000 00000000`00000000 : fluent_bit!co_thunk+0xe [D:\Devel\gitfool\fluent-bit\lib\monkey\deps\flb_libco\fiber.c @ 23]
0e 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNELBASE!BaseFiberStart+0x21
0:008> dx Debugger.Sessions[0].Processes[3988].Threads[4004].Stack.Frames[11].SwitchTo();dv /t /v
Debugger.Sessions[0].Processes[3988].Threads[4004].Stack.Frames[11].SwitchTo()
000000af`82aff960 struct flb_event_chunk * event_chunk = 0x000001c8`0f8fbfd0
000000af`82aff968 struct flb_output_flush * out_flush = 0x000001c8`0ebc7fc0
000000af`82aff970 struct flb_input_instance * ins = 0x000001c8`07822dd0
000000af`82aff978 void * out_context = 0x000001c8`09466df0
000000af`82aff980 struct flb_config * config = 0x000001c8`039347c0
000000af`82aff890 int ret = 0n1
000000af`82aff908 struct flb_elasticsearch * ctx = 0x000001c8`09466df0
000000af`82aff8b8 void * out_buf = 0x000001c8`0e35bea0
000000af`82aff920 char * signature = 0x00000000`00000000 ""
000000af`82aff8a0 char * pack = 0x000001c8`0e35bea0 "[REDACTED]"
000000af`82aff918 struct flb_http_client * c = 0x000001c8`092e2f30
000000af`82aff898 unsigned int64 pack_size = 0x1515
000000af`82aff8f8 unsigned int64 b_sent = 0x158b
000000af`82aff910 struct flb_upstream_conn * u_conn = 0x000001c8`0db8bf40
000000af`82aff8d8 unsigned int64 out_size = 0x1515
0:008> dx Debugger.Sessions[0].Processes[3988].Threads[4004].Stack.Frames[10].SwitchTo();dv /t /v
Debugger.Sessions[0].Processes[3988].Threads[4004].Stack.Frames[10].SwitchTo()
000000af`82aff840 int type = 0n4
000000af`82aff848 char * file = 0x00000000`00000000 ""
000000af`82aff850 int line = 0n0
000000af`82aff858 char * fmt = 0x00007ff6`89292f98 "[output:%s:%s] error caused by: Input.%s."
000000af`82aff808 struct flb_worker * w = 0xcccccccc`cccccccc
000000af`82aff7f8 char * args = 0x000000af`82aff860 "???"
000000af`82afe778 char * bold_color = 0x00007ff6`89305510 ""
000000af`82afe768 char * header_color = 0x00007ff6`89305516 ""
000000af`82afe770 char * header_title = 0x00007ff6`8926f724 "debug"
000000af`82afe780 char * reset_color = 0x00007ff6`89305511 ""
000000af`82afe744 int total = 0n-858993460
000000af`82afe798 struct tm result = struct tm
000000af`82afe7c8 struct tm * current = 0x000000af`82afe798
000000af`82afe740 int len = 0n30
000000af`82afe758 int64 now = 0n1659415856
000000af`82afe7e0 struct log_message msg = struct log_message
0:008> dx -r1 ((fluent_bit!flb_elasticsearch *)0x1c809466df0)
((fluent_bit!flb_elasticsearch *)0x1c809466df0) : 0x1c809466df0 [Type: flb_elasticsearch *]
[+0x000] index : 0x1c809358ff0 : "fluent-bit" [Type: char *]
[+0x008] type : 0x1c809360ff0 : "_doc" [Type: char *]
[+0x010] suppress_type_name : 0 [Type: char]
[+0x018] http_user : 0x0 [Type: char *]
[+0x020] http_passwd : 0x1c809372ff0 : "" [Type: char *]
[+0x028] cloud_user : 0x0 [Type: char *]
[+0x030] cloud_passwd : 0x0 [Type: char *]
[+0x038] has_aws_auth : 0 [Type: int]
[+0x040] aws_region : 0x0 [Type: char *]
[+0x048] aws_sts_endpoint : 0x0 [Type: char *]
[+0x050] aws_provider : 0x0 [Type: flb_aws_provider *]
[+0x058] base_aws_provider : 0x0 [Type: flb_aws_provider *]
[+0x060] aws_tls : 0x0 [Type: flb_tls *]
[+0x068] aws_sts_tls : 0x0 [Type: flb_tls *]
[+0x070] aws_session_name : 0x0 [Type: char *]
[+0x078] buffer_size : 0xfa00 [Type: unsigned __int64]
[+0x080] replace_dots : 1 [Type: int]
[+0x084] trace_output : 0 [Type: int]
[+0x088] trace_error : 1 [Type: int]
[+0x08c] logstash_format : 1 [Type: int]
[+0x090] generate_id : 0 [Type: int]
[+0x094] current_time_index : 0 [Type: int]
[+0x098] logstash_prefix : 0x1c80939eff0 : "logstash" [Type: char *]
[+0x0a0] logstash_prefix_key : 0x0 [Type: char *]
[+0x0a8] logstash_dateformat : 0x1c8093aaff0 : "%Y.%m.%d" [Type: char *]
[+0x0b0] time_key : 0x1c8093b2ff0 : "@timestamp" [Type: char *]
[+0x0b8] time_key_format : 0x1c8093bafe0 : "%Y-%m-%dT%H:%M:%S" [Type: char *]
[+0x0c0] time_key_nanos : 0 [Type: int]
[+0x0c8] write_operation : 0x1c8093eaff0 : "create" [Type: char *]
[+0x0d0] es_action : 0x1c80946cff0 : "create" [Type: char *]
[+0x0d8] id_key : 0x0 [Type: char *]
[+0x0e0] ra_id_key : 0x0 [Type: flb_record_accessor *]
[+0x0e8] include_tag_key : 0 [Type: int]
[+0x0f0] tag_key : 0x1c8093ceff0 : "flb-key" [Type: char *]
[+0x0f8] uri : "/_bulk" [Type: char [256]]
[+0x1f8] ra_prefix_key : 0x0 [Type: flb_record_accessor *]
[+0x200] u : 0x1c809468ef0 [Type: flb_upstream *]
[+0x208] ins : 0x1c807874d40 [Type: flb_output_instance *]
0:008> db 0xaf82afe7e8
000000af`82afe7e8 5b 32 30 32 32 2f 30 38-2f 30 32 20 30 34 3a 35 [2022/08/02 04:5
000000af`82afe7f8 30 3a 35 36 5d 20 5b 64-65 62 75 67 5d 20 5b 6f 0:56] [debug] [o
000000af`82afe808 75 74 70 75 74 3a 65 73-3a 65 73 2e 30 5d 20 65 utput:es:es.0] e
000000af`82afe818 72 72 6f 72 20 63 61 75-73 65 64 20 62 79 3a 20 rror caused by:
000000af`82afe828 49 6e 70 75 74 0a 00 00-00 00 00 00 00 00 00 00 Input...........
000000af`82afe838 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
000000af`82afe848 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
000000af`82afe858 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0:008> db 0x000001c8`0e35bea0 L0x1515
000001c8`0e35bea0 7b 22 63 72 65 61 74 65-22 3a 7b 22 5f 69 6e 64 {"create":{"_ind
000001c8`0e35beb0 65 78 22 3a 22 6c 6f 67-73 74 61 73 68 2d 32 30 ex":"logstash-20
000001c8`0e35bec0 32 32 2e 30 38 2e 30 32-22 2c 22 5f 74 79 70 65 22.08.02","_type
000001c8`0e35bed0 22 3a 22 5f 64 6f 63 22-7d 7d 0a 7b 22 40 74 69 ":"_doc"}}.{"@ti
000001c8`0e35bee0 6d 65 73 74 61 6d 70 22-3a 22 32 30 32 32 2d 30 mestamp":"2022-0
000001c8`0e35bef0 38 2d 30 32 54 30 34 3a-35 30 3a 35 35 2e 30 37 8-02T04:50:55.07
...
000001c8`0e35d390 30 31 30 22 2c 22 4f 70-63 6f 64 65 22 3a 22 49 010","Opcode":"I
000001c8`0e35d3a0 6e 66 6f 22 2c 22 54 61-73 6b 22 3a 22 36 35 32 nfo","Task":"652
000001c8`0e35d3b0 30 31 22 7d 0a 01"}.
0:008> db 0x000001c8`0e35bea0+0x1515-5 L000001c8`0e35e000-(0x000001c8`0e35bea0+0x1515-5)
000001c8`0e35d3b0 30 31 22 7d 0a c0 c0 c0-c0 c0 c0 c0 c0 c0 c0 c0 01"}............
000001c8`0e35d3c0 c0 c0 c0 c0 c0 c0 c0 c0-c0 c0 c0 c0 c0 c0 c0 c0 ................
...
000001c8`0e35dfe0 c0 c0 c0 c0 c0 c0 c0 c0-c0 c0 c0 c0 c0 c0 c0 c0 ................
000001c8`0e35dff0 c0 c0 d0 d0 d0 d0 d0 d0-d0 d0 d0 d0 d0 d0 d0 d0 ................
0:008> db
000001c8`0e35e000 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
000001c8`0e35e010 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
000001c8`0e35e020 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
000001c8`0e35e030 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
000001c8`0e35e040 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
000001c8`0e35e050 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
000001c8`0e35e060 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
000001c8`0e35e070 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
According to the above WinDbg analysis, there was a buffer read overrun while formatting the es payload after an error:
https://github.com/fluent/fluent-bit/blob/98281688ec122f4598c64b8aec5aa2688dfbf945/plugins/out_es/es.c#L891-L892
This happens because the msgpack output from elasticsearch_format is not null terminated:
https://github.com/fluent/fluent-bit/blob/98281688ec122f4598c64b8aec5aa2688dfbf945/plugins/out_es/es.c#L591-L593
To fix this it should tweak the printf format specifier and pass the pack size as follows:
flb_plg_debug(ctx->ins, "error caused by: Input\n%.*s\n",
(int)pack_size, pack);
Made the change here: https://github.com/fluent/fluent-bit/pull/5761/files
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
Fixed by #5761.