windows_eventlog2 invalid/corrupt output
Describe the bug
Possible buffer over-read? The original issue, posted on the Fluentd Google Group, showed unexpected CJK characters in event logs. On further investigation these are not CJK characters but mangled Unicode bytes appended to the original text after the "end of text" character. This leads me to believe the windows_eventlog2 plugin may be reading past the intended bytes in memory and picking up extra data.
Here is how it looks: "Description":"The resource loader failed to find MUI file. 㐳㈸‧獉畃牲湥㵴琧畲❥㸯⼼潂歯慭歲楌瑳>>䏐涔倀者䈼潯浫牡䱫獩㹴†䈼潯浫牡桃湡敮㵬洧捩潲潳瑦眭湩潤獷欭牥敮湰⽰潣普杩牵瑡潩❮删捥牯䥤㵤㈧㐱✱䤠䍳牵敲瑮✽牴敵⼧ാ㰊䈯潯浫牡䱫獩㹴㸀",
To Reproduce
Configure Fluentd to read all event logs with "read_all_channels true". This does not occur on the top-level Application, System, and Security logs. Configure the match to dump all output to a local JSON file for convenience. In the configuration below I had already narrowed it down to the wer-payloadhealth log, but that may not be consistent on every system, which is why I recommend using "read_all_channels true".
<source>
  @type windows_eventlog2
  @id windows_eventlog2
  channels "microsoft-windows-wer-payloadhealth/operational"
  preserve_qualifiers_on_hash true
  read_existing_events
  read_interval 10
  tag winevt.raw2
  render_as_xml false
  rate_limit 5000
  <storage>
    @type local
    persistent true
    path "C:/Program Files/appname/Fluentd/pos/winevt2.json"
  </storage>
</source>
<match winevt.raw2>
  @type file
  path "C:/Temp/${tag}.%Y%m%d%H%M"
  path_suffix ".json"
  append true
  <format>
    @type json
  </format>
  <buffer tag,time>
    timekey 1m
    timekey_use_utc true
    timekey_wait 1m
    chunk_limit_size 500MB
    flush_thread_count 2
  </buffer>
</match>
Expected behavior
The output JSON file will contain numerous Description elements with what appears to be CJK text. Many, if not all, will be associated with what should be an empty Description.
The corresponding Description in Windows will likely be "The Description for event ID xx .... cannot be found."
Grab a Description text from the log and run it through a converter, such as the C# below:
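using System;
using System.Text;

string originalString = "paste string here";
foreach (char c in originalString)
{
    // Encoding.Unicode is UTF-16LE: each char becomes its two raw bytes.
    byte[] utf16Bytes = Encoding.Unicode.GetBytes(c.ToString());
    // Reading those two bytes as UTF-8 exposes ASCII pairs hidden in the odd characters.
    Console.WriteLine($"{(int)c} - {Encoding.UTF8.GetString(utf16Bytes)}");
}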
EDIT: the "3" printed below is actually "13" (carriage return), not ETX. Original note: something that stands out is the "3", which is the "end of text" character. For now I can add a check for it in my code to mark where the valid text ends. You can see that after the "10" (line feed) everything goes a bit wonky.
Here is a snippet of the output. Columns are integer value - character.
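| Integer value | Character |
|---|---|
| 77 | M |
| 85 | U |
| 73 | I |
| 32 | |
| 102 | f |
| 105 | i |
| 108 | l |
| 101 | e |
| 46 | . |
| 3 | |
| 10 | |
| 13363 | 34 |
| 12856 | 82 |
| 8231 | ' |
| 29513 | Is |
| 30019 | Cu |
| 29298 | rr |
| 28261 | en |
| 15732 | t= |
| 29735 | 't |
| 30066 | ru |
| 10085 | e' |
| 15919 | /> |
| 2573 | |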
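The numbers above suggest that each odd-looking character is a single UTF-16 code unit whose two bytes are plain ASCII. Below is a minimal C++ sketch of that reading. split_code_units is a hypothetical helper written for this illustration and is not part of winevt_c or the plugin. It assumes little-endian byte order, as used on Windows.

```cpp
#include <string>

// Hypothetical helper for illustration only (not part of winevt_c).
// Splits each UTF-16 code unit into its two bytes, low byte first,
// which is the order UTF-16LE stores them in memory.
std::string split_code_units(const std::u16string& units) {
    std::string bytes;
    bytes.reserve(units.size() * 2);
    for (char16_t u : units) {
        bytes.push_back(static_cast<char>(u & 0xFF));         // low byte
        bytes.push_back(static_cast<char>((u >> 8) & 0xFF));  // high byte
    }
    return bytes;
}
```

For example, the code units 29513, 30019, 29298, 28261 (0x7349, 0x7543, 0x7272, 0x6E65) come back as the text IsCurren, which matches the XML fragment visible in the output above.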
Your Environment
- Fluentd version: 1.11.1 and 1.12.3
- TD Agent version: 3.8.1 and 4.1.1
- Operating system: Windows Server 2019 and Windows 10 Pro
- Kernel version:
Your Configuration
Same as the configuration shown under To Reproduce above.
Your Error Log
No errors.
Additional context
No response
It's winevt_c's or fluent-plugin-windows-eventlog's issue. I've transferred this issue to fluent-plugin-windows-eventlog (it may be forwarded to winevt_c later).
I guess that it may be solved by setting an appropriate from_encoding (see https://github.com/fluent/fluent-plugin-windows-eventlog#parameters).
> I guess that it may be solved by setting an appropriate from_encoding (see https://github.com/fluent/fluent-plugin-windows-eventlog#parameters).
I'm not sure but I don't think so.
I think the trailing text is cut off at ETX (0x03) when converting to UTF-8 or to a Ruby string.
https://github.com/fluent-plugins-nursery/winevt_c/blob/19ad48ac19d2bf1bf3a8d7cf781fc1872562233c/ext/winevt/winevt_utils.cpp#L8-L20
VALUE
wstr_to_rb_str(UINT cp, const WCHAR* wstr, int clen)
{
    VALUE vstr;
    CHAR* ptr;
    int len = WideCharToMultiByte(cp, 0, wstr, clen, nullptr, 0, nullptr, nullptr);
    ptr = ALLOCV_N(CHAR, vstr, len);
    WideCharToMultiByte(cp, 0, wstr, clen, ptr, len, nullptr, nullptr);
    VALUE str = rb_utf8_str_new_cstr(ptr);
    ALLOCV_END(vstr);

    return str;
}
In winevt_c, the above function is probably always called with clen=-1, which may be the cause.
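To illustrate the hypothesis above: with a length of -1, WideCharToMultiByte treats the input as NUL-terminated, so if the wide-character buffer has no 0x0000 right after the message, the conversion keeps going until it finds one. The sketch below models that in portable C++ without any Windows API. read_until_nul, read_exact, and kDemoMemory are hypothetical names for this illustration, and the stale values are borrowed from the byte dump later in this thread, with a terminator added for the demo. It is a model of the suspected behavior, not the actual winevt_c code path.

```cpp
#include <cstddef>
#include <string>

// Simulated memory (an assumption for the demo): 8 valid code units
// followed by stale bytes, with the first 0x0000 only after the stale part.
const char16_t kDemoMemory[] = {
    u'f', u'o', u'u', u'n', u'd', u'.', u'\r', u'\n',  // valid message "found.\r\n"
    0x6D6B, 0x7261, 0x4C6B, 0x7369, 0x3E74,            // ASCII "kmarkList>" seen as 5 code units
    0x0000                                             // first terminator the scan reaches
};

// Models a length of -1: scan until a 0x0000 code unit, however far away it is.
std::u16string read_until_nul(const char16_t* p) {
    std::size_t n = 0;
    while (p[n] != u'\0') {
        ++n;
    }
    return std::u16string(p, n);
}

// Models an explicit length: take exactly n code units and nothing more.
std::u16string read_exact(const char16_t* p, std::size_t n) {
    return std::u16string(p, n);
}
```

Here read_until_nul(kDemoMemory) returns 13 code units (the message plus five garbage characters), while read_exact(kDemoMemory, 8) returns only the message. Passing the real message length, if the caller knows it, would avoid the extra characters in this model.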
More info: using C# to write integer representations of the characters led me astray. There is no 03 ETX present; it is a 13 (carriage return). I still don't know why it was printed to the screen as a 3. What is still accurate is that it takes a conversion to Unicode (UTF-16) bytes to see the actual characters.
Here is a string snippet: found.\r\n浫牡䱫獩㹴琀∮
UTF-8 segment: found.\r\n
Unicode segment: kmarkList> t.
Bytes as UTF-8 (the encoding changes after index 7):
| Index | Value | Type | |
|---|---|---|---|
| [0] | 102 | byte | |
| [1] | 111 | byte | |
| [2] | 117 | byte | |
| [3] | 110 | byte | |
| [4] | 100 | byte | |
| [5] | 46 | byte | |
| [6] | 13 | byte | |
| [7] | 10 | byte | |
| [8] | 230 | byte | |
| [9] | 181 | byte | |
| [10] | 171 | byte | |
| [11] | 231 | byte | |
| [12] | 137 | byte | |
| [13] | 161 | byte | |
| [14] | 228 | byte | |
| [15] | 177 | byte | |
| [16] | 171 | byte | |
| [17] | 231 | byte | |
| [18] | 141 | byte | |
| [19] | 169 | byte | |
| [20] | 227 | byte | |
| [21] | 185 | byte | |
| [22] | 180 | byte | |
| [23] | 231 | byte | |
| [24] | 144 | byte | |
| [25] | 128 | byte | |
| [26] | 226 | byte | |
| [27] | 136 | byte | |
| [28] | 174 | byte | |
Bytes as Unicode (UTF-16LE). After index 15, the conversion shows the actual readable values.
| Index | Value | Type | |
|---|---|---|---|
| [0] | 102 | byte | |
| [1] | 0 | byte | |
| [2] | 111 | byte | |
| [3] | 0 | byte | |
| [4] | 117 | byte | |
| [5] | 0 | byte | |
| [6] | 110 | byte | |
| [7] | 0 | byte | |
| [8] | 100 | byte | |
| [9] | 0 | byte | |
| [10] | 46 | byte | |
| [11] | 0 | byte | |
| [12] | 13 | byte | |
| [13] | 0 | byte | |
| [14] | 10 | byte | |
| [15] | 0 | byte | |
| [16] | 107 | byte | |
| [17] | 109 | byte | |
| [18] | 97 | byte | |
| [19] | 114 | byte | |
| [20] | 107 | byte | |
| [21] | 76 | byte | |
| [22] | 105 | byte | |
| [23] | 115 | byte | |
| [24] | 116 | byte | |
| [25] | 62 | byte | |
| [26] | 0 | byte | |
| [27] | 116 | byte | |
| [28] | 46 | byte | |
| [29] | 34 | byte | |
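The two tables are linked by ordinary UTF-8 decoding: each 3-byte group in the first table decodes to one code point, and the two bytes of that code point are what the second table shows. Below is a small C++ sketch of the arithmetic, assuming well-formed 3-byte sequences. decode_utf8_3 is a hypothetical helper for this illustration only.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only.
// Decodes one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into a code point.
// It does not validate the lead or continuation bits.
std::uint32_t decode_utf8_3(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
    return (static_cast<std::uint32_t>(b0 & 0x0F) << 12) |  // 4 payload bits from the lead byte
           (static_cast<std::uint32_t>(b1 & 0x3F) << 6) |   // 6 bits from the first continuation byte
           static_cast<std::uint32_t>(b2 & 0x3F);           // 6 bits from the second continuation byte
}
```

For instance, bytes 230, 181, 171 (indexes 8 to 10 in the first table) decode to 0x6D6B, whose low and high bytes are 107 and 109, that is k and m at indexes 16 and 17 in the second table.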
@sunayk Could you try to use winevt_c master ( https://github.com/fluent-plugins-nursery/winevt_c/commit/bc89d449ab33699541e543ab3d1f1fd0b182dd7d )? This commit could fix your garbage character issue.