windows_eventlog2 invalid/corrupt output

Open sutyak opened this issue 4 years ago • 6 comments

Describe the bug

Possible buffer overflow? The original issue, posted on the Fluentd Google Group, showed unexpected CJK characters in event logs. On further investigation these are not CJK characters but mis-decoded Unicode bytes appended to the original text after the "end of text" character. This leads me to believe the windows_eventlog2 plugin may be reading past the intended bytes in memory and picking up extra data.

Here is how it looks: "Description":"The resource loader failed to find MUI file. 㐳㈸‧獉畃牲湥㵴琧畲❥㸯਍⼼潂歯慭歲楌瑳>>䏐涔倀者䈼潯浫牡䱫獩㹴਍†䈼潯浫牡桃湡敮㵬洧捩潲潳瑦眭湩潤獷欭牥敮⵬湰⽰潣普杩牵瑡潩❮删捥牯䥤㵤㈧㐱✱䤠䍳牵敲瑮✽牴敵⼧ാ㰊䈯潯浫牡䱫獩㹴㸀",

To Reproduce

Configure Fluentd to read all event logs with "read_all_channels true". The problem does not occur on the top-level Application, System, and Security logs. Configure the match to dump all output to a local JSON file for convenience. In the configuration below I had already narrowed it down to the wer-payloadhealth log, but that may not be consistent on every system, which is why I recommend using "read_all_channels true".

<source>
  @type windows_eventlog2
  @id windows_eventlog2
  channels "microsoft-windows-wer-payloadhealth/operational"
  preserve_qualifiers_on_hash true
  read_existing_events
  read_interval 10
  tag winevt.raw2
  render_as_xml false
  rate_limit 5000
  <storage>
    @type local
    persistent true
    path "C:/Program Files/appname/Fluentd/pos/winevt2.json"
  </storage>

</source>

<match winevt.raw2>
  @type file
  path "C:/Temp/${tag}.%Y%m%d%H%M"
  path_suffix ".json"
  append true
  <format>
    @type json
  </format>
  <buffer tag,time>
    timekey 1m
    timekey_use_utc true
    timekey_wait 1m
    chunk_limit_size 500MB
    flush_thread_count 2
  </buffer>
</match>

Expected behavior

The output JSON file will contain numerous Description elements with what appears to be CJK text. Many, if not all, will be associated with what should be an empty Description.
The corresponding Description in Windows will likely be "The Description for event ID xx .... cannot be found."

Grab a Description text from the log and run it through a converter, such as the C# below:

using System;
using System.Text;

string originalString = "paste string here";

foreach (char c in originalString)
{
    // Encoding.Unicode is UTF-16LE, so each char yields its two raw bytes.
    byte[] utf16Bytes = Encoding.Unicode.GetBytes(c.ToString());
    Console.WriteLine($"{(int)c} - {Encoding.UTF8.GetString(utf16Bytes)}");
}

EDIT: The "3" printed below is actually "13" (carriage return). Original text: Something that stands out is the "3", which is the "end of text" character. For now I can add a check for it in my code to mark where the valid text ends. You can see that after the "10" (line feed), everything goes a bit wonky.

Here is a snippet of the output. Columns are integer value - character:

77 - M
85 - U
73 - I
32 -
102 - f
105 - i
108 - l
101 - e
46 - .
3 -
10 -

13363 - 34
12856 - 82
8231 - '
29513 - Is
30019 - Cu
29298 - rr
28261 - en
15732 - t=
29735 - 't
30066 - ru
10085 - e'
15919 - />
2573 -
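The same per-character conversion can be sketched in Python, for readers without a C# toolchain. This is only an illustration of the technique in the C# snippet above, not part of the plugin; the helper name `split_code_units` is made up for this example.

```python
def split_code_units(text):
    """Return (code unit, two-byte view) pairs for each UTF-16 code unit.

    Each code unit is split into its two little-endian bytes, and those bytes
    are shown as characters. Real ASCII text yields one character plus a NUL
    (stripped here); the "CJK" garbage yields two readable ASCII characters.
    """
    raw = text.encode("utf-16-le")
    pairs = []
    for i in range(0, len(raw), 2):
        chunk = raw[i:i + 2]
        unit = int.from_bytes(chunk, "little")
        # latin-1 maps every byte to exactly one character, so this never fails.
        view = chunk.decode("latin-1").rstrip("\x00")
        pairs.append((unit, view))
    return pairs


if __name__ == "__main__":
    # Code units 29513, 30019 and 15919 are taken from the output listed above.
    for unit, view in split_code_units("MUI\u7349\u7543\u3e2f"):
        print(f"{unit} - {view}")
```

Pasting a Description string from the JSON output in place of the literal should show where the readable byte pairs begin.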

Your Environment

- Fluentd version: 1.11.1 and 1.12.3
- TD Agent version: 3.8.1 and 4.1.1
- Operating system: Windows Server 2019 and Windows 10 Pro
- Kernel version:

Your Configuration

Same as the configuration shown under "To Reproduce" above.

Your Error Log

No errors.

Additional context

No response

sutyak avatar Aug 02 '21 21:08 sutyak

It's winevt_c or fluent-plugin-windows-eventlog's issue. I've transferred this issue to fluent-plugin-windows-eventlog (it may be forwarded to winevt_c later).

ashie avatar Aug 03 '21 00:08 ashie

I guess that it may be solved by setting an appropriate from_encoding: https://github.com/fluent/fluent-plugin-windows-eventlog#parameters

kenhys avatar Aug 03 '21 08:08 kenhys

> I guess that it may be solved by setting an appropriate from_encoding: https://github.com/fluent/fluent-plugin-windows-eventlog#parameters

I'm not sure, but I don't think so. I think the trailing line is cut off by ETX (0x03) when converting to UTF-8 or to a Ruby string.

ashie avatar Aug 03 '21 08:08 ashie

https://github.com/fluent-plugins-nursery/winevt_c/blob/19ad48ac19d2bf1bf3a8d7cf781fc1872562233c/ext/winevt/winevt_utils.cpp#L8-L20

VALUE
wstr_to_rb_str(UINT cp, const WCHAR* wstr, int clen)
{
  VALUE vstr;
  CHAR* ptr;
  int len = WideCharToMultiByte(cp, 0, wstr, clen, nullptr, 0, nullptr, nullptr);
  ptr = ALLOCV_N(CHAR, vstr, len);
  WideCharToMultiByte(cp, 0, wstr, clen, ptr, len, nullptr, nullptr);
  VALUE str = rb_utf8_str_new_cstr(ptr);
  ALLOCV_END(vstr);

  return str;
}

In winevt_c, the above function is probably always called with clen=-1. That may be the cause.
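To illustrate why a length of -1 could matter: with -1, WideCharToMultiByte reads the wide string up to its terminating null, so anything sitting between the real message and the next 0x0000 code unit is converted too. The Python sketch below is a toy model of that hypothesis only. It is not the winevt_c code, the function names are made up, and the "stale" bytes are an assumption built from the garbage seen in this issue; whether the message buffer is really left unterminated is not established in this thread.

```python
def decode_until_nul(buf):
    """Model clen=-1: read UTF-16LE code units until a 0x0000 unit is found."""
    end = len(buf) - len(buf) % 2
    for i in range(0, end, 2):
        if buf[i:i + 2] == b"\x00\x00":
            end = i
            break
    return buf[:end].decode("utf-16-le")


def decode_with_length(buf, char_count):
    """Model an explicit length: read exactly char_count UTF-16LE code units."""
    return buf[:char_count * 2].decode("utf-16-le")


message = "found.\r\n"
# Assumption: leftover single-byte text directly after the message, with no
# wide NUL in between. The bytes mirror the garbage reported in this issue.
stale = b'kmarkList>\x00t."'
buffer = message.encode("utf-16-le") + stale + b"\x00\x00"

if __name__ == "__main__":
    print(repr(decode_until_nul(buffer)))                  # message plus garbage
    print(repr(decode_with_length(buffer, len(message))))  # message only
```

In this model the explicit-length read returns only the message, while the scan-to-terminator read also returns the stale bytes paired up as CJK-looking characters.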

ashie avatar Aug 03 '21 08:08 ashie

More info. Using C# to write integer representations of the characters led me astray. There is no 03 (ETX) present; it is a 13 (carriage return). I still don't know why it was printed to the screen as a 3. What is still accurate is that it takes a conversion to Unicode bytes to see the actual characters.

Here is a string snippet: found.\r\n浫牡䱫獩㹴琀∮
UTF-8 segment: found.\r\n
Unicode segment: kmarkList> t."

Bytes as UTF-8 (the encoding changes after index 7):

  Index Value Type
  [0] 102 byte
  [1] 111 byte
  [2] 117 byte
  [3] 110 byte
  [4] 100 byte
  [5] 46 byte
  [6] 13 byte
  [7] 10 byte
  [8] 230 byte
  [9] 181 byte
  [10] 171 byte
  [11] 231 byte
  [12] 137 byte
  [13] 161 byte
  [14] 228 byte
  [15] 177 byte
  [16] 171 byte
  [17] 231 byte
  [18] 141 byte
  [19] 169 byte
  [20] 227 byte
  [21] 185 byte
  [22] 180 byte
  [23] 231 byte
  [24] 144 byte
  [25] 128 byte
  [26] 226 byte
  [27] 136 byte
  [28] 174 byte

Bytes as Unicode (UTF-16LE). After index 15, the raw bytes are the true readable values.

  Index Value Type
  [0] 102 byte
  [1] 0 byte
  [2] 111 byte
  [3] 0 byte
  [4] 117 byte
  [5] 0 byte
  [6] 110 byte
  [7] 0 byte
  [8] 100 byte
  [9] 0 byte
  [10] 46 byte
  [11] 0 byte
  [12] 13 byte
  [13] 0 byte
  [14] 10 byte
  [15] 0 byte
  [16] 107 byte
  [17] 109 byte
  [18] 97 byte
  [19] 114 byte
  [20] 107 byte
  [21] 76 byte
  [22] 105 byte
  [23] 115 byte
  [24] 116 byte
  [25] 62 byte
  [26] 0 byte
  [27] 116 byte
  [28] 46 byte
  [29] 34 byte
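Both byte tables above can be reproduced from the string snippet with a few lines of Python. This is a minimal sketch for checking the numbers, written with `\u` escapes so it does not depend on the editor's encoding; the variable names are arbitrary.

```python
# The snippet from this comment: "found.\r\n" followed by the seven
# CJK-looking characters, written as escapes.
snippet = "found.\r\n\u6d6b\u7261\u4c6b\u7369\u3e74\u7400\u222e"

# Same string, two encodings: these correspond to the two tables above.
utf8_bytes = list(snippet.encode("utf-8"))
utf16_bytes = list(snippet.encode("utf-16-le"))

# Bytes from index 16 onward in the UTF-16LE view are plain single-byte text.
tail = bytes(utf16_bytes[16:]).decode("latin-1")

if __name__ == "__main__":
    print(utf8_bytes)
    print(utf16_bytes)
    print(repr(tail))
```

The tail contains a NUL between ">" and "t", which is why the readable segment appears to have a gap at that position.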

sutyak avatar Aug 03 '21 13:08 sutyak

@sunayk Could you try to use winevt_c master ( https://github.com/fluent-plugins-nursery/winevt_c/commit/bc89d449ab33699541e543ab3d1f1fd0b182dd7d )? This commit could fix your garbage character issue.

cosmo0920 avatar Sep 28 '21 08:09 cosmo0920