fluentd icon indicating copy to clipboard operation
fluentd copied to clipboard

Td-agent fluentd ruby processess goes into hung state when regex pattern is provided in tail plugin and log messages contains garbage/control (e.g. record separator) characters.

Open deepaksb2 opened this issue 1 year ago • 1 comments

Describe the bug

Any log message which contains garbage/control character like “Record Separator” (which only shows in notepad++, copy paste to other editor does not show those characters, please refer screenshot)

While the same expression and sample data parse/filter the data correctly in https://fluentular.herokuapp.com/ validator correctly (please refer https://fluentular.herokuapp.com/parse?regexp=.%3F%28%3F%3Clevel%3E.%3F%29%5B+%5Ct%5D%2B%28%3F%3Ctime%3E%5Cd%7B1%2C2%7D%5C%2F%5Cd%7B1%2C2%7D%5B+%5Ct%5D%2B%5Cd%7B1%2C2%7D%3A%5Cd%7B1%2C2%7D%3A%5Cd%7B1%2C2%7D%5C.%5Cd%7B1%2C3%7D%29%5B+%5Ct%5D%2B%5C%5B%28%3F%3Cthread%3E.%29%5C%5D%5B+%5Ct%5D%2BUserAccount+Adding+user+with+TenantName+as+%28%3F%3Cusername%3E.%3F%29%5C*&input=INFO++08%2F16+19%3A31%3A39.214+%5Bhttps-jsse-nio-443-exec-3%5D+UserAccount+Adding+user+with+TenantName+as+AAAmonitoring%1E*&time_format=%25m%2F%25d++%25H%3A%25M%3A%25S.%25L).

below is sample log message screenshot

logMessagewithRecordSeparator

Sample regex: .?(?.?)[ \t]+(?

Sample data :

INFO 08/31 19:31:39.214 [https-jsse-nio-443-exec-19] UserAccount Adding user with TenantName as XYZ<here_Record separatorappears>*

To Reproduce

Install td-agent on windows and configure td-agent.conf file with tail plugin with type regex. tail the file which contains control characters like record separator. the ruby processes goes into hung state and tail plugin stops working.

Expected behavior

tail plugin should be able to parse the log messages containing control characters and continue tailing. fluentular validator able to parse the message just file, so fluentd as well.

sampleRecordSeparatorlogs.txt

Your Environment

- Fluentd version: 1.15.2
- TD Agent version:4.4.1 (msi installer td-agent-4.4.1-x64.msi)
- Operating system:Windows Server 2012 R2 and 2019

- Kernel version:

Your Configuration

####
## Output descriptions:
##
<source>
  @type tail
  path C:/logs/UsrAccount_New.log
  tag DevOpsWinDebug_logingest
  read_from_head true
  pos_file C:/opt/file-UsrLogs.log.pos
  follow_inodes true
  refresh_interval 2
  enable_stat_watcher false
  from_encoding UTF-8
  encoding UTF-8
  exclude_path [""]
  multiline_flush_interval 5s
  <parse>
    @type regexp
expression /.*?(?<level>.*?)[ \t]+(?<time>\d{1,2}\/\d{1,2}[ \t]+\d{1,2}:\d{1,2}:\d{1,2}\.\d{1,3})[ \t]+\[(?<thread>.*)\][ \t]+UserAccount Adding user with TenantName as (?<username>.*?)\*/
time_format %m/%d  %H:%M:%S.%L
time_key time
  </parse>
</source>

<match DevOpsWinDebug_logingest>
    @type stdout
</match>

## match tag=debug.** and dump to console
<match DevOpsWinDebug_logingest>
  @type stdout
  @id output_stdout
</match>

####
## Source descriptions:
##

## built-in TCP input
## @see http://docs.fluentd.org/articles/in_forward
<source>
  @type forward
  @id input_forward
</source>

## built-in UNIX socket input
#<source>
#  type unix
#</source>

# HTTP input
# POST http://localhost:8888/<tag>?json=<json>
# POST http://localhost:8888/td.myapp.login?json={"user"%3A"me"}
# @see http://docs.fluentd.org/articles/in_http
<source>
  @type http
  @id input_http
  port 8888
</source>

## live debugging agent
<source>
  @type debug_agent
  @id input_debug_agent
  bind 127.0.0.1
  port 24230
</source>

Your Error Log

No errors , ruby process just goes into hung state and forcefully kill the ruby processes to stop.

2022-09-01 01:11:01 -1200 [info]: starting fluentd-1.15.2 pid=3776 ruby="2.7.6"
2022-09-01 01:11:01 -1200 [info]: spawn command to main:  cmdline=["C:/opt/td-agent/bin/ruby.exe", "-Eascii-8bit:ascii-8bit", "C:/opt/td-agent/bin/fluentd", "--under-supervisor"]
2022-09-01 01:11:03 -1200 [info]: adding match pattern="DevOpsWinDebug_logingest" type="stdout"
2022-09-01 01:11:03 -1200 [info]: adding match pattern="DevOpsWinDebug_logingest" type="stdout"
2022-09-01 01:11:03 -1200 [info]: adding source type="tail"
2022-09-01 01:11:03 -1200 [warn]: #0 'encoding' and 'from_encoding' are same encoding. No effect
2022-09-01 01:11:03 -1200 [info]: adding source type="forward"
2022-09-01 01:11:03 -1200 [info]: adding source type="http"
2022-09-01 01:11:03 -1200 [info]: adding source type="debug_agent"
2022-09-01 01:11:03 -1200 [info]: #0 starting fluentd worker pid=3100 ppid=3776 worker=0
2022-09-01 01:11:03 -1200 [info]: #0 [input_debug_agent] listening dRuby uri="druby://127.0.0.1:24230" object="Fluent::Engine" worker=0
2022-09-01 01:11:03 -1200 [info]: #0 [input_forward] listening port port=24224 bind="0.0.0.0"
2022-09-01 01:11:03 -1200 [info]: #0 following tail of C:/logs/UsrAccount_New.log
2022-09-01 01:11:03 -1200 [warn]: #0 pattern not matched: "INFO  08/25 06:21:08.252 [https-jsse-nio-443-exec-14] UserAccount Connection Retry Count set for GET 3"
2022-09-01 01:11:03 -1200 [warn]: #0 pattern not matched: "INFO  08/25 06:21:08.534 [https-jsse-nio-443-exec-8] UserAccount Connection Retry Count set for GET 3"

Additional context

The td-agent just works fine with parse @type none but displays ASCII char of record separator -- 2022-09-01 00:25:26.854032200 -1200 DevOpsWinDebug_logingest: {"message":"INFO 08/25 10:12:55.799 [https-jsse-nio-443-exec-11] UserAccount Adding user with TenantName as monitoring**\u001e***"}

deepaksb2 avatar Sep 01 '22 14:09 deepaksb2