fluentd
fluentd copied to clipboard
Allow plugin/parser_multiline to drop lines containing invalid byte sequence in UTF-8
Is your feature request related to a problem? Please describe.
I recently found out that sometimes (very rarely) my app logs are getting corrupted. One line in app log may contain invalid UTF-8 byte sequence. In this case fluentd (I'm using td-agent) is flooding it's own log file with repeated error message:
2023-01-16 11:10:31 +0300 [error]: #0 [irm3] invalid byte sequence in UTF-8
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/parser_multiline.rb:107:in `match?'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/parser_multiline.rb:107:in `firstline?'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:664:in `block in parse_multilines'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:663:in `each'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:663:in `parse_multilines'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:607:in `call'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:607:in `receive_lines'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:1104:in `block in handle_notify'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:1140:in `with_io'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:1064:in `handle_notify'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:1006:in `block in on_notify'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:1006:in `synchronize'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:1006:in `on_notify'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:817:in `on_notify'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:447:in `construct_watcher'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:452:in `block in start_watchers'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:451:in `each_value'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:451:in `start_watchers'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:384:in `refresh_watchers'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/plugin/in_tail.rb:254:in `start'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:203:in `block in start'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:192:in `block (2 levels) in lifecycle'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:191:in `each'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:191:in `block in lifecycle'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:178:in `each'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:178:in `lifecycle'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/root_agent.rb:202:in `start'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/engine.rb:248:in `start'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/engine.rb:147:in `run'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/supervisor.rb:783:in `block in run_worker'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/supervisor.rb:1056:in `main_process'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/supervisor.rb:774:in `run_worker'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/lib/fluent/command/fluentd.rb:370:in `<top (required)>'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:83:in `require'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:83:in `require'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.15.3/bin/fluentd:15:in `<top (required)>'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/bin/fluentd:23:in `load'
2023-01-16 11:10:31 +0300 [error]: #0 C:/opt/td-agent/bin/fluentd:23:in `<main>'
This results in log file of fluentd growing to very large size (several 10s of GB) over weekend.
Describe the solution you'd like
Either write this error once and skip (drop) offending log line, or retry log line reading in controlled maner (for example with config option, once in N minutes) to prevent fluentd own log growing.
Describe alternatives you've considered
Alternative could be config option to limit fluentd log file size.
Also this existing config options
ignore_repeated_log_interval
ignore_same_log_interval
help mitigate problem a little, but log size still continues to grow (with much lesser speed).
Additional context
No response