fluentd icon indicating copy to clipboard operation
fluentd copied to clipboard

log: add caller plugin-id about emitting error events

Open daipom opened this issue 1 year ago • 3 comments

Which issue(s) this PR fixes:

  • Fixes #4567

What this PR does / why we need it: Add caller plugin-id to warning logs about emitting error events.

It would be helpful if we could know what plugin emitted the error event.

We need to care about the compatibility. This signature change would not break compatibility.

However, I'm concerned that caller_plugin_id has a race condition, although I don't confirm it. It looks to me that the id can be another plugin-id running concurrently... It is not the issue with this fix, it is the issue of the existing implementation.

Docs Changes: Not needed.

Release Note: Add plugin-id to warning logs about emitting error events.

Example: From #4567.

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

<match sample.**>
  @type stdout
</match>
2024-09-11 15:26:26 +0900 [info]: #0 fluentd worker is now running worker=0
2024-09-11 15:26:27 +0900 [warn]: #0 dump an error event:
 error_class=Fluent::Plugin::Parser::ParserError
 error="pattern not matched with data 'this is not json}'"
 location=nil tag="sample.json"
 time=2024-09-11 15:26:27.040657722 +0900
 plugin_id="id_not_json" ### <= This info is added by this PR ###
 record={"data"=>"this is not json}"}
...

daipom avatar Sep 11 '24 06:09 daipom

However, I'm concerned that caller_plugin_id has a race condition, although I don't confirm it. It looks to me that the id can be another plugin-id running concurrently...

https://github.com/fluent/fluentd/blob/51b860b1be2eb59076d706fda12d55657a614c8f/lib/fluent/plugin_helper/event_emitter.rb#L27-L36

daipom avatar Sep 11 '24 06:09 daipom

As error.backtrace does not have enough information in this context, so current approach seems reasonable. (No need to force 3rdparty plugins to fix it)

kenhys avatar Sep 13 '24 01:09 kenhys

This PR has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this PR will be closed in 7 days

github-actions[bot] avatar Oct 13 '24 10:10 github-actions[bot]

A simple check seems ok. But may occur with more load.

<source>
  @type sample
  tag test.foo
</source>
<source>
  @type sample
  tag test.bar
</source>
<source>
  @type sample
  tag test.boo
</source>

<filter test.foo>
  @type parser
  @id foo
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>
<filter test.bar>
  @type parser
  @id bar
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>
<filter test.boo>
  @type parser
  @id boo
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>

<match test.**>
  @type stdout
</match>
2024-11-27 13:55:58 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error="not-exist does not exist" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'" tag="test.bar" time=2024-11-27 13:55:58.022326275 +0900 plugin_id="bar" record={"message"=>"sample"}
2024-11-27 13:55:58 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error="not-exist does not exist" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'" tag="test.foo" time=2024-11-27 13:55:58.023002806 +0900 plugin_id="foo" record={"message"=>"sample"}
2024-11-27 13:55:58 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error="not-exist does not exist" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'" tag="test.boo" time=2024-11-27 13:55:58.023570092 +0900 plugin_id="boo" record={"message"=>"sample"}
...

daipom avatar Nov 27 '24 05:11 daipom

I've succeeded in reproducing the race condition.

  • Run 3 in_samples with rate 100 (100 emits/s) in concurrent.
  • Make each filter plugin raise an error.
  • Check Fluentd log. Check for incorrect tag/id combinations.

Results: 0.005% mismatch confirmed.

<source>
  @type sample
  tag test.foo
  rate 100
</source>
<source>
  @type sample
  tag test.bar
  rate 100
</source>
<source>
  @type sample
  tag test.boo
  rate 100
</source>

<filter test.foo>
  @type parser
  @id foo
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>
<filter test.bar>
  @type parser
  @id bar
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>
<filter test.boo>
  @type parser
  @id boo
  key_name not-exist
  <parse>
    @type none
  </parse>
</filter>

<match test.**>
  @type stdout
</match>

Script for check (Ruby)

#!/usr/bin/env ruby
entire_count = 0
mismatched_count = 0

p "checking files ..."

ARGF.each_line do |line|
  next unless /^.* dump an error event: .* tag="(.*)" .* plugin_id="(.*)" .*$/ =~ line
  entire_count += 1
  tag = $1
  plugin_id = $2

  unless tag.include? plugin_id
    mismatched_count += 1
    p "mismatched! tag: #{tag}, plugin_id: #{plugin_id}, raw_log: #{line}"
  end
end

p "done"
p "entire_count: #{entire_count}"
p "mismatched_count: #{mismatched_count}"
p "mismatched_percentage: #{100.0 * mismatched_count / entire_count}%"
$ bundle exec fluentd -c test.conf --log log; ./check.rb < log; rm -f log
(Wait a moment and press Ctrl+C)
^C
"checking files ..."
"mismatched! tag: test.boo, plugin_id: bar, raw_log: 2024-11-27 18:14:22 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error=\"not-exist does not exist\" location=\"/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'\" tag=\"test.boo\" time=2024-11-27 18:14:22.793045711 +0900 plugin_id=\"bar\" record={\"message\"=>\"sample\"}\n"
"mismatched! tag: test.bar, plugin_id: foo, raw_log: 2024-11-27 18:15:56 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error=\"not-exist does not exist\" location=\"/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'\" tag=\"test.bar\" time=2024-11-27 18:15:56.735563552 +0900 plugin_id=\"foo\" record={\"message\"=>\"sample\"}\n"
"mismatched! tag: test.bar, plugin_id: foo, raw_log: 2024-11-27 18:17:02 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error=\"not-exist does not exist\" location=\"/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'\" tag=\"test.bar\" time=2024-11-27 18:17:02.429503164 +0900 plugin_id=\"foo\" record={\"message\"=>\"sample\"}\n"
"mismatched! tag: test.bar, plugin_id: foo, raw_log: 2024-11-27 18:17:24 +0900 [warn]: #0 dump an error event: error_class=ArgumentError error=\"not-exist does not exist\" location=\"/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:66:in `block in filter_stream'\" tag=\"test.bar\" time=2024-11-27 18:17:24.963402900 +0900 plugin_id=\"foo\" record={\"message\"=>\"sample\"}\n"
"done"
"entire_count: 73260"
"mismatched_count: 4"
"mismatched_percentage: 0.00546000546000546%"

daipom avatar Nov 27 '24 09:11 daipom

However, I'm concerned that caller_plugin_id has a race condition, although I don't confirm it. It looks to me that the id can be another plugin-id running concurrently... It is not the issue with this fix, it is the issue of the existing implementation.

This issue was confirmed. We need to fix this race condition first. So, I close this.

daipom avatar Nov 27 '24 09:11 daipom