fluentd icon indicating copy to clipboard operation
fluentd copied to clipboard

Error event logs should include plugin id

Open DavidKnacker opened this issue 1 year ago • 20 comments

Describe the bug

The @id plugin parameter is used to give each plugin its unique id. This id should be included in logs generated by said plugin. This seems to work inconsistent between different plugin types. I tested the json filter plugin (for which the id is not shown) and the loki output plugin (which shows it)

working example:

<source>
  @type sample
  @id id_sample_too_long
  sample {"data":"this log is way to long to fit into one chunk"}
  tag sample.long
</source>

<match **>
  @type loki
  @id id_too_long_message
  url ...
  ...
<buffer>
  @type file
  chunk_limit_size 5
  ...
</buffer>
</match>

results in logs:

2024-07-26 13:28:37 +0000 [warn]: #0 [id_too_long_message] chunk bytes limit exceeds for an emitted event stream: 64bytes
2024-07-26 13:28:37.073654307 +0000 fluent.warn: {"message":"[id_too_long_message] chunk bytes limit exceeds for an emitted event stream: 64bytes"}

Both contain the "id_too_long_message" id of the plugin the error occured in.

But for the filter plugin:

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

The error:

2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

does not contain the id "id_not_json" of the filter plugin that caused the error.

To Reproduce

The config:

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

results in:

2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

Expected behavior

The @id fiel, if set, shows up in logs prodcued by the plugin. In this example I would expect:

2024-07-26 13:21:57 +0000 [warn]: #0 [id_not_json] dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"# [id_not_json] <Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

Your Environment

- Fluentd version: 1.17.0
- Package version:
- Operating system: Garden Linux 1443.9
- Kernel version: 6.6.35-cloud-arm64

Your Configuration

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>


### Your Error Log

```shell
2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}


### Additional context

_No response_

DavidKnacker avatar Jul 26 '24 13:07 DavidKnacker

@DavidKnacker is the id missing even for logs generated by input plugin?

Athishpranav2003 avatar Jul 30 '24 14:07 Athishpranav2003

@Athishpranav2003 I haven't checked it for error logs thrown by an input plugin yet. But am I correct to assume that using @id to track errors should work for all kind of plugins? (or at least the ones that are shipped by default with fluentd?)

DavidKnacker avatar Jul 30 '24 14:07 DavidKnacker

Ideally yes but since it's not behaving in that way I was checking what's the difference in underlying code Will check it meanwhile

Just ping me later with the case of input plugin

Athishpranav2003 avatar Jul 30 '24 14:07 Athishpranav2003

👍 will check the behavior for the input plugin tomorrow and report back

DavidKnacker avatar Jul 30 '24 15:07 DavidKnacker

another scope of Idea: track plugin which emit error event in router side, then emit more user friendly error message.

kenhys avatar Jul 31 '24 01:07 kenhys

Thanks for your report. This is Fluentd's specification.

This dump an error event log is generated by RootAgent (Fluentd's common logic). The filter_parser plugin just emits the error event to the router. So, this log does not have its id.

https://github.com/fluent/fluentd/blob/e763c0761c44d9734b6aa374371387a2e8406522/lib/fluent/root_agent.rb#L344-L354

Although this is the current specification, it looks like not so good. Maybe it could be improved somehow.

daipom avatar Jul 31 '24 01:07 daipom

Although this is the current specification, it looks like not so good. Maybe it could be improved somehow.

It would be better to include the id of the plugin that emits the error event.

It may be easy to manually include the id of filter_parser in the error info that it emits.

https://github.com/fluent/fluentd/blob/e763c0761c44d9734b6aa374371387a2e8406522/lib/fluent/plugin/filter_parser.rb#L101-L103

However, it is not so good because every plugin that emits error events needs to consider this issue.

As @kenhys says in https://github.com/fluent/fluentd/issues/4567#issuecomment-2259468182, it would be great if we could create a common mechanism for this.

another scope of Idea: track plugin which emit error event in router side, then emit more user friendly error message.

daipom avatar Jul 31 '24 02:07 daipom

@daipom so basically we need to avoid change in function signature but capture the caller plugin id?

Athishpranav2003 avatar Jul 31 '24 03:07 Athishpranav2003

so basically we need to avoid change in function signature but capture the caller plugin id?

Yes. We can't change the signature of emit_error_event() because it is related to Event Emitter plugin helper API. We need to keep compatibility for the plugins that use this function.

daipom avatar Jul 31 '24 06:07 daipom

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

Athishpranav2003 avatar Jul 31 '24 09:07 Athishpranav2003

@Athishpranav2003 to come back to your question, I do see the @id field for logs generated by the input plugin.

DavidKnacker avatar Aug 02 '24 12:08 DavidKnacker

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

@daipom what about this?

Athishpranav2003 avatar Aug 05 '24 09:08 Athishpranav2003

Are there any updates on this?

DavidKnacker avatar Sep 09 '24 12:09 DavidKnacker

Sorry for my late response.

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

Filter plugin should use emit_error_event(). It outputs events as @ERROR label. It is intended to rescue collected data that could not be processed.

On the other hand, the events from Fluentd logger mean logs of Fluentd, not collected data by input plugins. Fluentd logger outputs events as @FLUENT_LOG label.

Using both would be a possible solution. However, it would be better if it could be resolved through a common mechanism.

daipom avatar Sep 11 '24 02:09 daipom

I made a PR for this issue.

  • #4632

However, I'm concerned about some race conditions of the related existing logic...

daipom avatar Sep 11 '24 06:09 daipom

I cannot say anything about potential race conditions, but I just tested the PR in my cluster and it seems to exactly achieve what I hoped for. Thanks @daipom! :) Would be great to have this in.

DavidKnacker avatar Sep 17 '24 13:09 DavidKnacker

@DavidKnacker Thanks! I will check the race condition!

daipom avatar Sep 18 '24 02:09 daipom

Hey @daipom, are there any updates on the ticket?

DavidKnacker avatar Nov 06 '24 09:11 DavidKnacker

Sorry, we need to consider the race condition issue more. I wish we could get this into 1.18.0, but I am not sure...

daipom avatar Nov 13 '24 01:11 daipom

I confirmed the race condition issue.

  • https://github.com/fluent/fluentd/pull/4632#issuecomment-2503344686

We need to fix this race condition first.

daipom avatar Nov 27 '24 09:11 daipom