enable_input_metrics: inaccurate values

Open daipom opened this issue 1 year ago • 2 comments

Describe the bug

Setting enable_input_metrics lets us collect metrics for input plugins. However, the reported values can be inaccurate.

I have not confirmed this directly yet, but I have confirmed that @caller_plugin_id of EventRouter has a race condition.

  • https://github.com/fluent/fluentd/issues/4567#issuecomment-2503380410

So it is possible that the wrong callback is selected from metric_callbacks.

https://github.com/fluent/fluentd/blob/2d8c9d4b94b1d45e7d4d6e59caae5640a516eabd/lib/fluent/event_router.rb#L96-L102

This would cause the metrics to be calculated incorrectly.
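To make the suspected failure mode concrete, here is a minimal sketch of how a caller id kept in a shared instance variable can credit records to the wrong plugin. `SketchRouter`, `set_caller`, and `emit` are hypothetical names for illustration only, not Fluentd's actual code, and the interleaving is replayed by hand rather than produced by real threads.

```ruby
# Simplified model (an assumption, not Fluentd's implementation): the router
# remembers "who is calling" in one shared instance variable.
class SketchRouter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
    @caller_plugin_id = nil
  end

  # Step 1: an input plugin announces itself before emitting.
  def set_caller(plugin_id)
    @caller_plugin_id = plugin_id
  end

  # Step 2: the router credits whichever plugin id is stored right now.
  def emit(record_count)
    @counts[@caller_plugin_id] += record_count
  end
end

router = SketchRouter.new

# Replay one unlucky interleaving of two input threads, A ("foo") and B ("bar"):
router.set_caller("foo")  # thread A, step 1
router.set_caller("bar")  # thread B, step 1, overwrites A's id
router.emit(1)            # thread A, step 2, credited to "bar"
router.emit(1)            # thread B, step 2, credited to "bar"

p router.counts           # the total is right, but "foo" got nothing
```

In this model no record is lost; the total stays correct and only the per-plugin split drifts.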

To Reproduce

I haven't checked it yet, but the following settings should result in a slight error in the metric values.

<system>
  enable_input_metrics
</system>

<source>
  @type monitor_agent
</source>

<source>
  @type sample
  tag test.foo
  rate 100
</source>

<source>
  @type sample
  tag test.bar
  rate 100
</source>

<source>
  @type sample
  tag test.boo
  rate 100
</source>

<match test.**>
  @type null
</match>

Wait a few minutes and check the metrics.

curl http://localhost:24220/api/plugins.json | jq
{
  "plugins": [
    {
      "plugin_id": "object:d34",
      "plugin_category": "input",
      "type": "monitor_agent",
      "config": {
        "@type": "monitor_agent"
      },
      "output_plugin": false,
      "retry_count": null,
      "emit_records": 0,
      "emit_size": 0
    },
    {
      "plugin_id": "object:d48",
      "plugin_category": "input",
      "type": "sample",
      "config": {
        "@type": "sample",
        "tag": "test.foo",
        "rate": "100"
      },
      "output_plugin": false,
      "retry_count": null,
      "emit_records": 43112,
      "emit_size": 0
    },
    {
      "plugin_id": "object:d5c",
      "plugin_category": "input",
      "type": "sample",
      "config": {
        "@type": "sample",
        "tag": "test.bar",
        "rate": "100"
      },
      "output_plugin": false,
      "retry_count": null,
      "emit_records": 43109,
      "emit_size": 0
    },
    {
      "plugin_id": "object:d70",
      "plugin_category": "input",
      "type": "sample",
      "config": {
        "@type": "sample",
        "tag": "test.boo",
        "rate": "100"
      },
      "output_plugin": false,
      "retry_count": null,
      "emit_records": 43109,
      "emit_size": 0
    },
    {
      "plugin_id": "object:d0c",
      "plugin_category": "output",
      "type": "null",
      "config": {
        "@type": "null"
      },
      "output_plugin": true,
      "retry_count": 0,
      "emit_records": 129330,
      "emit_size": 0,
      "emit_count": 129330,
      "write_count": 0,
      "rollback_count": 0,
      "slow_flush_count": 0,
      "flush_time_count": 0,
      "retry": {}
    }
  ]
}

You can see that the value of emit_records differs between the in_sample instances.
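As a quick way to quantify the difference, the following sketch parses a trimmed copy of the response above (only the fields it needs) and compares the counters. The variable names are my own; in practice the JSON would come from the monitor_agent endpoint instead of an inline string.

```ruby
require "json"

# Trimmed copy of the plugins.json response shown above.
response = <<~JSON
  {"plugins": [
    {"plugin_category": "input",  "type": "monitor_agent", "emit_records": 0},
    {"plugin_category": "input",  "type": "sample", "config": {"tag": "test.foo"}, "emit_records": 43112},
    {"plugin_category": "input",  "type": "sample", "config": {"tag": "test.bar"}, "emit_records": 43109},
    {"plugin_category": "input",  "type": "sample", "config": {"tag": "test.boo"}, "emit_records": 43109},
    {"plugin_category": "output", "type": "null", "emit_records": 129330}
  ]}
JSON

plugins = JSON.parse(response)["plugins"]

# emit_records of every in_sample instance.
sample_counts = plugins
  .select { |plugin| plugin["type"] == "sample" }
  .map { |plugin| plugin["emit_records"] }

spread       = sample_counts.max - sample_counts.min
input_total  = sample_counts.sum
output_total = plugins.find { |plugin| plugin["plugin_category"] == "output" }["emit_records"]

puts "spread between in_sample instances: #{spread}"
puts "input total: #{input_total}, output total: #{output_total}"
```

With the numbers in this report, the three in_sample counters add up to the out_null counter. That is consistent with records being credited to the wrong input rather than lost, although it does not prove it.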

Expected behavior

There is no difference in the value of emit_records for each in_sample.

Your Environment

- Fluentd version: 1.18.0
- Package version:
- Operating system: Ubuntu 20.04.6 LTS (Focal Fossa)
- Kernel version: 5.15.0-124-generic

Your Configuration

Noted in `To Reproduce`.

Your Error Log

No error.

Additional context

No response

daipom avatar Nov 27 '24 09:11 daipom

@daipom Is this open to work on?

blazethunderstorm avatar Jul 04 '25 17:07 blazethunderstorm

Sure! We need to take care of the performance impact as well. Adding an exclusive lock may resolve this issue, but it may also cause a performance regression. For Fluentd, it is reasonable to conclude that a certain degree of error should be tolerated and performance should be prioritized.
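For reference, a rough sketch of what the exclusive-lock option could look like, using a hypothetical `LockedSketchRouter` (not Fluentd's API). It assumes that setting the caller id and crediting the records happen inside the same critical section.

```ruby
# Hypothetical model: one Mutex guards both "set caller" and "credit records",
# so no other thread can overwrite the caller id in between.
class LockedSketchRouter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
    @caller_plugin_id = nil
    @lock = Mutex.new
  end

  def emit_from(plugin_id, record_count)
    @lock.synchronize do
      @caller_plugin_id = plugin_id
      @counts[@caller_plugin_id] += record_count
    end
  end
end

router = LockedSketchRouter.new

# Three concurrent "inputs", each emitting 1000 single-record events.
threads = %w[foo bar boo].map do |plugin_id|
  Thread.new { 1_000.times { router.emit_from(plugin_id, 1) } }
end
threads.each(&:join)

p router.counts
```

The counters stay exact in this sketch, but every emit from every input plugin now passes through a single lock, which is the performance cost mentioned above.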

daipom avatar Jul 05 '25 03:07 daipom