
Output plugin errors not being recorded correctly for inputs.internal plugin

Open rmelick opened this issue 1 year ago • 0 comments

Relevant telegraf.conf

[global_tags]

# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "10s"

  ## Log at debug level.
  debug = true
  ## Log only error level messages.
  quiet = false

  # cause frequent write attempts (one batch per flush)
  metric_batch_size = 20
  # only allow two failed batches (2 x 20) to be buffered before overflow
  metric_buffer_limit = 40

# Collect statistics about itself
[[inputs.internal]]
  ## If true, collect telegraf memory stats.
  collect_memstats = false

[[outputs.http]]
  ## A URL that does not exist, so every write attempt fails
  url = "http://localhost:9999"

[[outputs.file]]
  ## Also print to stdout so we can see which stats are being collected
  files = ["stdout"]
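
(Note on the numbers: metric_batch_size = 20 and metric_buffer_limit = 40 mean the buffer holds at most 40 / 20 = 2 failed batches. Since inputs.internal emits 6 metrics per 10 s interval in this setup, every flush after the buffer fills drops exactly 6 metrics, matching the "Metric buffer overflow; 6 metrics have been dropped" warnings below.)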

Logs from Telegraf

2025-03-05T12:32:53Z D! [outputs.http] Buffer fullness: 40 / 40 metrics
2025-03-05T12:32:53Z E! [agent] Error writing to outputs.http: Post "http://localhost:9999": dial tcp [::1]:9999: connect: connection refused
2025-03-05T12:33:03Z W! [outputs.http] Metric buffer overflow; 6 metrics have been dropped
2025-03-05T12:33:03Z D! [outputs.file] Wrote batch of 6 metrics in 909.209µs
2025-03-05T12:33:03Z D! [outputs.file] Buffer fullness: 0 / 40 metrics
internal_write,host=58fb2e4f3721,output=http,version=1.33.3 metrics_rejected=0i,write_time_ns=2270958i,metrics_dropped=464i,errors=0i,metrics_written=0i,buffer_limit=40i,startup_errors=0i,buffer_size=40i,metrics_filtered=0i,metrics_added=504i 1741177980000000000
internal_write,host=58fb2e4f3721,output=file,version=1.33.3 buffer_limit=40i,startup_errors=0i,errors=0i,write_time_ns=190750i,metrics_dropped=0i,metrics_filtered=0i,metrics_rejected=0i,buffer_size=0i,metrics_added=504i,metrics_written=504i 1741177980000000000
internal_agent,go_version=1.24.0,host=58fb2e4f3721,version=1.33.3 gather_errors=0i,gather_timeouts=0i,metrics_written=504i,metrics_rejected=0i,metrics_dropped=464i,metrics_gathered=504i 1741177980000000000
internal_gather,host=58fb2e4f3721,input=internal,version=1.33.3 metrics_gathered=504i,gather_time_ns=646166i,gather_timeouts=0i,errors=0i 1741177980000000000
internal_write,host=58fb2e4f3721,input=internal,version=1.33.3 startup_errors=0i 1741177980000000000
internal_serializer,host=58fb2e4f3721,type=influx,version=1.33.3 metrics_serialized=2160i,bytes_serialized=415478i,serialization_time_ns=45396207i,errors=0i 1741177980000000000
2025-03-05T12:33:03Z D! [outputs.http] Buffer fullness: 40 / 40 metrics
2025-03-05T12:33:03Z E! [agent] Error writing to outputs.http: Post "http://localhost:9999": dial tcp [::1]:9999: connect: connection refused
internal_write,host=58fb2e4f3721,input=internal,version=1.33.3 startup_errors=0i 1741177990000000000
internal_serializer,host=58fb2e4f3721,type=influx,version=1.33.3 errors=0i,metrics_serialized=2186i,bytes_serialized=420375i,serialization_time_ns=46327667i 1741177990000000000
internal_write,host=58fb2e4f3721,output=http,version=1.33.3 metrics_added=510i,metrics_rejected=0i,buffer_size=40i,buffer_limit=40i,metrics_dropped=470i,startup_errors=0i,metrics_filtered=0i,errors=0i,metrics_written=0i,write_time_ns=8145791i 1741177990000000000
internal_write,host=58fb2e4f3721,output=file,version=1.33.3 buffer_limit=40i,write_time_ns=909209i,errors=0i,metrics_added=510i,metrics_written=510i,buffer_size=0i,startup_errors=0i,metrics_filtered=0i,metrics_rejected=0i,metrics_dropped=0i 1741177990000000000
internal_agent,go_version=1.24.0,host=58fb2e4f3721,version=1.33.3 metrics_gathered=510i,gather_errors=0i,gather_timeouts=0i,metrics_written=510i,metrics_rejected=0i,metrics_dropped=470i 1741177990000000000
internal_gather,host=58fb2e4f3721,input=internal,version=1.33.3 gather_time_ns=4504334i,gather_timeouts=0i,errors=0i,metrics_gathered=510i 1741177990000000000
2025-03-05T12:33:13Z D! [outputs.file] Wrote batch of 6 metrics in 558.666µs
2025-03-05T12:33:13Z D! [outputs.file] Buffer fullness: 0 / 40 metrics
2025-03-05T12:33:13Z W! [outputs.http] Metric buffer overflow; 6 metrics have been dropped
2025-03-05T12:33:13Z D! [outputs.http] Buffer fullness: 40 / 40 metrics
2025-03-05T12:33:13Z E! [agent] Error writing to outputs.http: Post "http://localhost:9999": dial tcp [::1]:9999: connect: connection refused
internal_agent,go_version=1.24.0,host=58fb2e4f3721,version=1.33.3 gather_timeouts=0i,metrics_written=516i,metrics_rejected=0i,metrics_dropped=476i,metrics_gathered=516i,gather_errors=0i 1741178000000000000
internal_gather,host=58fb2e4f3721,input=internal,version=1.33.3 gather_timeouts=0i,errors=0i,metrics_gathered=516i,gather_time_ns=657083i 1741178000000000000
internal_write,host=58fb2e4f3721,input=internal,version=1.33.3 startup_errors=0i 1741178000000000000
internal_serializer,host=58fb2e4f3721,type=influx,version=1.33.3 bytes_serialized=425479i,serialization_time_ns=46854335i,errors=0i,metrics_serialized=2212i 1741178000000000000
internal_write,host=58fb2e4f3721,output=http,version=1.33.3 metrics_dropped=476i,errors=0i,metrics_written=0i,buffer_limit=40i,startup_errors=0i,write_time_ns=4953292i,metrics_added=516i,buffer_size=40i,metrics_filtered=0i,metrics_rejected=0i 1741178000000000000
internal_write,host=58fb2e4f3721,output=file,version=1.33.3 buffer_limit=40i,metrics_written=516i,metrics_rejected=0i,metrics_added=516i,startup_errors=0i,metrics_dropped=0i,write_time_ns=558666i,errors=0i,metrics_filtered=0i,buffer_size=0i 1741178000000000000
2025-03-05T12:33:23Z W! [outputs.http] Metric buffer overflow; 6 metrics have been dropped
2025-03-05T12:33:23Z D! [outputs.file] Wrote batch of 6 metrics in 2.442583ms
2025-03-05T12:33:23Z D! [outputs.file] Buffer fullness: 0 / 40 metrics
2025-03-05T12:33:23Z D! [outputs.http] Buffer fullness: 40 / 40 metrics
2025-03-05T12:33:23Z E! [agent] Error writing to outputs.http: Post "http://localhost:9999": dial tcp [::1]:9999: connect: connection refused

System info

Telegraf 1.33.3, from the Docker container, running on an M1 MacBook Pro (14-inch, 2021)

Docker

No response

Steps to reproduce

You can reproduce this by running a Docker container with the config above saved as bug.conf. For example:

docker run --rm --name telegraf-agent --volume $(pwd)/bug.conf:/etc/telegraf/telegraf.conf:ro telegraf:1.33.3

Expected behavior

I would expect the errors field that corresponds to the outputs.http plugin (internal_write,output=http) to increment each time the output plugin fails to write.
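
For illustration, here is a minimal sketch of the wiring I would expect, using Telegraf's selfstat package (the registry that inputs.internal reads from). The flushOutput function and the surrounding main are hypothetical stand-ins for the agent's write path, not the actual agent code:

package main

import (
	"errors"
	"log"

	"github.com/influxdata/telegraf/selfstat"
)

// Registered once per output, tagged with the output name; inputs.internal
// surfaces this as the errors field of internal_write,output=http.
var writeErrors = selfstat.Register(
	"write", "errors", map[string]string{"output": "http"},
)

// flushOutput is a hypothetical stand-in for the agent's write loop.
func flushOutput(write func() error) {
	if err := write(); err != nil {
		// This is the increment that appears to be missing today:
		// the agent logs the error, but errors stays at 0.
		writeErrors.Incr(1)
		log.Printf("E! [agent] Error writing to outputs.http: %v", err)
	}
}

func main() {
	// Simulate a failed POST to the nonexistent endpoint.
	flushOutput(func() error {
		return errors.New(`Post "http://localhost:9999": connection refused`)
	})
	log.Printf("errors counter is now %d", writeErrors.Get())
}

With an increment like that in place, the errors field of the internal_write lines would track the E! log messages one-for-one.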

Actual behavior

Instead, I observed that the errors field for the outputs.http plugin remained at 0. Here is an example from the logs above:

internal_write,host=58fb2e4f3721,output=http,version=1.33.3 metrics_dropped=476i,errors=0i,metrics_written=0i,buffer_limit=40i,startup_errors=0i,write_time_ns=4953292i,metrics_added=516i,buffer_size=40i,metrics_filtered=0i,metrics_rejected=0i 1741178000000000000

There were many error messages in the Telegraf logs, such as this one:

2025-03-05T12:33:13Z E! [agent] Error writing to outputs.http: Post "http://localhost:9999": dial tcp [::1]:9999: connect: connection refused

I would have expected a metric or counter I could monitor that increments for each of these error messages.
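
(As a stopgap, the metrics_dropped field does keep climbing in the logs above — 464, 470, 476 — so alerting on it, or on metrics_written staying flat, is possible; but a counter dedicated to write errors would be much clearer.)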

Additional info

Interestingly, the documentation for the inputs.internal plugin does not mention the errors field at all.

I believe this is the same issue as reported in https://github.com/influxdata/telegraf/issues/7964, which appears to have been closed by mistake without being fixed.

rmelick · Mar 05 '25 12:03