AMQP output drops metrics if RabbitMQ is unavailable

Open fabbks opened this issue 3 years ago • 2 comments

Relevant telegraf.conf

[agent]
  interval = "10s"
  round_interval = false
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"

[[outputs.amqp]]
  brokers = ["amqp://192.168.93.1:5672/influx"]
  delivery_mode = "persistent"
  exchange = "influx"
  exchange_durability = "durable"
  exchange_passive = false
  exchange_type = "direct"
  routing_key = "telegraf"
  use_batch_format = true

Logs from Telegraf

2022-08-11T14:25:18Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
2022-08-11T14:25:28Z D! [outputs.amqp] Connecting to "amqp://192.168.93.1:5672/influx"
2022-08-11T14:25:28Z D! [outputs.amqp] Error connecting to "amqp://192.168.93.1:5672/influx" - dial tcp 192.168.93.1:5672: connect: connection refused
2022-08-11T14:25:28Z D! [outputs.amqp] Wrote batch of 1 metrics in 1.485015ms
2022-08-11T14:25:28Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics

System info

Telegraf 1.23.3, Debian 11.4

Docker

No response

Steps to reproduce

  1. Start RabbitMQ and Telegraf
  2. Stop RabbitMQ
  3. Generate some metrics and have them pushed with Telegraf to RabbitMQ by the outputs.amqp plugin

Expected behavior

The metrics should be buffered as long as RabbitMQ is unreachable and buffer space is available.

Actual behavior

The generated metrics are dropped from memory after a single failed connection attempt. Once RabbitMQ becomes available again, Telegraf does not try to push those metrics again.

Additional info

No response

fabbks, Aug 11 '22 14:08

Hi,

I used the following Docker command to set up a RabbitMQ server:

docker run -it --net host --env RABBITMQ_DEFAULT_VHOST=influx rabbitmq

This produced the following log messages:

2022-08-11T19:57:08Z D! [outputs.amqp] Connecting to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:08Z D! [outputs.amqp] Connected to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:08Z D! [agent] Successfully connected to outputs.amqp
2022-08-11T19:57:08Z D! [agent] Starting service inputs
2022-08-11T19:57:18Z D! [outputs.amqp] Wrote batch of 1 metrics in 136.228µs
2022-08-11T19:57:18Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
# we had one good push above this, I close the container now
2022-08-11T19:57:28Z D! [outputs.amqp] Connecting to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:28Z D! [outputs.amqp] Error connecting to "amqp://127.0.0.1:5672/influx" - dial tcp 127.0.0.1:5672: connect: connection refused
2022-08-11T19:57:28Z D! [outputs.amqp] Buffer fullness: 1 / 10000 metrics
2022-08-11T19:57:28Z E! [agent] Error writing to outputs.amqp: could not connect to any broker
2022-08-11T19:57:38Z D! [outputs.amqp] Connecting to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:38Z D! [outputs.amqp] Error connecting to "amqp://127.0.0.1:5672/influx" - dial tcp 127.0.0.1:5672: connect: connection refused
2022-08-11T19:57:38Z D! [outputs.amqp] Wrote batch of 2 metrics in 180.808µs
# oops, instead of keeping metrics in the buffer it writes them?
# after this I bring up the container
2022-08-11T19:57:38Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
2022-08-11T19:57:48Z D! [outputs.amqp] Connecting to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:48Z D! [outputs.amqp] Connected to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:48Z D! [outputs.amqp] Wrote batch of 1 metrics in 11.612381ms
2022-08-11T19:57:48Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
2022-08-11T19:57:58Z D! [outputs.amqp] Wrote batch of 1 metrics in 98.149µs
2022-08-11T19:57:58Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics

powersj, Aug 11 '22 20:08

Any update on this bug? I have observed the same behavior. It seems to be caused by the Write() function in amqp.go returning nil instead of the error. When the error is not of type amqp.ErrClosed, the if block starting at line 158 does not return it. That is the case when the plugin attempts to reconnect but the server is still unreachable.

Without buffering during downtime, using Telegraf with this plugin is pointless.

r-b-liu, Apr 22 '25 22:04