AMQP output drops metrics if RabbitMQ is unavailable
Relevant telegraf.conf
[agent]
interval = "10s"
round_interval = false
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
[[outputs.amqp]]
brokers = ["amqp://192.168.93.1:5672/influx"]
delivery_mode = "persistent"
exchange = "influx"
exchange_durability = "durable"
exchange_passive = false
exchange_type = "direct"
routing_key = "telegraf"
use_batch_format = true
Logs from Telegraf
2022-08-11T14:25:18Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
2022-08-11T14:25:28Z D! [outputs.amqp] Connecting to "amqp://192.168.93.1:5672/influx"
2022-08-11T14:25:28Z D! [outputs.amqp] Error connecting to "amqp://192.168.93.1:5672/influx" - dial tcp 192.168.93.1:5672: connect: connection refused
2022-08-11T14:25:28Z D! [outputs.amqp] Wrote batch of 1 metrics in 1.485015ms
2022-08-11T14:25:28Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
System info
Telegraf 1.23.3, Debian 11.4
Docker
No response
Steps to reproduce
- Start RabbitMQ and Telegraf
- Stop RabbitMQ
- Generate some metrics and have them pushed with Telegraf to RabbitMQ by the outputs.amqp plugin
Expected behavior
The metrics should be buffered as long as RabbitMQ is unreachable and there is buffer space available.
Actual behavior
The generated metrics are dropped from memory after one failed connection attempt. When RabbitMQ becomes available again, Telegraf does not try to push those metrics again.
Additional info
No response
Hi,
I used the following docker cmd to set up a server:
docker run -it --net host --env RABBITMQ_DEFAULT_VHOST=influx rabbitmq
This produced the following log messages:
2022-08-11T19:57:08Z D! [outputs.amqp] Connecting to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:08Z D! [outputs.amqp] Connected to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:08Z D! [agent] Successfully connected to outputs.amqp
2022-08-11T19:57:08Z D! [agent] Starting service inputs
2022-08-11T19:57:18Z D! [outputs.amqp] Wrote batch of 1 metrics in 136.228µs
2022-08-11T19:57:18Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
# one successful push above; now I stop the container
2022-08-11T19:57:28Z D! [outputs.amqp] Connecting to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:28Z D! [outputs.amqp] Error connecting to "amqp://127.0.0.1:5672/influx" - dial tcp 127.0.0.1:5672: connect: connection refused
2022-08-11T19:57:28Z D! [outputs.amqp] Buffer fullness: 1 / 10000 metrics
2022-08-11T19:57:28Z E! [agent] Error writing to outputs.amqp: could not connect to any broker
2022-08-11T19:57:38Z D! [outputs.amqp] Connecting to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:38Z D! [outputs.amqp] Error connecting to "amqp://127.0.0.1:5672/influx" - dial tcp 127.0.0.1:5672: connect: connection refused
2022-08-11T19:57:38Z D! [outputs.amqp] Wrote batch of 2 metrics in 180.808µs
# oops, instead of keeping the metrics in the buffer, it reports them as written?
# after this I bring up the container
2022-08-11T19:57:38Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
2022-08-11T19:57:48Z D! [outputs.amqp] Connecting to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:48Z D! [outputs.amqp] Connected to "amqp://127.0.0.1:5672/influx"
2022-08-11T19:57:48Z D! [outputs.amqp] Wrote batch of 1 metrics in 11.612381ms
2022-08-11T19:57:48Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
2022-08-11T19:57:58Z D! [outputs.amqp] Wrote batch of 1 metrics in 98.149µs
2022-08-11T19:57:58Z D! [outputs.amqp] Buffer fullness: 0 / 10000 metrics
Any update on this bug? I have observed the same behavior. It seems to be caused by the Write() function in amqp.go returning nil instead of the error: the if block starting at line 158 does not return the error when it is not amqp.ErrClosed, which is the case when the plugin attempts to reconnect but the server is still unreachable.
Without buffering during downtime, using Telegraf with this plugin is pointless.
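The failure path described in this comment can be modeled in Go. This is a hypothetical, simplified model built from the comment and the 19:57:28 and 19:57:38 log lines. It is not the code in amqp.go. The names output, publish, connected, brokerUp, fixed and errClosed (a stand-in for amqp.ErrClosed) are placeholders.

On the first flush, the existing connection turns out to be closed, the retry fails, and the error is returned, so the metric stays buffered. On the next flush no connection exists, and the reconnect fails with an error that is not ErrClosed. The buggy branch then returns nil, so the agent logs "Wrote batch of 2 metrics" and empties the buffer. Returning the error in that branch (fixed: true) keeps the batch buffered.

```go
package main

import (
	"errors"
	"fmt"
)

// errClosed is a hypothetical stand-in for amqp.ErrClosed.
var errClosed = errors.New("connection closed")

// output is a hypothetical, simplified model of the AMQP output's
// error handling; it is not the code in amqp.go.
type output struct {
	connected bool // an (apparently) open connection exists
	brokerUp  bool // whether RabbitMQ is reachable
	fixed     bool // true: propagate the reconnect error
}

// publish connects first if needed, then "sends" the batch.
func (o *output) publish(batch []string) error {
	if !o.connected {
		if !o.brokerUp {
			return errors.New("could not connect to any broker")
		}
		o.connected = true
	}
	if !o.brokerUp {
		return errClosed // the existing connection turned out to be closed
	}
	return nil
}

// Write should return nil only when the batch may be dropped from the buffer.
func (o *output) Write(batch []string) error {
	err := o.publish(batch)
	if err == nil {
		return nil
	}
	if errors.Is(err, errClosed) {
		// Closed connection: reconnect and retry once.
		o.connected = false
		return o.publish(batch)
	}
	if o.connected {
		// Other error on an open connection: close it and report.
		o.connected = false
		return err
	}
	// Reconnect failed and no connection exists.
	if o.fixed {
		return err // batch stays buffered and is retried on the next flush
	}
	return nil // bug: error swallowed, batch counted as written
}

func main() {
	for _, fixed := range []bool{false, true} {
		o := &output{connected: true, fixed: fixed} // broker just went down
		first := o.Write([]string{"m1"})
		second := o.Write([]string{"m1", "m2"})
		fmt.Printf("fixed=%v first=%v second=%v\n", fixed, first, second)
	}
}
```

With fixed set to false, the second call returns nil even though nothing was delivered. This mirrors the "Wrote batch of 2 metrics" line logged right after "Error connecting".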