telegraf [outputs.postgresql] add setting to retry connection

Relevant telegraf.conf

[[outputs.postgresql]]
connection = "host=timescaledb1.intra,host=timescaledb2.intra user=telegraf password=XXX dbname=telegraf sslmode=disable target_session_attrs=read-write"
schema = "public"
create_templates = [
    '''CREATE TABLE {{ .table }} ({{ .columns }})''',
    '''SELECT create_hypertable({{ .table|quoteLiteral }}, 'time', chunk_time_interval => INTERVAL '24 hours')''',
]

[[outputs.influxdb]]
  urls = ["http://influxdb1.intra"]
  database = "telegraf"
  username = "telegraf"
  password = "secret"
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
  [outputs.influxdb.tagdrop]
    influxdb_database = ["*"]

Logs from Telegraf

Nov 29 07:37:43 collector1 telegraf[963895]: 2023-11-29T06:37:43Z E! [outputs.postgresql] PG connect failed - map[err:failed to connect to `host=timescaledb1.intra user=telegraf database=telegraf`: hostname resolving error (lookup timescaledb1.intra on 10.10.10.53:53: no such host)]
Nov 29 07:37:43 collector1 telegraf[963895]: 2023-11-29T06:37:43Z E! [outputs.postgresql] Couldn't connect to server
Nov 29 07:37:43 collector1 telegraf[963895]: failed to connect to `host=timescaledb1.intra user=telegraf database=telegraf`: hostname resolving error (lookup timescaledb1.intra on 10.10.10.53:53: no such host)
Nov 29 07:37:43 collector1 telegraf[963895]: 2023-11-29T06:37:43Z E! [agent] Failed to connect to [outputs.postgresql], retrying in 15s, error was "failed to connect to `host=timescaledb1.intra user=telegraf database=telegraf`: hostname resolving error (lookup timescaledb1.intra on 10.10.10.53:53: no such host)"

System info

Telegraf 1.28.2, Debian 11 Bullseye

Docker

No response

Steps to reproduce

Create multiple outputs (InfluxDB, TimescaleDB, ...).
In one output, configure a server name that DNS cannot resolve.
Restart telegraf.service. From that moment, Telegraf stops sending data to all outputs.

Expected behavior

An error resolving the server name for one output should not affect the other outputs.

Actual behavior

For an unknown reason, DNS failed to resolve timescaledb1.intra to an IP address (see [[outputs.postgresql]]). After telegraf.service was restarted, Telegraf stopped sending data to all outputs.

Additional info

No response

Nov 29 '23 07:11 idahomst

In general, if Telegraf cannot connect to an output, it will not start. This is the expected behavior, as it catches cases where a user has a wrong password or has otherwise misconfigured the output connection.

We are happy to see PRs that add per-plugin exceptions, disabled by default, where the plugin keeps trying to reconnect, usually on each write attempt. We can use this issue for the postgresql output.

Nov 29 '23 15:11 powersj

@idahomst can you please test the binary in PR #15073, available once CI has finished the tests? Setting startup_error_behavior = "ignore" or startup_error_behavior = "retry" should do what you requested. Let me know if the PR fixes the issue!

Mar 27 '24 19:03 srebhan

Hi @srebhan, I tested both "retry" and "ignore", and both work great. Thank you! I look forward to finding Telegraf v1.31.0 in the Debian repositories. ;)

Mar 28 '24 11:03 idahomst
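
For anyone landing here later, a minimal sketch of how the setting from PR #15073 might be applied to the configuration in this issue. The option name and the values "retry" and "ignore" come from the comment above; placing the option at the plugin level next to the other settings is an assumption, and it needs a Telegraf build that includes the PR (the reporter expected v1.31.0).

```toml
[[outputs.postgresql]]
  connection = "host=timescaledb1.intra,host=timescaledb2.intra user=telegraf password=XXX dbname=telegraf sslmode=disable target_session_attrs=read-write"
  schema = "public"
  # Assumption: set per plugin, alongside the other options.
  # "retry" was reported to keep the remaining outputs working when this
  # host cannot be resolved at startup; "ignore" is the other value
  # that was tested in this thread.
  startup_error_behavior = "retry"
```

The [[outputs.influxdb]] block stays unchanged. Since the maintainers described such exceptions as disabled by default, the setting would presumably have to be added explicitly to each output that should tolerate a failed connection at startup.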
