Disk Buffer Strategy configuration not functioning/being ignored
Relevant telegraf.conf
# example configuration
[agent]
interval = "2s"
round_interval = true
metric_batch_size = 500
metric_buffer_limit = 3000
flush_interval = "5s"
precision = ""
debug = true
quiet = false
logtarget = "file"
logfile = "/var/log/telegraf/telegraf.log"
logfile_rotation_interval = "0d"
logfile_rotation_max_size = "1MB"
logfile_rotation_max_archives = 5
log_with_timezone = "local"
hostname = ""
omit_hostname = false
buffer_strategy = "disk"
buffer_directory = "/buffer_storage"
[[outputs.http]]
name_override = "my_output"
alias = "my_alias"
url = "http://127.0.0.1:8080/telegraf" #point at an invalid URL to force disk buffering
#buffer_strategy = "disk"
#buffer_directory = "/buffer_storage"
# collect some data to buffer
[[inputs.disk]]
[[inputs.cpu]]
Logs from Telegraf
### WITH BUFFER_STRATEGY AND BUFFER_DIRECTORY SET AT AGENT LEVEL ONLY ###
2024-07-25T00:09:18+10:00 I! Starting Telegraf 1.32.0-5cb142e6 brought to you by InfluxData the makers of InfluxDB
2024-07-25T00:09:18+10:00 I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 61 outputs, 6 secret-stores
2024-07-25T00:09:18+10:00 I! Loaded inputs: cpu disk
2024-07-25T00:09:18+10:00 I! Loaded aggregators:
2024-07-25T00:09:18+10:00 I! Loaded processors:
2024-07-25T00:09:18+10:00 I! Loaded secretstores:
2024-07-25T00:09:18+10:00 I! Loaded outputs: http
2024-07-25T00:09:18+10:00 I! Tags enabled: host=localhost.localdomain
2024-07-25T00:09:18+10:00 I! [agent] Config: Interval:2s, Quiet:false, Hostname:"localhost.localdomain", Flush Interval:5s
2024-07-25T00:09:18+10:00 D! [agent] Initializing plugins
2024-07-25T00:09:18+10:00 D! [agent] Connecting outputs
2024-07-25T00:09:18+10:00 D! [agent] Attempting connection to [outputs.http::my_alias]
2024-07-25T00:09:18+10:00 D! [agent] Successfully connected to outputs.http::my_alias
2024-07-25T00:09:18+10:00 D! [agent] Starting service inputs
2024-07-25T00:09:23+10:00 D! [outputs.http::my_alias] Buffer fullness: 17 / 3000 metrics
2024-07-25T00:09:23+10:00 E! [agent] Error writing to outputs.http::my_alias: Post "http://127.0.0.1:8080/telegraf": dial tcp 127.0.0.1:8080: connect: connection refused
2024-07-25T00:09:28+10:00 D! [outputs.http::my_alias] Buffer fullness: 47 / 3000 metrics
2024-07-25T00:09:28+10:00 E! [agent] Error writing to outputs.http::my_alias: Post "http://127.0.0.1:8080/telegraf": dial tcp 127.0.0.1:8080: connect: connection refused
### WITH BUFFER_STRATEGY AND BUFFER_DIRECTORY SET AT OUTPUT LEVEL ###
Jul 25 01:24:43 localhost telegraf[72617]: 2024-07-24T15:24:43Z I! Loading config: /etc/telegraf/telegraf.conf
Jul 25 01:24:43 localhost telegraf[72617]: 2024-07-24T15:24:43Z W! Using disk buffer strategy for plugin outputs.http, this is an experimental feature
Jul 25 01:24:43 localhost telegraf[72617]: 2024-07-24T15:24:43Z E! error loading config file /etc/telegraf/telegraf.conf: plugin outputs.http: line 22: configuration specified the fields ["buffer_directory" "buffer_strategy"], but they were not used. This is either a typo or this config option does not exist in this version.
System info
1.32.0-5cb142e6, CentOS Linux release 7.9.2009 (Core)
Docker
No response
Steps to reproduce
- Start telegraf with the above configuration
- ls -l /buffer_storage
- Nothing is written to disk; metrics are still held in the memory buffer. ...
Expected behavior
The measurements should be buffered into /buffer_storage/http/
Actual behavior
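The /buffer_storage folder stays empty regardless of whether buffer_strategy and buffer_directory are set at the agent or output level. With the options set at the agent level, Telegraf keeps using the 'memory' buffer even though 'disk' has been selected. With the options set at the output level, Telegraf fails to load the config and does not start (see the second log excerpt above).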
Additional info
https://github.com/influxdata/telegraf/blob/5cb142e67631a2dae1cdb30e6909da92e17d1431/models/running_output.go#L102
The above line passes in config.BufferStrategy and config.BufferDirectory. Those two values are always an empty string ("") because setting buffer_strategy and buffer_directory at the Output level (not Agent) makes Telegraf error out and refuse to start. I believe this happens because the two options are missing from the list below:
https://github.com/influxdata/telegraf/blob/5cb142e67631a2dae1cdb30e6909da92e17d1431/config/config.go#L1548-L1562
When buffer_directory and buffer_strategy are set at the Agent level (not Output level), Telegraf starts, but the values are never used, because the code only reads the values set at the 'Output' level (config.BufferStrategy and config.BufferDirectory). Since those are always empty strings when not set, the buffer defaults to memory regardless of the Agent-level values.
https://github.com/influxdata/telegraf/blob/5cb142e67631a2dae1cdb30e6909da92e17d1431/config/config.go#L1229
The above calls NewRunningOutput() with both c.Agent.MetricBatchSize and c.Agent.MetricBufferLimit (Agent config). These then get passed into the same NewBuffer() func. The bufferLimit and batchSize are then 'changed' based on whether they were overridden at the Output config section or not.
https://github.com/influxdata/telegraf/blob/5cb142e67631a2dae1cdb30e6909da92e17d1431/models/running_output.go#L71-L102
- I feel we should pass the buffer strategy and directory in the same way as metric batch size and buffer limit (above): as args to NewRunningOutput() -> check whether they have been overridden at the 'Output' level -> pass the resulting values to NewBuffer()?
- Add buffer_strategy and buffer_directory to the 'ignored list' in missingTomlField() so that they can be overridden at the 'Output' level without Telegraf failing to start.
- Happy to do the work; I just want to make sure this is what you intend with the experimental feature <:)
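To make the proposed resolution order concrete, here is a minimal sketch of "agent value as default, output value as override" applied to the buffer strategy and directory, in the same way batch size and buffer limit are described above. This is not Telegraf's actual code: AgentConfig, OutputConfig, BufferSettings and resolveBufferSettings are hypothetical names invented for illustration, and the fallback to "memory" when nothing is set is an assumption based on the behaviour observed in this report.

```go
package main

import "fmt"

// AgentConfig is a hypothetical, trimmed-down stand-in for the [agent] section.
type AgentConfig struct {
	MetricBatchSize   int
	MetricBufferLimit int
	BufferStrategy    string
	BufferDirectory   string
}

// OutputConfig is a hypothetical stand-in for per-output overrides.
// Zero values mean "not set at the output level".
type OutputConfig struct {
	MetricBatchSize   int
	MetricBufferLimit int
	BufferStrategy    string
	BufferDirectory   string
}

// BufferSettings is what would eventually be handed to the buffer constructor.
type BufferSettings struct {
	BatchSize   int
	BufferLimit int
	Strategy    string
	Directory   string
}

// resolveBufferSettings starts from the agent-level values and lets any
// value set at the output level take precedence.
func resolveBufferSettings(agent AgentConfig, out OutputConfig) BufferSettings {
	s := BufferSettings{
		BatchSize:   agent.MetricBatchSize,
		BufferLimit: agent.MetricBufferLimit,
		Strategy:    agent.BufferStrategy,
		Directory:   agent.BufferDirectory,
	}
	if out.MetricBatchSize > 0 {
		s.BatchSize = out.MetricBatchSize
	}
	if out.MetricBufferLimit > 0 {
		s.BufferLimit = out.MetricBufferLimit
	}
	if out.BufferStrategy != "" {
		s.Strategy = out.BufferStrategy
	}
	if out.BufferDirectory != "" {
		s.Directory = out.BufferDirectory
	}
	// Assumed default: fall back to the in-memory buffer when nothing is set.
	if s.Strategy == "" {
		s.Strategy = "memory"
	}
	return s
}

func main() {
	agent := AgentConfig{
		MetricBatchSize:   500,
		MetricBufferLimit: 3000,
		BufferStrategy:    "disk",
		BufferDirectory:   "/buffer_storage",
	}
	// No output-level overrides: the agent-level disk settings should apply.
	fmt.Printf("%+v\n", resolveBufferSettings(agent, OutputConfig{}))
	// Output-level override: this output opts back into the memory buffer.
	fmt.Printf("%+v\n", resolveBufferSettings(agent, OutputConfig{BufferStrategy: "memory"}))
}
```

With this ordering, the configuration at the top of this report (options set only under [agent]) would resolve to the disk strategy, while a single output could still opt out by setting its own buffer_strategy.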