ansible-telegraf
Error installing on Ubuntu/Debian with a version other than latest
Describe the bug: After upgrading to version 0.12 of the role, all my provisioning runs failed to download Telegraf from the repo.
Installation method/version
- Ansible Galaxy / 2.7.9
Ansible Version
ansible-playbook 2.7.9
config file = 'omit'/ansible.cfg
configured module search path = ['/home/edbizarro/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.7/site-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 3.7.3 (default, Mar 26 2019, 21:43:19) [GCC 8.2.1 20181127]
Targeted hosts: concerns the following OS(es):
- Ubuntu 18.04
Expected behavior
Additional context
My .yml
- role: dj-wasabi.telegraf
  become: yes
  telegraf_agent_package_method: repo
  telegraf_agent_hostname: "apps"
  telegraf_agent_interval: 10
  telegraf_agent_aws_tags: true
  telegraf_agent_quiet: true
  telegraf_agent_output:
    - type: influxdb
      config:
        - urls = ["http://omit:8086"]
        - database = "telegraf"
        - precision = "s"
  telegraf_plugins_default:
    - plugin: cpu
      config:
        - percpu = false
        - totalcpu = true
    - plugin: disk
    - plugin: swap
    - plugin: processes
    - plugin: diskio
    - plugin: mem
    - plugin: net
    - plugin: system
    - plugin: netstat
    - plugin: kernel
    - plugin: docker
Output
TASK [dj-wasabi.telegraf : Debian | Install Telegraf package] ********************************************************************************************************************************************
FAILED - RETRYING: Debian | Install Telegraf package (3 retries left).
FAILED - RETRYING: Debian | Install Telegraf package (2 retries left).
FAILED - RETRYING: Debian | Install Telegraf package (1 retries left).
fatal: [omit]: FAILED! => {"attempts": 3, "cache_update_time": 1554319334, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'telegraf=1.10.0-1'' failed: E: Version '1.10.0-1' for 'telegraf' was not found\n", "rc": 100, "stderr": "E: Version '1.10.0-1' for 'telegraf' was not found\n", "stderr_lines": ["E: Version '1.10.0-1' for 'telegraf' was not found"], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information..."]}
Hm, with telegraf_agent_version: 1.10.2 everything works, but with the default it does not.
I have experienced the same issue.
Using the default telegraf_agent_version yields the same error message. Even if I set telegraf_agent_version: 1.10.2, I still get: failed: E: Version '1.10.2-1' for 'telegraf' was not found.
The workaround in my case was to use telegraf_agent_package_state: latest instead, as per the docs.
It would be nice, however, to be able to freeze the Telegraf version.
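For reference, the workaround mentioned above might look like this in a playbook. This is only a minimal sketch that reuses the variable names from this thread; the host pattern `all` is a placeholder, and all other role variables are left out.

```yaml
# Sketch of the workaround: let apt install whatever version the repo
# currently offers instead of pinning telegraf_agent_version.
- hosts: all                      # placeholder host pattern (assumption)
  roles:
    - role: dj-wasabi.telegraf
      become: yes
      telegraf_agent_package_method: repo
      telegraf_agent_package_state: latest
```

The trade-off is the one noted above: with this setting the installed version follows the repository and cannot be frozen.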
@sk1u Are you using Debian or Ubuntu as well?
It seems the apt repository for Telegraf only contains the latest available version. With every new Telegraf release, the old one is removed from the apt repository, and thus new installations of a pinned version fail.
I've created an issue on the Telegraf project:
https://github.com/influxdata/telegraf/issues/5685
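One way to check which versions the repository currently offers is an ad-hoc task like the one below. It is a sketch that assumes the InfluxData apt source is already configured on the host; the register name `telegraf_repo_versions` is made up for illustration.

```yaml
# apt-cache madison prints one line per version available from the
# configured apt sources.
- name: List Telegraf versions the configured apt sources offer
  command: apt-cache madison telegraf
  register: telegraf_repo_versions   # hypothetical variable name
  changed_when: false                # read-only query, never reports "changed"

- name: Show the available versions
  debug:
    var: telegraf_repo_versions.stdout_lines
```

If the output lists a single version, pinning any other one would be expected to fail with the "Version ... was not found" error shown earlier in this thread.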
That is right, it is a Debian 9 host.
Your explanation makes sense, thank you for that.
Just wanted to chip in that using telegraf_agent_version only solves half the problem, as running the role again will fail once a new version comes out. It would be awesome if there were a way to "force" an update to the latest minor/patch version (e.g. today it's 1.10.4).
I had the previous patch version before, and when I ran the role I saw this:
TASK [dj-wasabi.telegraf : Debian | Install Telegraf package] ******************************************************************************************************************************************************************************
Tuesday 28 May 2019 19:12:15 +0200 (0:00:00.040) 0:00:20.324 ***********
FAILED - RETRYING: Debian | Install Telegraf package (3 retries left).
FAILED - RETRYING: Debian | Install Telegraf package (2 retries left).
FAILED - RETRYING: Debian | Install Telegraf package (1 retries left).
fatal: [my-server.example.com]: FAILED! => {"attempts": 3, "cache_update_time": 1557211497, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'telegraf'' failed: E: Failed to fetch https://repos.influxdata.com/debian/pool/stable/t/telegraf/telegraf_1.10.3-1_amd64.deb 404 Not Found\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n", "rc": 100, "stderr": "E: Failed to fetch https://repos.influxdata.com/debian/pool/stable/t/telegraf/telegraf_1.10.3-1_amd64.deb 404 Not Found\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n", "stderr_lines": ["E: Failed to fetch https://repos.influxdata.com/debian/pool/stable/t/telegraf/telegraf_1.10.3-1_amd64.deb 404 Not Found", "E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?"], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nThe following packages will be upgraded:\n telegraf\n1 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.\nNeed to get 18.1 MB of archives.\nAfter this operation, 0 B of additional disk space will be used.\nErr:1 https://repos.influxdata.com/debian stretch/stable amd64 telegraf amd64 1.10.3-1\n 404 Not Found\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "The following packages will be upgraded:", " telegraf", "1 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.", "Need to get 18.1 MB of archives.", "After this operation, 0 B of additional disk space will be used.", "Err:1 https://repos.influxdata.com/debian stretch/stable amd64 telegraf amd64 1.10.3-1", " 404 Not Found"]}
Do note that I was able to run apt install telegraf manually just fine, and this unblocked me. Maybe we should consider using some of the parameters of the apt module, such as autoclean, force_apt_get, update_cache, etc.? If any of these works, we can hide it behind an optional toggle for extra safety and backward compatibility.
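A rough sketch of what such a toggle could look like follows. This is not the role's actual task, and `telegraf_apt_update_cache` is a hypothetical variable name, not an existing role option. The log above shows "cache_updated": false together with a 404 for 1.10.3-1, which would be consistent with a stale package index, but nothing in this thread confirms that update_cache alone fixes it.

```yaml
# Illustration only: refresh the apt index before installing, behind an
# opt-in toggle so that the default behaviour stays unchanged.
- name: "Debian | Install Telegraf package"
  apt:
    name: telegraf
    state: latest
    # telegraf_apt_update_cache is a hypothetical toggle (assumption)
    update_cache: "{{ telegraf_apt_update_cache | default(false) }}"
  register: telegraf_install        # hypothetical register name
  until: telegraf_install is succeeded
  retries: 3
  become: yes
```

Defaulting the toggle to false keeps existing playbooks behaving as before; users hitting the 404 could set it to true and see whether that helps.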
This is still happening, even on "online" installs. This entire step is contradictory: the task name says "== latest" but the when condition is "!= latest", so the name is set whenever the state is anything other than "latest". https://github.com/dj-wasabi/ansible-telegraf/blob/2687513dc89819982ec37ab795a69630c4065bb1/tasks/Debian.yml#L3-L7
Combined with the following step, you can only use the defaults (which are a minor version behind: 1.10 vs. 1.11.5), because the destination "telegraf_agent_package" needs to be a full file path. https://github.com/dj-wasabi/ansible-telegraf/blob/2687513dc89819982ec37ab795a69630c4065bb1/tasks/Debian.yml#L72-L77
When I set my version to 1.11.5, the "telegraf_agent_package" variable gets set to "telegraf=1.11.5-1", which is not a path as required by the get_url module.
This could be a typo: maybe the condition should be "== latest"? https://github.com/dj-wasabi/ansible-telegraf/blob/2687513dc89819982ec37ab795a69630c4065bb1/tasks/Debian.yml#L7
Changing the conditional to match the task description fixed my problems (per the pull request I created). The pull request is a simple one-character change that allows most installs to work, so it should be an easy approval: https://github.com/dj-wasabi/ansible-telegraf/pull/108
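Going only by the description in this comment, the mismatch and the one-character change would look roughly like the sketch below. The task text is a reconstruction for illustration, not copied from tasks/Debian.yml: the task name is paraphrased, and the set_fact value is inferred from the "telegraf=1.11.5-1" string quoted above.

```yaml
# Reconstruction for illustration only; see the linked lines for the real task.
- name: "Debian | Set name if state == latest"   # paraphrased task name
  set_fact:
    # Inferred from the "telegraf=1.11.5-1" value quoted above (assumption)
    telegraf_agent_package: "telegraf={{ telegraf_agent_version }}-1"
  # Before the change the condition read:
  #   telegraf_agent_package_state != "latest"
  when: telegraf_agent_package_state == "latest"
```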