Error installing on Ubuntu/Debian with a version other than latest

Open edbizarro opened this issue 5 years ago • 8 comments

Describe the bug

After upgrading to version 0.12 of the role, all my provisioning runs fail to download Telegraf from the repo.

Installation method/version

  • Ansible Galaxy / 2.7.9

Ansible Version

ansible-playbook 2.7.9
  config file = 'omit'/ansible.cfg
  configured module search path = ['/home/edbizarro/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']                                                                                     
  ansible python module location = /usr/lib/python3.7/site-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 3.7.3 (default, Mar 26 2019, 21:43:19) [GCC 8.2.1 20181127]

Targeted hosts

Concerns the following OS(es):

  • Ubuntu 18.04

Expected behavior

Additional context

My .yml

    - role: dj-wasabi.telegraf
      become: yes
      telegraf_agent_package_method: repo
      telegraf_agent_hostname: "apps"
      telegraf_agent_interval: 10
      telegraf_agent_aws_tags: true
      telegraf_agent_quiet: true
      telegraf_agent_output:
        - type: influxdb
          config:
            - urls = ["http://omit:8086"]
            - database = "telegraf"
            - precision = "s"
      telegraf_plugins_default:
        - plugin: cpu
          config:
            - percpu = false
            - totalcpu = true
        - plugin: disk
        - plugin: swap
        - plugin: processes
        - plugin: diskio
        - plugin: mem
        - plugin: net
        - plugin: system
        - plugin: netstat
        - plugin: kernel
        - plugin: docker

Output

TASK [dj-wasabi.telegraf : Debian | Install Telegraf package] ********************************************************************************************************************************************
FAILED - RETRYING: Debian | Install Telegraf package (3 retries left).
FAILED - RETRYING: Debian | Install Telegraf package (2 retries left).
FAILED - RETRYING: Debian | Install Telegraf package (1 retries left).
fatal: [omit]: FAILED! => {"attempts": 3, "cache_update_time": 1554319334, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"     install 'telegraf=1.10.0-1'' failed: E: Version '1.10.0-1' for 'telegraf' was not found\n", "rc": 100, "stderr": "E: Version '1.10.0-1' for 'telegraf' was not found\n", "stderr_lines": ["E: Version '1.10.0-1' for 'telegraf' was not found"], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information..."]}

edbizarro • Apr 03 '19 19:04

Hm, with telegraf_agent_version: 1.10.2 everything works, but with the default it does not.

edbizarro • Apr 03 '19 19:04

I have experienced the same issue. Using the default telegraf_agent_version yields the same error message. Even if I set telegraf_agent_version: 1.10.2, I still get: failed: E: Version '1.10.2-1' for 'telegraf' was not found.

The workaround in my case was to use telegraf_agent_package_state: latest instead, as per the docs. It would be nice, however, to be able to freeze the Telegraf version.
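
For reference, the workaround described above might look like this in a playbook. This is a minimal sketch that assumes the role variables named in this thread (telegraf_agent_package_method, telegraf_agent_package_state) behave as the role's docs describe; the host group "monitored" is a hypothetical placeholder.

```yaml
# Sketch only: the host group "monitored" is a hypothetical placeholder.
- hosts: monitored
  roles:
    - role: dj-wasabi.telegraf
      become: yes
      telegraf_agent_package_method: repo
      # Track whatever version the apt repository currently serves,
      # instead of pinning a version that may no longer be available.
      telegraf_agent_package_state: latest
```

The trade-off is the one noted above: with this setting the installed Telegraf version is no longer frozen.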

maiku1008 • Apr 05 '19 18:04

@sk1u Are you using Debian or Ubuntu as well?

It seems the apt repository for Telegraf only contains the latest available version. With every new Telegraf release, the old version is removed from their apt repository, so a new installation that requests the old version fails.
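
One way to check which versions the repository currently offers on a target host is to query apt directly. The tasks below are a rough sketch, not part of the role; the task names and the registered variable telegraf_madison are made up for illustration.

```yaml
# Sketch only: lists the telegraf versions offered by the configured apt sources.
- name: Show telegraf versions available from apt (illustrative)
  command: apt-cache madison telegraf
  register: telegraf_madison   # hypothetical variable name
  changed_when: false          # read-only query, never reports a change

- name: Print the available versions
  debug:
    var: telegraf_madison.stdout_lines
```

If the output does not include the requested version, a pinned install of that version would be expected to fail with the "was not found" error shown earlier.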

I've created an issue on the Telegraf project:

https://github.com/influxdata/telegraf/issues/5685

dj-wasabi • Apr 05 '19 19:04

That is right, it is a Debian 9 host.

Your explanation makes sense, thank you for that.

maiku1008 • Apr 06 '19 04:04

Just wanted to chip in: using telegraf_agent_version only solves half the problem, as running the role again will fail if a new version comes out. It would be awesome if there were a way to "force" an update to the latest minor/patch version (e.g., today it's 1.10.4).

I had the previous patch version before, and when I ran the role I saw this:

TASK [dj-wasabi.telegraf : Debian | Install Telegraf package] ******************************************************************************************************************************************************************************
Tuesday 28 May 2019  19:12:15 +0200 (0:00:00.040)       0:00:20.324 ***********
FAILED - RETRYING: Debian | Install Telegraf package (3 retries left).
FAILED - RETRYING: Debian | Install Telegraf package (2 retries left).
FAILED - RETRYING: Debian | Install Telegraf package (1 retries left).
fatal: [my-server.example.com]: FAILED! => {"attempts": 3, "cache_update_time": 1557211497, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"     install 'telegraf'' failed: E: Failed to fetch https://repos.influxdata.com/debian/pool/stable/t/telegraf/telegraf_1.10.3-1_amd64.deb  404  Not Found\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n", "rc": 100, "stderr": "E: Failed to fetch https://repos.influxdata.com/debian/pool/stable/t/telegraf/telegraf_1.10.3-1_amd64.deb  404  Not Found\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n", "stderr_lines": ["E: Failed to fetch https://repos.influxdata.com/debian/pool/stable/t/telegraf/telegraf_1.10.3-1_amd64.deb  404  Not Found", "E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?"], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nThe following packages will be upgraded:\n  telegraf\n1 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.\nNeed to get 18.1 MB of archives.\nAfter this operation, 0 B of additional disk space will be used.\nErr:1 https://repos.influxdata.com/debian stretch/stable amd64 telegraf amd64 1.10.3-1\n  404  Not Found\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "The following packages will be upgraded:", "  telegraf", "1 upgraded, 0 newly installed, 0 to remove and 24 not upgraded.", "Need to get 18.1 MB of archives.", "After this operation, 0 B of additional disk space will be used.", "Err:1 https://repos.influxdata.com/debian stretch/stable amd64 telegraf amd64 1.10.3-1", "  404  Not Found"]}

Do note that I was able to run apt install telegraf manually just fine, and this unblocked me. Maybe we should consider using some of the parameters of the apt module, such as autoclean, force_apt_get, or update_cache? If any of these works, we can hide it behind an optional toggle for extra safety and backward compatibility.
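
As a rough illustration of that idea, an install task with an opt-in cache refresh could look like the sketch below. This is not the role's actual task: the variable telegraf_apt_update_cache is hypothetical, and whether a stale package index is really the cause of the 404 above is an assumption.

```yaml
# Sketch only: not the role's actual task.
# telegraf_apt_update_cache is a hypothetical toggle that defaults to off.
- name: Debian | Install Telegraf package (illustrative)
  apt:
    name: telegraf
    state: latest
    # Refresh the package index first, so apt does not try to fetch
    # a .deb that the repository may have already removed.
    update_cache: "{{ telegraf_apt_update_cache | default(false) }}"
  become: yes
```

Keeping the toggle off by default would leave existing playbooks unchanged unless they opt in.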

asfaltboy • May 28 '19 17:05

This is still happening, even on "online" installs. This entire step is contradictory: the task name says "== latest", but the when condition is "!= latest". It always sets the name, unless 'latest' is not set... https://github.com/dj-wasabi/ansible-telegraf/blob/2687513dc89819982ec37ab795a69630c4065bb1/tasks/Debian.yml#L3-L7

When combined with this, you can only use the defaults (which are a minor version behind: 1.10 vs. 1.11.5), because the destination "telegraf_agent_package" needs to be a full file path. https://github.com/dj-wasabi/ansible-telegraf/blob/2687513dc89819982ec37ab795a69630c4065bb1/tasks/Debian.yml#L72-L77

matttrach • Sep 04 '19 22:09

When I set my version to 1.11.5, the "telegraf_agent_package" variable gets set to "telegraf=1.11.5-1", which is not a path as required by the get_url module.

This could be a typo: maybe this should be "== latest"? https://github.com/dj-wasabi/ansible-telegraf/blob/2687513dc89819982ec37ab795a69630c4065bb1/tasks/Debian.yml#L7

matttrach • Sep 04 '19 22:09

Changing the conditional to match the task description fixed my problems (per the pull request I created). The pull request is a simple one-character change that allows most installs to work, so it should be an easy approval: https://github.com/dj-wasabi/ansible-telegraf/pull/108
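
To make the described change concrete, the task in question might look roughly like the sketch below after the fix. It is reconstructed only from the descriptions in this thread, not copied from the role's tasks/Debian.yml; the task name and the hard-coded "-1" suffix are assumptions based on the "telegraf=1.11.5-1" value mentioned above.

```yaml
# Sketch only: reconstructed from this thread, not the role's real file.
- name: "Debian | Set name if state == latest"   # hypothetical task name
  set_fact:
    # The "-1" suffix is assumed from the value "telegraf=1.11.5-1" quoted above.
    telegraf_agent_package: "telegraf={{ telegraf_agent_version }}-1"
  when:
    # Before the change the comparison was "!=", which contradicted the task name.
    - telegraf_agent_package_state == "latest"
```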

matttrach • Sep 10 '19 03:09